David Robinson presented a fantastic analysis of President Trump's tweets on the Variance Explained blog: http://varianceexplained.org/r/trump-followup/ .
He presented an interesting scatter plot relating the frequency of word use in the president's tweets before and after his election. Due to ggplot2's limitations, the scatter plot was a bit hard to read. Luckily, Python's Scattertext provides an easy way to make legible, interactive scatter plots for text visualization. See below how the same tweets were made into a Scattertext scatter plot using Python.
Please check out Scattertext on GitHub at https://github.com/JasonKessler/scattertext for documentation, and see the PyData Seattle talk introducing its usage at https://www.youtube.com/watch?v=H7X9CA2pWKo .
If you are academically inclined, you can cite the accompanying technical article as:
Jason S. Kessler. Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ. ACL System Demonstrations. Vancouver, BC. 2017. https://arxiv.org/abs/1703.00565
%matplotlib inline
import scattertext as st
import re, io, itertools
from pprint import pprint
import pandas as pd
import numpy as np
import spacy.en
import os, pkgutil, json, urllib, datetime
from urllib.request import urlopen
from IPython.display import IFrame
from IPython.display import display, HTML
display(HTML("<style>.container { width:98% !important; }</style>"))
# Download one JSON file of tweets per year and stack them into a single DataFrame.
df = pd.concat([pd.read_json('http://www.trumptwitterarchive.com/data/realdonaldtrump/%s.json' % (year))
                for year in range(2009, 2018)])
# spacy.en is the spaCy 1.x module path; from spaCy 2.0 the English class lives in spacy.lang.en.
nlp = spacy.en.English()
df['parsed'] = df.text.apply(nlp)
# Label each tweet by whether it was posted after midnight on 9 November 2016.
df['before_or_after_election'] = df['created_at'].apply(lambda x: 'after'
                                                        if x > datetime.datetime(2016,11,9)
                                                        else 'before')
# Keep only original (non-retweet) tweets sent from the Android client.
df_android_non_retweets = df[(df.is_retweet == False)
                             & (df.source == 'Twitter for Android')
                             & df.text.apply(lambda x: 'RT ' not in x and 'RT:' not in x)]
df_android_non_retweets['before_or_after_election'].value_counts()
before    13989
after       435
Name: before_or_after_election, dtype: int64
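The labelling and filtering steps above can be illustrated on a handful of records without downloading anything. The following is a minimal standard-library sketch, not the notebook's pandas code: the helper names `label_period` and `is_android_original` and the toy tweets are hypothetical, invented here for illustration. It assumes timestamps are naive `datetime` objects, as the comparison in the notebook does.

```python
import datetime

# Cutoff used in the notebook: midnight on 9 November 2016.
ELECTION_CUTOFF = datetime.datetime(2016, 11, 9)

def label_period(created_at):
    """Return 'after' for timestamps strictly later than the cutoff, else 'before'."""
    return 'after' if created_at > ELECTION_CUTOFF else 'before'

def is_android_original(tweet):
    """Mirror the three filter conditions: not flagged as a retweet, sent from
    the Android client, and no 'RT ' or 'RT:' substring anywhere in the text."""
    text = tweet['text']
    return (not tweet['is_retweet']
            and tweet['source'] == 'Twitter for Android'
            and 'RT ' not in text
            and 'RT:' not in text)

# Hypothetical toy records, invented for illustration only.
toy_tweets = [
    {'text': 'Big rally tonight!', 'source': 'Twitter for Android',
     'is_retweet': False, 'created_at': datetime.datetime(2016, 3, 1, 20, 15)},
    {'text': 'RT @someone: Great news', 'source': 'Twitter for Android',
     'is_retweet': False, 'created_at': datetime.datetime(2016, 12, 1, 8, 0)},
    {'text': 'Thank you all!', 'source': 'Twitter for iPhone',
     'is_retweet': False, 'created_at': datetime.datetime(2017, 1, 20, 12, 0)},
    {'text': 'Working hard.', 'source': 'Twitter for Android',
     'is_retweet': False, 'created_at': datetime.datetime(2017, 2, 2, 6, 30)},
]

kept = [t for t in toy_tweets if is_android_original(t)]
labels = [label_period(t['created_at']) for t in kept]
print(labels)
```

Note that the substring test is deliberately crude: any text containing an uppercase "RT" followed by a space, such as a word ending in "RT", is dropped along with genuine manual retweets.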
# Build a Scattertext corpus from the spaCy-parsed tweets, split by election period.
corpus = st.CorpusFromParsedDocuments(df_android_non_retweets,
                                      category_col='before_or_after_election',
                                      parsed_col='parsed').build()
# Render the interactive plot as an HTML string, using each tweet's timestamp as its metadata.
html = st.produce_scattertext_explorer(corpus,
                                       category='after',
                                       category_name='After Election',
                                       not_category_name='Before Election',
                                       use_full_doc=True,
                                       minimum_term_frequency=2,
                                       pmi_filter_thresold=10,
                                       minimum_not_category_term_frequency=10,
                                       width_in_pixels=1000,
                                       metadata=df_android_non_retweets['created_at'])
file_name = 'trump_before_after_election.html'
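# Write the HTML to disk; the with-block makes sure the file is closed before it is displayed.
with open(file_name, 'wb') as f:
    f.write(html.encode('utf-8'))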
IFrame(src=file_name, width = 1200, height=700)
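To make the idea behind the plot concrete, here is a rough sketch of the raw quantity it compares: how often each term occurs in the "before" documents versus the "after" documents. This is not Scattertext's implementation; its own layout and scoring are more involved and are not reproduced here. The function `term_counts_by_category`, the whitespace tokenizer, and the toy documents are all assumptions made for illustration (the notebook tokenizes with spaCy).

```python
from collections import Counter

def term_counts_by_category(docs):
    """docs is a list of (category, text) pairs; returns {category: Counter of terms}."""
    counts = {}
    for category, text in docs:
        # Naive lowercase whitespace tokenization, standing in for a real tokenizer.
        tokens = text.lower().split()
        counts.setdefault(category, Counter()).update(tokens)
    return counts

# Hypothetical toy documents, invented for illustration only.
toy_docs = [
    ('before', 'great rally great crowd'),
    ('before', 'crowd was huge'),
    ('after', 'jobs report today'),
    ('after', 'jobs are back today'),
]

counts = term_counts_by_category(toy_docs)
vocab = sorted(set(counts['before']) | set(counts['after']))

# One (before_count, after_count) pair per term: the two axes of the comparison.
coords = {term: (counts['before'][term], counts['after'][term]) for term in vocab}

print(coords['great'])  # (2, 0)
print(coords['jobs'])   # (0, 2)
```

Terms with a large count on one axis and a small count on the other are the ones that distinguish the two periods, which is what the interactive plot lets you explore term by term.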