Analysis of Pronoun Usage In Presidential News Conferences¶

This notebook analyzes how presidents have used first person vs. second person pronouns during their official news conferences.

The analysis relies on this parsing library: https://github.com/BuzzFeedNews/whtranscripts

In [1]:

import pandas as pd
import sys
import whtranscripts
import re

press_conference_data is a directory that includes a full set of press conferences pulled down using whtranscripts.download

In [2]:

conferences = whtranscripts.Conference.from_dir("../president_speech_notebooks/press_conference_data")

In [3]:

all_passages = [x for b in conferences for x in b.passages]

In [4]:

passages = pd.DataFrame(all_passages, columns=["passage"])

In [5]:

passages["date"] = passages["passage"].apply(lambda x: x.transcript.date)
passages["speaker"] = passages["passage"].apply(lambda x: x.speaker)
passages["text"] = passages["passage"].apply(lambda x: x.text)
passages["president"] = passages["passage"].apply(lambda x: x.transcript.president)
passages["tokens"] = passages["passage"].apply(lambda x: x.tokens)

Presidents Only DataFrame¶

This eliminates everything but the words spoken by the President. Unfortunately there are a lot of special cases so we have to do some fancy filtering in the is_president function.

In [6]:

def is_president(row):
    if row["speaker"] and "The President" in row["speaker"]\
        and "Secretary" not in row["speaker"]:
        return True
    elif row["speaker"] == "The. President" or row["speaker"] == "Mr. President":
        return True
    elif row["speaker"] and row["president"].split()[-1] in row["speaker"]\
        and "Mrs." not in row["speaker"] and "Governor" not in row["speaker"]:
        return True
    else:
        return False

In [7]:

passages["is_president"] = passages.apply(lambda x: is_president(x), axis=1)

In [8]:

president_passages = passages[passages["is_president"]]

First Person References¶

First person singular references

In [9]:

president_passages["i"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("i"))
president_passages["me"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("me"))
president_passages["my"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("my"))
president_passages["mine"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("mine"))
president_passages["myself"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("myself"))

In [10]:

president_passages["first_person_singular"] = president_passages.apply(lambda x: x["i"] + x["me"] + x["my"] +\
                                                                       x["mine"] + x["myself"], axis=1)

First person plural references

In [11]:

president_passages["we"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("we"))
president_passages["our"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("our"))
president_passages["ours"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("ours"))
president_passages["ourselves"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("ourselves"))
president_passages["us"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("us"))

In [12]:

president_passages["first_person_plural"] = president_passages.apply(lambda x: x["we"] + x["our"] + x["ours"] + x["ourselves"] + x["us"], axis=1)

In [13]:

president_passages["first_person"] = president_passages.apply(lambda x: x["first_person_singular"] + 
                                                                x["first_person_singular"], axis=1)

Second Person References¶

In [14]:

president_passages["you"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("you"))
president_passages["your"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("your"))
president_passages["yours"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("yours"))
president_passages["yourself"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("yourself"))

In [15]:

president_passages["second_person"] = president_passages.apply(lambda x: x["you"] + x["your"] + + x["yours"] + x["yourself"], axis=1)

Third Person References¶

In [16]:

president_passages["they"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("they"))
president_passages["their"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("their"))
president_passages["theirs"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("theirs"))
president_passages["themselves"] = president_passages["passage"].apply(lambda x: x.count_token_occurrences("themselves"))

In [17]:

president_passages["third_person"] = president_passages.apply(lambda x: x["they"] + x["their"] + x["theirs"] + x["themselves"], axis=1)

In [18]:

president_passages["word_count"] = president_passages["passage"].apply(lambda x: x.get_word_count())

In [19]:

president_analysis = president_passages[["word_count", "tokens", "date", "speaker", "president",
                                         "passage", "first_person", "first_person_singular",
                                         "first_person_plural", "second_person", "third_person"]]

In [20]:

presidents = pd.DataFrame(president_analysis.groupby("president").sum())

Analysis of the Presidential Pronoun Dataframe¶

In [21]:

round(100.0 * presidents["first_person_singular"].sum() / presidents["word_count"].sum())

Out[21]:

3.0

In [22]:

presidents["pct_first"] = presidents.apply(lambda x: round(100.0 * x["first_person"] / x["word_count"], 2), axis=1)

In [23]:

presidents["pct_first_singular"] = presidents.apply(lambda x: round(100.0 * x["first_person_singular"] / x["word_count"], 2), axis=1)

In [24]:

presidents["pct_first_plural"] = presidents.apply(lambda x: round(100.0 * x["first_person_plural"] / x["word_count"], 2), axis=1)

In [25]:

presidents[["pct_first_singular", "pct_first_plural", "pct_first", "word_count"]].sort("pct_first_singular", ascending=False)

Out[25]:

	pct_first_singular	pct_first_plural	pct_first	word_count
president
Harry S. Truman	4.85	0.90	9.70	366974
George Bush	4.65	2.38	9.30	400470
Dwight D. Eisenhower	4.55	1.75	9.10	564562
Gerald R. Ford	4.18	2.13	8.36	126528
Jimmy Carter	3.50	2.38	6.99	220024
William J. Clinton	3.40	3.03	6.79	637070
Lyndon B. Johnson	3.11	3.04	6.22	404830
Richard Nixon	3.09	2.32	6.19	171527
George W. Bush	3.03	2.87	6.07	615121
Ronald Reagan	3.00	3.10	6.00	173451
John F. Kennedy	2.80	3.09	5.60	245266
Barack Obama	2.45	3.61	4.90	473680
Franklin D. Roosevelt	2.14	1.30	4.29	314211
Herbert Hoover	1.94	1.13	3.88	133108

In [26]:

presidents[["pct_first_singular", "pct_first_plural", "pct_first", "word_count"]].sort("pct_first_plural", ascending=False)

Out[26]:

	pct_first_singular	pct_first_plural	pct_first	word_count
president
Barack Obama	2.45	3.61	4.90	473680
Ronald Reagan	3.00	3.10	6.00	173451
John F. Kennedy	2.80	3.09	5.60	245266
Lyndon B. Johnson	3.11	3.04	6.22	404830
William J. Clinton	3.40	3.03	6.79	637070
George W. Bush	3.03	2.87	6.07	615121
George Bush	4.65	2.38	9.30	400470
Jimmy Carter	3.50	2.38	6.99	220024
Richard Nixon	3.09	2.32	6.19	171527
Gerald R. Ford	4.18	2.13	8.36	126528
Dwight D. Eisenhower	4.55	1.75	9.10	564562
Franklin D. Roosevelt	2.14	1.30	4.29	314211
Herbert Hoover	1.94	1.13	3.88	133108
Harry S. Truman	4.85	0.90	9.70	366974

In [27]:

presidents[["pct_first_singular", "pct_first_plural", "pct_first", "word_count"]].sort("pct_first", ascending=False)

Out[27]:

	pct_first_singular	pct_first_plural	pct_first	word_count
president
Harry S. Truman	4.85	0.90	9.70	366974
George Bush	4.65	2.38	9.30	400470
Dwight D. Eisenhower	4.55	1.75	9.10	564562
Gerald R. Ford	4.18	2.13	8.36	126528
Jimmy Carter	3.50	2.38	6.99	220024
William J. Clinton	3.40	3.03	6.79	637070
Lyndon B. Johnson	3.11	3.04	6.22	404830
Richard Nixon	3.09	2.32	6.19	171527
George W. Bush	3.03	2.87	6.07	615121
Ronald Reagan	3.00	3.10	6.00	173451
John F. Kennedy	2.80	3.09	5.60	245266
Barack Obama	2.45	3.61	4.90	473680
Franklin D. Roosevelt	2.14	1.30	4.29	314211
Herbert Hoover	1.94	1.13	3.88	133108

Singular vs. Plural Pronouns Over Time¶

In [28]:

%matplotlib inline
import mplstyle, mplstyle.styles.simple
mplstyle.set(mplstyle.styles.simple)
mplstyle.set({ 
    "figure.figsize": (10, 6),
    "axes": {
        "color_cycle": [ "teal", "red" ],
    },
    "lines": {
        "linewidth": 2
    }
})

In [29]:

import datetime
president_analysis["datetime"] = president_analysis["date"].apply(lambda x: datetime.datetime(x.year, x.month, x.day))

In [30]:

def get_term_freq(df, term, resampler="AS"):
    _ = df.set_index("datetime")
    total_words = _["word_count"].resample(resampler, how="sum")
    freq_count = _[term].resample(resampler, how="sum")
    return (100.0 * freq_count / total_words)

In [31]:

ax = get_term_freq(president_analysis, "first_person_singular").plot(kind="line", label="singular", color="r")
get_term_freq(president_analysis, "first_person_plural").plot(kind="line", label="plural", color="b")
ax.legend(bbox_to_anchor=(0.2, 1))
pass

In [32]:

terms_df = pd.DataFrame(get_term_freq(president_analysis, "first_person_singular"))
terms_df.columns = ["singular"]
terms_df["plural"] = get_term_freq(president_analysis, "first_person_plural")

In [33]:

terms_df["date"] = terms_df.index
terms_df["year"] = terms_df["date"].apply(lambda x: int(x.year))
terms_df.set_index("year")
terms_df[["singular", "plural"]].to_csv("singularVsPlural2.csv")