Women's March and Tea Party, by the Numbers

The Tea Party protests that took the country by storm in 2009 had an outsized impact on the legislative process. The recent Women's March and associated movement could potentially have a similar effect, so I was curious to see how the two compared in size and location. Below, I look at the distribution and size of the marches, compare turnout by city, then look at the portion of each state that attended protests.

Overall, there were more than ten times as many Women's Marchers (4,157,678) as Tea Party marchers (310,960). Interestingly, the two movements had a similar median march size (322 vs. 450), although the mean was substantially higher for the Women's March (6,673 vs. 903). Finally, almost every state had a larger share of its population turn out for the Women's March, with Colorado leading the way at 2.9%. This means that although the Women's March was more concentrated in cities, it was still a grassroots event distributed geographically throughout the 50 states.

If the energy of the Women's March can be harnessed, it could have an even larger impact than the Tea Party. We may already be seeing the results in Congress and at town halls.

If you're viewing this notebook on GitHub, view it in nbviewer here instead to see the interactive plots and tables.

In [97]:
%matplotlib inline

import pandas as pd 
import numpy as np

import matplotlib.pyplot as plt
import matplotlib

import statsmodels.formula.api as smf
import statsmodels.api as sm

import json
from IPython.display import HTML


Import the Data

Jeremy Pressman, Erica Chenoweth and others recently finished compiling the Women's March data, and 538 compiled data on the Tea Party protests a few years ago, so I'll be using both of those sources. I got the state-level population data from the US Census and the voter turnout data from David Wasserman. All of these sources are available in a zipped file here.

In [122]:
#Read in Tea Party data
tea_df = pd.read_csv('data/tea_party.csv', sep='\t', encoding='utf-8', index_col=False)
tea_df.rename(columns={'number': 'tea_num'}, inplace=True)

#Sum any cities with two reported protests
tea_df = tea_df.groupby(by=['city', 'state'], as_index=False).sum()
In [123]:
#Read in Women's March data.  
march_df = pd.read_csv('data/womens_march.csv', encoding='utf-8', index_col=False)
march_df['Location'] = march_df['Location'].str.split(',', expand=True)[0]
march_df.replace({',': ''}, regex=True, inplace=True)
march_df = march_df.apply(pd.to_numeric, errors='ignore')

march_df = march_df.loc[:, ['Location', 'State/Territory', 'Best Guess']]
march_df.rename(columns={'Location':'city', 'State/Territory': 'state', 'Best Guess':'march_num'}, inplace=True)

#Sum any cities with two protests
march_df = march_df.groupby(by=['city', 'state'], as_index=False).sum()
march_df['city'] = march_df['city'].replace({'Washington DC': 'Washington'})
In [124]:
# Import and parse the state population data
states = {'Mississippi': 'MS', 'Northern Mariana Islands': 'MP', 'Oklahoma': 'OK', 'Wyoming': 'WY', 
          'Minnesota': 'MN', 'Alaska': 'AK', 'American Samoa': 'AS', 'Arkansas': 'AR', 'New Mexico': 'NM', 
          'Indiana': 'IN', 'Maryland': 'MD', 'Louisiana': 'LA', 'Texas': 'TX', 'Tennessee': 'TN', 
          'Iowa': 'IA', 'Wisconsin': 'WI', 'Arizona': 'AZ', 'Michigan': 'MI', 'Kansas': 'KS', 
          'Utah': 'UT', 'Virginia': 'VA', 'Oregon': 'OR', 'Connecticut': 'CT', 
          'District of Columbia': 'DC', 'New Hampshire': 'NH', 'Idaho': 'ID', 'West Virginia': 'WV', 
          'South Carolina': 'SC', 'California': 'CA', 'Massachusetts': 'MA', 'Vermont': 'VT', 
          'Georgia': 'GA', 'North Dakota': 'ND', 'Pennsylvania': 'PA', 'Puerto Rico': 'PR', 
          'Florida': 'FL', 'Hawaii': 'HI', 'Kentucky': 'KY', 'Rhode Island': 'RI', 'Nebraska': 'NE', 
          'Missouri': 'MO', 'Ohio': 'OH', 'Alabama': 'AL', 'Illinois': 'IL', 'Virgin Islands': 'VI', 
          'South Dakota': 'SD', 'Colorado': 'CO', 'New Jersey': 'NJ', 'National': 'NA', 'Washington': 'WA', 
          'North Carolina': 'NC', 'Maine': 'ME', 'New York': 'NY', 'Montana': 'MT', 'Nevada': 'NV', 
          'Delaware': 'DE', 'Guam': 'GU'}

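def convert_state(element):
    # Map a full state name to its postal abbreviation
    if element in states:
        return states[element]
    # Names without an entry (e.g. 'United States') become NaN
    return np.nan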

pop_df = pd.read_csv('data/state_population.csv', encoding='utf-8', index_col=False)
# 53, Includes u'Puerto Rico' and u'District of Columbia', u'United States'
pop_df['state'] = pop_df['geography'].apply(convert_state)
pop_df = pop_df[['state', 'pop2016']]
In [125]:
#Import and parse the voting data, by state
vote_df =  pd.read_csv('data/national_vote.csv', encoding='utf-8', index_col=False)
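# Strip percent signs, asterisks and thousands separators (raw string keeps the regex escape valid)
vote_df.replace({'%': '', r'\*': '', ',': ''}, regex=True, inplace=True)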
vote_df = vote_df.apply(pd.to_numeric, errors='ignore')

vote_df['state'] = vote_df['State'].apply(convert_state)

vote_df = vote_df[['state', "Dem '16 Margin"]]
vote_df.rename(columns={"Dem '16 Margin":'margin2016'}, inplace=True)
#print len(list(vote_df['state'])) #51 Includes DC
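
As a side note on the name-to-abbreviation step above, here is a minimal sketch of an alternative that uses `Series.map` with a dictionary instead of `apply` with a helper function. The `states_subset` dictionary and the `geography` series are hypothetical stand-ins for the full `states` mapping and the real `geography` column, so treat this as an illustration rather than the notebook's method.

```python
import pandas as pd

# Hypothetical subset of the name-to-abbreviation mapping, for illustration only.
states_subset = {'Colorado': 'CO', 'West Virginia': 'WV', 'District of Columbia': 'DC'}

# Toy stand-in for the 'geography' column; 'United States' has no abbreviation.
geography = pd.Series(['Colorado', 'United States', 'West Virginia'])

# Series.map with a dict yields NaN for any value that is not a key in the dict.
abbrev = geography.map(states_subset)
print(abbrev)
```

Rows that come back as NaN (such as the national total) drop out later when the frames are merged with `how='inner'`.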

Marchers by City

First, I look at the data by city. The boxplot shows that the median march size was actually very similar for the two movements (322 vs. 450). The mean, however, was about seven times higher for the Women's March (6,673 vs. 903), and there are more outliers at the high end of the march size. Overall, there were more than ten times as many Women's Marchers (4,157,678) as Tea Party marchers (310,960).

In [126]:
#Merge dataframes on city, state
city_df = march_df.merge(tea_df, how='outer', on=['city', 'state'])

#Copy for distributions
unfcity_df = city_df.copy()

# Fill 0, assume cities without data had no marchers.
# Note, it's possible the 538 data is less complete than Women's March.   
city_df.fillna(value=0, inplace=True)

fig, ax = plt.subplots() 
ax.set_yscale('log')  # log axis, to match the "(log)" y-label below
ax.set_ylim(1, 1e6)
unfcity_df.plot.box(figsize=(10,7), ax=ax, meanline=True, showmeans=True, color='gray', sym='k.')
plt.ylabel("Protesters (log)")

#Print total marchers
print("Total Women's March: " + '{:,.0f}'.format(city_df['march_num'].sum()))
print("Total Tea Party: " + '{:,.0f}'.format(city_df['tea_num'].sum()))

Total Women's March: 4,157,678
Total Tea Party: 310,960
           march_num       tea_num
count     623.000000    344.000000
mean     6673.641091    903.953488
std     42193.461677   1308.635137
min         1.000000     12.000000
25%        80.750000    200.000000
50%       322.500000    450.000000
75%      1725.000000   1000.000000
max    725000.000000  15000.000000
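
To make the comparison concrete, the ratios between the two movements can be worked out directly from the summary statistics printed above. This is just arithmetic on the reported figures; the variable names are my own.

```python
# Summary statistics copied from the output above.
march_total, tea_total = 4157678, 310960
march_mean, tea_mean = 6673.641091, 903.953488
march_median, tea_median = 322.5, 450.0

# Ratio of Women's March to Tea Party for each statistic.
total_ratio = march_total / tea_total
mean_ratio = march_mean / tea_mean
median_ratio = march_median / tea_median

print("Total ratio:  {:.1f}".format(total_ratio))
print("Mean ratio:   {:.1f}".format(mean_ratio))
print("Median ratio: {:.1f}".format(median_ratio))
```

The totals differ by a factor of about 13 and the means by about 7, while the median Women's March was actually smaller than the median Tea Party protest. That gap between mean and median is what a handful of very large marches produces.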

Cities Compared

Below is an interactive scatter plot of the number of protesters in each city for each movement. This is created using an outer join, so the assumption is that any city not shared between the two lists had no marchers.

Any city above the 45-degree line had more Tea Party marchers, and any city below it had more Women's Marchers. Keep in mind that these are log axes, so the skew toward the Women's March is substantial, especially for the large cities.
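
The outer-join assumption described above can be sketched on a toy example. The city names and counts here are hypothetical and are not taken from the real data files; only the column names match the notebook.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
march_toy = pd.DataFrame({'city': ['Alpha', 'Beta'],
                          'state': ['AA', 'BB'],
                          'march_num': [500, 120]})
tea_toy = pd.DataFrame({'city': ['Beta', 'Gamma'],
                        'state': ['BB', 'CC'],
                        'tea_num': [300, 80]})

# An outer join keeps cities that appear in either list; the missing side is NaN.
merged = march_toy.merge(tea_toy, how='outer', on=['city', 'state'])

# Treat "no record" as "no marchers".
filled = merged.fillna(value=0)
print(filled)
```

A zero cannot be drawn on a log axis, which is why the plotting code further down adds 1 to each count before passing it to the scale.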

In [127]:
extrajs = '''
    .attr("x1", x(1))
    .attr("y1", y(1))
    .attr("x2", x(1e8))                         
    .attr("y2", y(1e8))
    .attr("stroke-width", 1.25)
    .attr("stroke", "#888") //#999 #fff
    .attr("opacity", "0.6")
    //.attr("fill", "none")
    //.style("stroke-dasharray", ("10, 10"))
    .attr("class", "trendline")
    .attr("clip-path", "url(#clip)");
'''

tooltip = '''
    "<b>Location:</b> " + d[keys.city] + ", " + d[keys.state] +
    "<br><b>Women's March: </b>" + fmtTh(+d[keys.march_num]) +
    "<br><b>Tea Party: </b>" + fmtTh(+d[keys.tea_num]) 
'''

settings = {"x_label": "Women's March (log)", 
            "y_label": "Tea Party (log)",
            "x": 'march_num' ,
            "y": 'tea_num', 
            "tooltip": tooltip,
            "extrajs": extrajs}

interactive_log_scatter(city_df, settings=settings)
In [128]:
interactive_table(city_df.sort_values(by='march_num',ascending=False), width=400, height=500)

Binned and Counted

This makes it clear that the majority of both protests took place in gatherings of 200,000 or fewer, and that the Women's March dwarfed the Tea Party marches.

In [129]:
# Bin by march size and sum:

fig, ax = plt.subplots()
set_bins = [-1, 2e5, 4e5, 6e5, 8e5]
groups = city_df.groupby(pd.cut(city_df['march_num'], set_bins))
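# Sum only the numeric columns; the city and state strings should not be aggregated
groups_df = groups[['march_num', 'tea_num']].sum()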
groups_df.plot.bar(figsize=(10,7), color=['steelblue', 'red'], alpha=0.6, ax=ax)  
plt.ylabel("Cumulative Marchers")
plt.xlabel("March Size")
ax.set_xticklabels(['200k','400k','600k', '800k'])
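
For readers unfamiliar with `pd.cut`, here is a small sketch of the binning step on made-up numbers. The `toy` frame is hypothetical; the bin edges are the ones used above. The lower edge is -1 rather than 0 because the intervals are closed on the right only, so a city with zero marchers would otherwise fall outside every bin.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({'march_num': [0, 150, 250000, 725000],
                    'tea_num': [40, 0, 3000, 15000]})

# Same edges as above: (-1, 200k], (200k, 400k], (400k, 600k], (600k, 800k]
bins = [-1, 2e5, 4e5, 6e5, 8e5]

# Assign each city to a bin by its Women's March size, then sum both columns per bin.
# observed=False keeps empty bins in the result (their sums are 0).
binned = toy.groupby(pd.cut(toy['march_num'], bins),
                     observed=False)[['march_num', 'tea_num']].sum()
print(binned)
```

Note that both columns are binned by the size of the Women's March in that city, so the Tea Party bar in each bin is the Tea Party turnout in those same cities.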

Marchers by State

Next, I look at the marchers grouped by state. Every state except West Virginia had a larger percentage of its population participate in the Women's March than in the Tea Party protests, with Colorado leading at 2.9% of its population. California had the largest total number of protesters, at 910,830.

In [130]:
# Group city_df by state, sum
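# Select the numeric columns explicitly so the city names are not aggregated
state_df = city_df.groupby(by='state', as_index=False)[['march_num', 'tea_num']].sum()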

# Merge with vote and population dataframes:
state_df = state_df.merge(vote_df, how='inner')
state_df = state_df.merge(pop_df, how='inner')
state_df['tea_pct'] = (state_df['tea_num'] / state_df['pop2016']) * 100
state_df['march_pct'] = (state_df['march_num'] / state_df['pop2016']) * 100

# Leave DC out, marchers exceed population
state_df = state_df[state_df['state'] != 'DC']
state_df = state_df.sort_values(by='march_pct', ascending=False).reset_index(drop=True)

interactive_table(state_df, width=600, height=500)

How do state turnouts compare?

In [131]:
fig, ax = plt.subplots(figsize=(10,8))  

plt.scatter(x=state_df['march_pct'], y=state_df['tea_pct'], 
            marker='', alpha=0.7, color="steelblue", label='_nolegend_') #marker='o'

A = state_df['march_pct']
B = state_df['tea_pct']
C = state_df['state']
D = range(len(C))

for a,b,c,d in zip(A, B, C, D):
    #if d % 50 == 0: #Annotate every n
    ax.annotate('%s' % c, xy=(a,b), textcoords='data') 
plt.xlabel("Women's Marchers (% Population)")
plt.ylabel("Tea Party Marchers (% Population)")

x = pd.DataFrame({'line': np.linspace(0, 3, 10)})
plt.plot(x, x, 'k--', alpha=0.7, label='Equal (1:1)')
# Average State Ratio = 0.008984/0.001222 = 7.35 times % of women's marchers
plt.plot(x, x/7, '--', color="gray", alpha=0.8, label='Average State Ratio (7:1)')




Did blue states have more marchers?

The Democratic margin is a fairly good predictor of Women's March participation. Some states overperformed the linear regression line (CA, OR, MA, VT, WA, IL) and others underperformed it, although several of the underperformers are states near DC, whose residents may have marched in Washington instead.
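
What "overperforming the regression line" means can be sketched with a small least-squares fit. This example uses `numpy.polyfit` rather than the statsmodels formula API used in the notebook, and the five data points are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

# Hypothetical toy data: Democratic margin (points) and march turnout (% of population).
margin = np.array([-40.0, -20.0, 0.0, 20.0, 40.0])
turnout = np.array([0.2, 0.5, 1.0, 1.4, 1.9])

# Fit turnout = slope * margin + intercept by ordinary least squares.
slope, intercept = np.polyfit(margin, turnout, deg=1)
predicted = slope * margin + intercept

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((turnout - predicted) ** 2)
ss_tot = np.sum((turnout - turnout.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# A positive residual means the state turned out more than the line predicts.
residuals = turnout - predicted
print("slope={:.4f} intercept={:.2f} R^2={:.3f}".format(slope, intercept, r_squared))
print(residuals.round(2))
```

In this toy fit, the first point sits above the line (a positive residual) and the second sits below it, which is the same reading applied to the labelled states in the plot.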

In [132]:
fig, ax = plt.subplots(figsize=(10,8))  #figsize=(12,10)

plt.scatter(x=state_df['margin2016'], y=state_df['march_pct'], 
            marker='', alpha=0.9, color="steelblue", label='_nolegend_')

A = state_df['margin2016']
B = state_df['march_pct']
C = state_df['state']
D = range(len(C))

for a,b,c,d in zip(A, B, C, D):
    #if d % 50 == 0: #Annotate every n
    ax.annotate('%s' % c, xy=(a,b), textcoords='data') 
plt.xlabel("2016 Democratic Margin")
plt.ylabel("Women's Marchers (% Population)")

# 1st order polynomial
poly_1 = smf.ols(formula='I(march_pct) ~ 1 + margin2016', data=state_df).fit()  #, missing='drop'
x = pd.DataFrame({'margin2016': np.linspace(-60, 40, 10)})
plt.plot(x, poly_1.predict(x), color="black", label='Poly n=1 $R^2$=%.2f' % (poly_1.rsquared), alpha=0.6)

ax.set_ylim(-0.5, 3.1)


# http://www.politico.com/story/2016/11/senate-democrats-2018-midterms-231516
# Republicans are targeting a quintet of senators from conservative states where Trump 
# walloped Hillary Clinton: Montana, Missouri, Indiana, North Dakota and West Virginia. 
#     The GOP could amass a filibuster-proof majority by running the table in those 
#     states and other battlegrounds.
# IN, MS underperformed
# MT overperformed trend line, 
# ND, WV did about as expected.  

#MD, NJ, RI, CT, VA, DE might all have underperformed because people were in DC
# DC has more than its population in protest 1.00
In [133]:
fig, ax = plt.subplots(figsize=(10,8))  

plt.scatter(x=state_df['margin2016'], y=state_df['tea_pct'], 
            marker='', color='gray', alpha=0.9, label='_nolegend_')

A = state_df['margin2016']
B = state_df['tea_pct']
C = state_df['state']
D = range(len(C))

for a,b,c,d in zip(A, B, C, D):
    #if d % 50 == 0: #Annotate every n
    ax.annotate('%s' % c, xy=(a,b), textcoords='data') 
plt.xlabel("2016 Democratic Margin")
plt.ylabel("Tea Party Marchers (% Population)")

# 1st order polynomial
poly_1 = smf.ols(formula='I(tea_pct) ~ 1 + margin2016', data=state_df).fit()  #, missing='drop'
x = pd.DataFrame({'margin2016': np.linspace(-60, 40, 10)})
plt.plot(x, poly_1.predict(x), 'k-', label='Poly n=1 $R^2$=%.2f' % (poly_1.rsquared), alpha=0.6)



Code for Visualizations

In [89]:
# width and height size the iframe; the defaults here are assumed values
def interactive_log_scatter(df, settings, width=700, height=550):
    srcdoc = r'''
    <!DOCTYPE html>
    <meta charset="utf-8">
    <title>Zoom + Pan</title>
    body {
      position: relative;
      width: 700px; /*960px*/

    svg {font: 10px sans-serif;}

    rect {fill: #e5e5e5; }

    .label {
        font-size: 12px;
        /*stroke: #ddd;*/
        fill: #555;

    .dot {
      /*stroke: #aaa;*/
      /*stroke: red;*/
      /*border: 1;*/
    .dot:hover {fill-opacity: 0.4;}

    .axis path,
    .axis line {
      stroke: #f4f4f4;  /* black;*/
      fill: none; 
      stroke-width: 1px;
      fill: none;
      stroke: #777;  
      stroke-width: 1px;
      opacity: 1;

    .buttons {
      position: absolute;
      right: 30px;
      top: 30px;
    button {
      font: 16px sans-serif;
      display: block;
      border-radius: 0px;
      width: 25px;
      /*outline: none;*/
      background-color: white;
      border: none;
    button:hover {
    /*outline-color: #ddd;
    outline: 1;*/
    outline-color: #b5b5b5;
    div.tooltip {
      position: absolute;
      padding: 5px;
      font: 12px sans-serif;
      background: white;
      border: 0px;
      border-radius: 0px;

    <div class="buttons">
      <button data-zoom="+0.5">+</button>  <!-- data-zoom="+1" -->
      <button data-zoom="-0.5">-</button>
    <script src="//d3js.org/d3.v3.min.js"></script>

    var margin = {top: 20, right: 20, bottom: 40, left: 40},
        width = 700 - margin.left - margin.right,
        height = 530 - margin.top - margin.bottom;
    var fmtTh = d3.format(",");

    var keys = ||keys||;
    var data = ||datainsert||;
    var xName = "||x||";
    var yName = "||y||";
    var xLabel = "||x_label||";
    var yLabel = "||y_label||";
    var min_x = d3.min(data, function(d) { return +d[keys[xName]]; });
    var max_x = d3.max(data, function(d) { return +d[keys[xName]]; });
    var min_y = d3.min(data, function(d) { return +d[keys[yName]]; });
    var max_y = d3.max(data, function(d) { return +d[keys[yName]]; });
    var max = d3.max([min_x, min_y, max_x, max_y].map(Math.abs));

    var x = d3.scale.log()
        .domain([0.7, 10*max_x])
        .range([0, width]);

    var y = d3.scale.log()
        .domain([0.7, 10*max_y])
        .range([height, 0]);

    var xAxis = d3.svg.axis()
        .ticks(10, ",.1s")

    var yAxis = d3.svg.axis()
        .ticks(10, ",.1s")
    var zoom = d3.behavior.zoom()
        //.scaleExtent([1, 10])
        .center([width / 2, height / 2])
        .size([width, height])
        .on("zoom", zoomed);

    var svg = d3.select("body").append("svg")
        .attr("width", width + margin.left + margin.right)
        .attr("height", height + margin.top + margin.bottom)
        .attr("transform", "translate(" + margin.left + "," + margin.top + ")")

    //Create clip, then apply it to each dot
    var clip = svg.append("defs").append("svg:clipPath")
        .attr("id", "clip")
        .attr("id", "clip-rect")
        .attr("x", "0")  //
        .attr("y", "0")  //
        .attr('width', width)
        .attr('height', height);

        .attr("width", width)
        .attr("height", height);

        .attr("class", "x axis")
        .attr("transform", "translate(0," + height + ")")
        //.attr("clip-path", "url(#clip)")

        .attr("class", "y axis")
        //.attr("clip-path", "url(#clip)")
    // Tooltips
    var div = d3.select("body")
        .attr("class", "tooltip")
        .style("opacity", 0);

      .attr("class", "dot")
      .attr("clip-path", "url(#clip)")  //add the clip to each dot
      .attr("r", 3.5) //3.5  4.5 3*zoom.scale()
      .attr("cx", function(d) { return x(+d[keys[xName]] + 1); })
      .attr("cy", function(d) { return y(+d[keys[yName]] + 1); })
      .style("fill", 'steelblue' ) //red  gray
      .attr('fill-opacity', 0.8) //0.6 0.9
      .on("mouseover", function(d) { drawTooltip(d); })
      .on("mouseout", function() {
        div.style("opacity", 0);

        .attr("class", "x label")
        .attr("text-anchor", "middle")
        .attr("x", width/2)
        .attr("y", height + 30)

        .attr("class", "y label")
        .attr("text-anchor", "middle")
        .attr("x", -height/2)
        .attr("y", -30) //-30
        .attr("transform", "rotate(-90)")

        .on("click", clicked);

    function zoomed() {

        //.attr("r", 3*zoom.scale())
        .attr("cx", function (d) {
            return x(+d[keys[xName]] + 1);
        .attr("cy", function (d) {
            return y(+d[keys[yName]] + 1);
            .attr("x1", x(1))
            .attr("y1", y(1))
            .attr("x2", x(1e8))                         
            .attr("y2", y(1e8))   

    function clicked() {
      svg.call(zoom.event); // https://github.com/mbostock/d3/issues/2387

      // Record the coordinates (in data space) of the center (in screen space).
      var center0 = zoom.center(), translate0 = zoom.translate(), coordinates0 = coordinates(center0);
      zoom.scale(zoom.scale() * Math.pow(2, +this.getAttribute("data-zoom")));

      // Translate back to the center.
      var center1 = point(coordinates0);
      zoom.translate([translate0[0] + center0[0] - center1[0], translate0[1] + center0[1] - center1[1]]);



    function coordinates(point) {
      var scale = zoom.scale(), translate = zoom.translate();
      return [(point[0] - translate[0]) / scale, (point[1] - translate[1]) / scale];

    function point(coordinates) {
      var scale = zoom.scale(), translate = zoom.translate();
      return [coordinates[0] * scale + translate[0], coordinates[1] * scale + translate[1]];

    function drawTooltip(d) {
        div.style("opacity", 1.0);
            .style("left", (d3.event.pageX) + "px")
            .style("top", (d3.event.pageY ) + "px");

    '''

    srcdoc = srcdoc.replace('||datainsert||', df.to_json(orient="values"))
    key_list = list(df)
    key_dict = {i: key_list.index(i) for i in key_list}
    srcdoc = srcdoc.replace('||keys||', json.dumps(key_dict) )
    for s in settings.keys():
        srcdoc = srcdoc.replace('||{0}||'.format(s), str(settings[s]))
    srcdoc = srcdoc.replace('"', '&quot;')

    embed = HTML('<iframe srcdoc="{0}" '
                 'style="width: {1}px; height: {2}px; display:block; width: 100%; margin: 25px auto; '
                 'border: none"></iframe>'.format(srcdoc, width, height))
    return embed
In [2]:
def interactive_table(df, width, height):
    srcdoc = r'''
    <!DOCTYPE html>
    <meta charset="utf-8">

    body {
        width: 800px;

    table {
        font-size: 12px;
        border-collapse: collapse;
        border-top: 1px solid #ddd;
        border-right: 1px solid #ddd;

    th {
        padding: 10px;
        cursor: pointer;
        background-color: #f2f2f2;

    th, td {
        text-align: left;
        border-bottom: 1px solid #ddd;
        border-left: 1px solid #ddd;

    td {
        padding: 5px 8px;

    tr:nth-child(even) {
      background-color: #f9f9f9;

    tr:hover {
      background-color: #F0F8FF; /*#f9f9f9;*/



    <div id ="tableInsert"></div>


    function sortTable(table, col, reverse) {
        var tb = table.tBodies[0], 
            tr = Array.prototype.slice.call(tb.rows, 0), // put rows into array

        reverse = -((+reverse) || -1);
        tr = tr.sort(function (a, b) { 
            var first = a.cells[col].textContent.trim();
            var second = b.cells[col].textContent.trim();

            if (isNumeric(first) && isNumeric(second)) {        
                return reverse * (Number(first) - Number(second));
            } else {
                return reverse * first.localeCompare(second);
        for(i = 0; i < tr.length; ++i) {  // append each row in order

    function isNumeric(n) {
      return !isNaN(parseFloat(n)) && isFinite(n);

    function makeSortable(table) {
        var th = table.tHead, i;
        th && (th = th.rows[0]) && (th = th.cells);
        if (th) i = th.length;
        else return; // if no `<thead>` then do nothing
        while (--i >= 0) (function (i) {
            var dir = 1;
            th[i].addEventListener('click', function () {sortTable(table, i, (dir = 1 - dir))});

    function makeAllSortable(parent) {
        parent = parent || document.body;
        var t = parent.getElementsByTagName('table'), i = t.length;
        while (--i >= 0) makeSortable(t[i]);

    function addTable() {
        var tableDiv = document.getElementById("tableInsert")
        var table = document.createElement('table')
        var tableHead = document.createElement('thead')
        var tableBody = document.createElement('tbody')


        var heading = ||headinginsert||;
        var data = ||datainsert||;

        //TABLE HEAD
        var tr = document.createElement('tr');
        for (i = 0; i < heading.length; i++) {
            var th = document.createElement('th')
            //th.width = '75';

        //TABLE ROWS
        for (i = 0; i < data.length; i++) {
            var tr = document.createElement('TR');
            for (j = 0; j < data[i].length; j++) {
                var td = document.createElement('TD')


    window.onload = function () {addTable(); makeAllSortable(); };
    // use callback makeAllSortable(); at end?
    // window.onload = function () {addTable(makeAllSortable);  };  

    '''

    srcdoc = srcdoc.replace('||headinginsert||', json.dumps(list(df)))
    srcdoc = srcdoc.replace('||datainsert||', df.to_json(orient="values"))
    srcdoc = srcdoc.replace('"', '&quot;')

    html = '''<iframe srcdoc="{0}" style="width: {1}px; height: {2}px; 
            display:block; margin: 25px; border: none"></iframe>
            '''.format(srcdoc, width, height)  #width: 100%;  margin: 25px auto;

    embed = HTML(html)
    return embed
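
Both functions above follow the same pattern: fill `||placeholder||` tokens in an HTML template, escape the result, and embed it in an iframe's `srcdoc` attribute. Here is a stripped-down sketch of that pattern using only the standard library. The `template`, `rows` and `settings` values are hypothetical, and `html.escape` is used here as a possible alternative to replacing only the double quotes, since it also escapes `&`, `<` and `>`.

```python
import html
import json

# Hypothetical miniature template with two placeholders, for illustration only.
template = '<script>var data = ||datainsert||; var label = "||x_label||";</script>'

# Hypothetical rows and settings standing in for the DataFrame and settings dict.
rows = [["Alpha", "AA", 500], ["Beta", "BB", 120]]
settings = {"x_label": "Women's March (log)"}

# Insert the data as a JSON literal, then fill the remaining placeholders by name.
page = template.replace('||datainsert||', json.dumps(rows))
for key, value in settings.items():
    page = page.replace('||{0}||'.format(key), str(value))

# Escape the document so it can sit inside a double-quoted HTML attribute.
escaped = html.escape(page, quote=True)
iframe = '<iframe srcdoc="{0}" style="border: none"></iframe>'.format(escaped)
print(iframe)
```

The browser decodes the entities when it reads the attribute, so the iframe receives the original markup and script.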