Analyzing Census Data with Pandas

Sergio Sánchez Zavala

Who am I?

My name is Sergio Sánchez and I'm a research associate at PPIC (Public Policy Institute of California) in the Higher Ed Center. The work I do there covers developmental education reform in California Community Colleges, economic mobility, and some immigration stuff.

Who am I? (part 2)

I'm very interested in data visualization. I'm a facilitator in the newly formed Data Visualization Society. My newest project is @tacosdedatos (tacosdedatos.com), where I hope to build a place to learn data analysis and data visualization best practices, techniques, and knowledge in Spanish.

Housekeeping

Materials are on GitHub at https://github.com/chekos/analyzing-census-data

git clone https://github.com/chekos/analyzing-census-data
cd analyzing-census-data

You only need jupyter and pandas to follow along on your personal computer.

We will be using Jupyter Lab but you can follow along in a Jupyter Notebook if you're more comfortable that way.
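If you are following along locally, a quick way to confirm that the two requirements are importable is sketched below. The helper name `missing_packages` is made up for this example, and the check assumes the packages expose top-level modules named `jupyter` and `pandas`, which is the usual case but not guaranteed for every installation.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules.

    Hypothetical helper for this tutorial; it only checks that a module
    can be located, not that it imports without errors.
    """
    return [name for name in names if find_spec(name) is None]


# The two packages the tutorial asks for.
missing = missing_packages(["jupyter", "pandas"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("jupyter and pandas were both found.")
```

Anything reported as missing can typically be installed with pip or conda before opening the notebooks.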

MyBinder.org

We'll be using mybinder.org to go through this tutorial.

Binder allows you to create custom computing environments that can be shared and used by many remote users. It is powered by BinderHub, which is an open-source tool that deploys the Binder service in the cloud. One such deployment lives here, at mybinder.org, and is free to use. For more information about the mybinder.org deployment and the team that runs it, see About mybinder.org.

Census Data

The US Census Bureau conducts more than 130 surveys every year. It has household surveys with data on education, health, employment, migration, and many more topics.

https://www.census.gov/programs-surveys/are-you-in-a-survey/survey-list/household-survey-list.html

They also have business surveys on retail, wholesale, imports/exports, entrepreneurship, and public libraries, among many, many other things.

https://www.census.gov/programs-surveys/are-you-in-a-survey/survey-list/business-survey-list.html

One of the most popular household surveys is the American Community Survey (ACS), which we will be using for our analysis today.

The American Community Survey (ACS) helps local officials, community leaders, and businesses understand the changes taking place in their communities. It is the premier source for detailed population and housing information about our nation.

Where to get it?

The Census website provides many ways to access its data.

American FactFinder

  • American FactFinder provides access to data about the United States, Puerto Rico and the Island Areas. The data in American FactFinder come from several censuses and surveys.

Where to get it?

Pre-computed Tables

They also provide pre-computed tables for popular topics like educational attainment or median incomes at various geographic levels (region, metropolitan area, state, county, etc.).

https://www.census.gov/data/tables.html

Where to get it?

IPUMS

IPUMS provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation makes it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community context. Data and services available free of charge.

IPUMS stands for Integrated Public Use Microdata Series.

How do I get it using python?

There are a few Python packages on pypi.org related to Census data. Here are four notable ones:

census - pypi

A simple wrapper for the United States Census Bureau’s API. Provides access to ACS, SF1, and SF3 data sets.

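from census import Census
from us import states

# Replace "MY_API_KEY" with your own Census API key.
c = Census("MY_API_KEY")

# Request the NAME and B25034_010E variables from the ACS 5-year data for Maryland.
c.acs5.get(
    ("NAME", "B25034_010E"),
    {"for": f"state:{states.MD.fips}"},
)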
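For context, a wrapper like this ultimately sends an HTTP request to the Census Bureau's API. The sketch below builds what such a request URL could look like for the call above and reshapes a response in the form the API returns: a JSON array whose first row holds the column names. The helper names `build_acs5_url` and `rows_to_records`, the 2017 vintage, and the sample payload are assumptions for illustration. The value in the payload is a placeholder, not real data, and this is not necessarily how the `census` package is implemented internally.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_acs5_url(year, variables, geography, api_key):
    """Build a request URL for the ACS 5-year endpoint (hypothetical helper)."""
    query = urlencode(
        {"get": ",".join(variables), "for": geography, "key": api_key},
        safe=",:*",  # keep commas, colons and wildcards readable in the URL
    )
    return f"{BASE_URL}/{year}/acs/acs5?{query}"


def rows_to_records(rows):
    """Turn [header, row, row, ...] into a list of dicts keyed by column name."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Maryland's state FIPS code is 24; the year is an assumed vintage.
url = build_acs5_url(2017, ["NAME", "B25034_010E"], "state:24", "MY_API_KEY")
print(url)

# Placeholder payload shaped like an API response; "0" is NOT a real estimate.
sample_payload = '[["NAME", "B25034_010E", "state"], ["Maryland", "0", "24"]]'
records = rows_to_records(json.loads(sample_payload))
print(records)
```

A list of dicts like `records` can be passed straight to `pandas.DataFrame(records)` to get a table to work with.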

cenpy - pypi

An interface to explore and query the US Census API and return Pandas Dataframes. Ideally, this package is intended for exploratory data analysis and draws inspiration from sqlalchemy-like interfaces and acs.R.

The docs include an intro notebook

census-data-downloader - GitHub but also pip installable

census-data-downloader is a Command Line Interface developed by the Los Angeles Times to download Census data and reformat it for humans.

export CENSUS_API_KEY='<your API key>'
censusdatadownloader --year 2010 medianage states

censusdata - pypi

This package handles the details of interacting with the Census API for you, so that you can focus on working with the data. It provides a class for representing Census geographies. It also provides functions for gaining further information about specific variables and tables and for searching for variables.

Let's analyze some census data!