Using cenpy to Analyze Segregation in US Cities

Levi John Wolf

University of Bristol


cenpy makes it dead simple to fetch demographic data. It automatically discovers the data products that the US Census Bureau makes available, exposes them in a consistent, pythonic fashion, and then wrangles the results into a clean geopandas dataframe. This is especially useful for analyzing segregation in American cities. Large-scale demographic analysis is often difficult because demographic data at a sufficiently fine-grained spatial resolution are hard to get and process over a large geographic extent; even a few cities in a few states rapidly becomes a difficult analytical task. With cenpy, though, this becomes easy.

Further, the new segregation package allows for the computation and comparison of segregation measures across different urban systems. While it's often easy to compute a single measure of segregation, it's difficult to conduct inference on that measure: we can easily figure out how segregated a place is, but not the uncertainty inherent in that estimate. Comparisons of segregation indices between places or over time should take this uncertainty into account.

Fortunately, using the cenpy and segregation packages in Python, we can conduct fast analyses of how segregation changes over time or differs between cities. Below, I'll walk through an example that examines changes in the segregation of Hispanic populations in Phoenix over time, and then compares segregation in Phoenix and Austin.

Importing packages

First, the packages we need are cenpy and segregation. To get a sense of what the areas look like, I also use contextily, a simple package that requests basemap tiles for use in matplotlib plots, along with matplotlib itself for plotting.

In [1]:
import cenpy
import segregation
import contextily
import matplotlib.pyplot as plt  # used as plt in the plotting cells below
%matplotlib inline

Cenpy can be used in two different ways. The newer product API focuses on using geographical names to make querying as simple as possible. Because this requires cenpy to have more prior knowledge about how each product's queries should be formed, only a limited number of data products are supported: the 5-year ACS and the 2010 Decennial Census. By default, the most recent 5-year ACS is fetched.

In [2]:
acs = cenpy.products.ACS()

Once the product is built, it has a few useful attributes. All of the variables (the columns in the Census Bureau's database) are contained in the variables dataframe:

In [26]:
acs.variables
Out[26]:
attributes concept group label limit predicateOnly predicateType required values
AIANHH NaN NaN N/A Geography 0 NaN NaN NaN NaN
AIHHTL NaN NaN N/A Geography 0 NaN NaN NaN NaN
AIRES NaN NaN N/A Geography 0 NaN NaN NaN NaN
ANRC NaN NaN N/A Geography 0 NaN NaN NaN NaN
B00001_001E B00001_001EA UNWEIGHTED SAMPLE COUNT OF THE POPULATION B00001 Estimate!!Total 0 NaN int NaN NaN
B00002_001E B00002_001EA UNWEIGHTED SAMPLE HOUSING UNITS B00002 Estimate!!Total 0 NaN int NaN NaN
B01001A_001E B01001A_001M,B01001A_001MA,B01001A_001EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total 0 NaN int NaN NaN
B01001A_002E B01001A_002M,B01001A_002MA,B01001A_002EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male 0 NaN int NaN NaN
B01001A_003E B01001A_003M,B01001A_003MA,B01001A_003EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!Under 5 years 0 NaN int NaN NaN
B01001A_004E B01001A_004M,B01001A_004MA,B01001A_004EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!5 to 9 years 0 NaN int NaN NaN
B01001A_005E B01001A_005M,B01001A_005MA,B01001A_005EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!10 to 14 years 0 NaN int NaN NaN
B01001A_006E B01001A_006M,B01001A_006MA,B01001A_006EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!15 to 17 years 0 NaN int NaN NaN
B01001A_007E B01001A_007M,B01001A_007MA,B01001A_007EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!18 and 19 years 0 NaN int NaN NaN
B01001A_008E B01001A_008M,B01001A_008MA,B01001A_008EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!20 to 24 years 0 NaN int NaN NaN
B01001A_009E B01001A_009M,B01001A_009MA,B01001A_009EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!25 to 29 years 0 NaN int NaN NaN
B01001A_010E B01001A_010M,B01001A_010MA,B01001A_010EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!30 to 34 years 0 NaN int NaN NaN
B01001A_011E B01001A_011M,B01001A_011MA,B01001A_011EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!35 to 44 years 0 NaN int NaN NaN
B01001A_012E B01001A_012M,B01001A_012MA,B01001A_012EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!45 to 54 years 0 NaN int NaN NaN
B01001A_013E B01001A_013M,B01001A_013MA,B01001A_013EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!55 to 64 years 0 NaN int NaN NaN
B01001A_014E B01001A_014M,B01001A_014MA,B01001A_014EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!65 to 74 years 0 NaN int NaN NaN
B01001A_015E B01001A_015M,B01001A_015MA,B01001A_015EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!75 to 84 years 0 NaN int NaN NaN
B01001A_016E B01001A_016M,B01001A_016MA,B01001A_016EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Male!!85 years and over 0 NaN int NaN NaN
B01001A_017E B01001A_017M,B01001A_017MA,B01001A_017EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Female 0 NaN int NaN NaN
B01001A_018E B01001A_018M,B01001A_018MA,B01001A_018EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Female!!Under 5 years 0 NaN int NaN NaN
B01001A_019E B01001A_019M,B01001A_019MA,B01001A_019EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Female!!5 to 9 years 0 NaN int NaN NaN
B01001A_020E B01001A_020M,B01001A_020MA,B01001A_020EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Female!!10 to 14 years 0 NaN int NaN NaN
B01001A_021E B01001A_021M,B01001A_021MA,B01001A_021EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Female!!15 to 17 years 0 NaN int NaN NaN
B01001A_022E B01001A_022M,B01001A_022MA,B01001A_022EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Female!!18 and 19 years 0 NaN int NaN NaN
B01001A_023E B01001A_023M,B01001A_023MA,B01001A_023EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Female!!20 to 24 years 0 NaN int NaN NaN
B01001A_024E B01001A_024M,B01001A_024MA,B01001A_024EA SEX BY AGE (WHITE ALONE) B01001A Estimate!!Total!!Female!!25 to 29 years 0 NaN int NaN NaN
... ... ... ... ... ... ... ... ... ...
COUSUB NaN NaN N/A Geography 0 NaN NaN NaN NaN
CSA NaN NaN N/A Geography 0 NaN NaN NaN NaN
DIVISION NaN NaN N/A Geography 0 NaN NaN NaN NaN
GEOCOMP NaN NaN N/A Geographic Component code 0 NaN string default displayed {'item': {'R1': 'Not in an offshore area', 'S0...
GEO_ID NAME ALLOCATION OF EDUCATIONAL ATTAINMENT FOR THE P... B17015,B18104,B17016,B18105,B17017,B18106,B170... Geography 0 NaN string NaN NaN
METDIV NaN NaN N/A Geography 0 NaN NaN NaN NaN
NATION NaN NaN N/A Geography 0 NaN NaN NaN NaN
NECTA NaN NaN N/A Geography 0 NaN NaN NaN NaN
NECTADIV NaN NaN N/A Geography 0 NaN NaN NaN NaN
PLACE NaN NaN N/A Geography 0 NaN NaN NaN NaN
PLACEREM NaN NaN N/A Geography 0 NaN NaN NaN NaN
PRINCITY NaN NaN N/A Geography 0 NaN NaN NaN NaN
PUMA5 NaN NaN N/A Geography 0 NaN NaN NaN NaN
REGION NaN NaN N/A Geography 0 NaN NaN NaN NaN
SDELM NaN NaN N/A Geography 0 NaN NaN NaN NaN
SDSEC NaN NaN N/A Geography 0 NaN NaN NaN NaN
SDUNI NaN NaN N/A Geography 0 NaN NaN NaN NaN
SLDL NaN NaN N/A Geography 0 NaN NaN NaN NaN
SLDU NaN NaN N/A Geography 0 NaN NaN NaN NaN
STATE NaN NaN N/A Geography 0 NaN NaN NaN NaN
SUBMCD NaN NaN N/A Geography 0 NaN NaN NaN NaN
TRACT NaN NaN N/A Geography 0 NaN NaN NaN NaN
TRIBALBG NaN NaN N/A Geography 0 NaN NaN NaN NaN
TRIBALCT NaN NaN N/A Geography 0 NaN NaN NaN NaN
TRISUBREM NaN NaN N/A Geography 0 NaN NaN NaN NaN
UA NaN NaN N/A Geography 0 NaN NaN NaN NaN
ZCTA NaN NaN N/A Geography 0 NaN NaN NaN NaN
for NaN Census API Geography Specification N/A Census API FIPS 'for' clause 0 True fips-for NaN NaN
in NaN Census API Geography Specification N/A Census API FIPS 'in' clause 0 True fips-in NaN NaN
ucgid NaN Census API Geography Specification N/A Uniform Census Geography Identifier clause 0 True ucgid NaN NaN

25110 rows × 9 columns

That's a lot of variables! There are far too many to understand at a glance, and their names are cryptic: each variable name combines a table identifier (such as B03002), which tells you the general topic of the variable, with the variable's position in that table (such as _012), followed by a suffix such as E for an estimate or M for its margin of error. Fortunately, cenpy allows you to examine the tables directly, which are a little easier to understand:

In [27]:
acs.tables
Out[27]:
description columns
table_name
B00001 UNWEIGHTED SAMPLE COUNT OF THE POPULATION [B00001_001E]
B00002 UNWEIGHTED SAMPLE HOUSING UNITS [B00002_001E]
B01001 SEX BY AGE [B01001_001E, B01001_002E, B01001_003E, B01001...
B01002 MEDIAN AGE BY SEX [B01002_001E, B01002_002E, B01002_003E]
B01003 TOTAL POPULATION [B01003_001E]
B02001 RACE [B02001_001E, B02001_002E, B02001_003E, B02001...
B02008 WHITE ALONE OR IN COMBINATION WITH ONE OR MORE... [B02008_001E]
B02009 BLACK OR AFRICAN AMERICAN ALONE OR IN COMBINAT... [B02009_001E]
B02010 AMERICAN INDIAN AND ALASKA NATIVE ALONE OR IN ... [B02010_001E]
B02011 ASIAN ALONE OR IN COMBINATION WITH ONE OR MORE... [B02011_001E]
B02012 NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALO... [B02012_001E]
B02013 SOME OTHER RACE ALONE OR IN COMBINATION WITH O... [B02013_001E]
B02014 AMERICAN INDIAN AND ALASKA NATIVE ALONE FOR SE... [B02014_001E, B02014_002E, B02014_003E, B02014...
B02015 ASIAN ALONE BY SELECTED GROUPS [B02015_001E, B02015_002E, B02015_003E, B02015...
B02016 NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALO... [B02016_001E, B02016_002E, B02016_003E, B02016...
B02017 AMERICAN INDIAN AND ALASKA NATIVE (AIAN) ALONE... [B02017_001E, B02017_002E, B02017_003E, B02017...
B02018 ASIAN ALONE OR IN ANY COMBINATION BY SELECTED ... [B02018_001E, B02018_002E, B02018_003E, B02018...
B02019 NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALO... [B02019_001E, B02019_002E, B02019_003E, B02019...
B03001 HISPANIC OR LATINO ORIGIN BY SPECIFIC ORIGIN [B03001_001E, B03001_002E, B03001_003E, B03001...
B03002 HISPANIC OR LATINO ORIGIN BY RACE [B03002_001E, B03002_002E, B03002_003E, B03002...
B03003 HISPANIC OR LATINO ORIGIN [B03003_001E, B03003_002E, B03003_003E]
B04004 PEOPLE REPORTING SINGLE ANCESTRY [B04004_001E, B04004_002E, B04004_003E, B04004...
B04005 PEOPLE REPORTING MULTIPLE ANCESTRY [B04005_001E, B04005_002E, B04005_003E, B04005...
B04006 PEOPLE REPORTING ANCESTRY [B04006_001E, B04006_002E, B04006_003E, B04006...
B04007 ANCESTRY [B04007_001E, B04007_002E, B04007_003E, B04007...
B05001 NATIVITY AND CITIZENSHIP STATUS IN THE UNITED ... [B05001_001E, B05001_002E, B05001_003E, B05001...
B05002 PLACE OF BIRTH BY NATIVITY AND CITIZENSHIP STATUS [B05002_001E, B05002_002E, B05002_003E, B05002...
B05003 SEX BY AGE BY NATIVITY AND CITIZENSHIP STATUS [B05003_001E, B05003_002E, B05003_003E, B05003...
B05004 MEDIAN AGE BY NATIVITY AND CITIZENSHIP STATUS ... [B05004_001E, B05004_002E, B05004_003E, B05004...
B05005 PERIOD OF ENTRY BY NATIVITY AND CITIZENSHIP ST... [B05005_001E, B05005_002E, B05005_003E, B05005...
... ... ...
C15010 FIELD OF BACHELOR'S DEGREE FOR FIRST MAJOR FOR... [C15010_001E, C15010_002E, C15010_003E, C15010...
C16001 LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 Y... [C16001_001E, C16001_002E, C16001_003E, C16001...
C16002 HOUSEHOLD LANGUAGE BY HOUSEHOLD LIMITED ENGLIS... [C16002_001E, C16002_002E, C16002_003E, C16002...
C17002 RATIO OF INCOME TO POVERTY LEVEL IN THE PAST 1... [C17002_001E, C17002_002E, C17002_003E, C17002...
C18108 AGE BY NUMBER OF DISABILITIES [C18108_001E, C18108_002E, C18108_003E, C18108...
C18120 EMPLOYMENT STATUS BY DISABILITY STATUS [C18120_001E, C18120_002E, C18120_003E, C18120...
C18121 WORK EXPERIENCE BY DISABILITY STATUS [C18121_001E, C18121_002E, C18121_003E, C18121...
C18130 AGE BY DISABILITY STATUS BY POVERTY STATUS [C18130_001E, C18130_002E, C18130_003E, C18130...
C18131 RATIO OF INCOME TO POVERTY LEVEL IN THE PAST 1... [C18131_001E, C18131_002E, C18131_003E, C18131...
C21007 AGE BY VETERAN STATUS BY POVERTY STATUS IN THE... [C21007_001E, C21007_002E, C21007_003E, C21007...
C24010 SEX BY OCCUPATION FOR THE CIVILIAN EMPLOYED PO... [C24010_001E, C24010_002E, C24010_003E, C24010...
C24020 SEX BY OCCUPATION FOR THE FULL-TIME, YEAR-ROUN... [C24020_001E, C24020_002E, C24020_003E, C24020...
C24030 SEX BY INDUSTRY FOR THE CIVILIAN EMPLOYED POPU... [C24030_001E, C24030_002E, C24030_003E, C24030...
C24040 SEX BY INDUSTRY FOR THE FULL-TIME, YEAR-ROUND ... [C24040_001E, C24040_002E, C24040_003E, C24040...
C24050 INDUSTRY BY OCCUPATION FOR THE CIVILIAN EMPLO... [C24050_001E, C24050_002E, C24050_003E, C24050...
C24060 OCCUPATION BY CLASS OF WORKER FOR THE CIVILIAN... [C24060_001E, C24060_002E, C24060_003E, C24060...
C24070 INDUSTRY BY CLASS OF WORKER FOR THE CIVILIAN E... [C24070_001E, C24070_002E, C24070_003E, C24070...
C27004 EMPLOYER-BASED HEALTH INSURANCE BY SEX BY AGE [C27004_001E, C27004_002E, C27004_003E, C27004...
C27005 DIRECT-PURCHASE HEALTH INSURANCE BY SEX BY AGE [C27005_001E, C27005_002E, C27005_003E, C27005...
C27006 MEDICARE COVERAGE BY SEX BY AGE [C27006_001E, C27006_002E, C27006_003E, C27006...
C27007 MEDICAID/MEANS-TESTED PUBLIC COVERAGE BY SEX B... [C27007_001E, C27007_002E, C27007_003E, C27007...
C27008 TRICARE/MILITARY HEALTH COVERAGE BY SEX BY AGE [C27008_001E, C27008_002E, C27008_003E, C27008...
C27009 VA HEALTH CARE BY SEX BY AGE [C27009_001E, C27009_002E, C27009_003E, C27009...
C27012 HEALTH INSURANCE COVERAGE STATUS AND TYPE BY W... [C27012_001E, C27012_002E, C27012_003E, C27012...
C27013 PRIVATE HEALTH INSURANCE BY WORK EXPERIENCE [C27013_001E, C27013_002E, C27013_003E, C27013...
C27014 PUBLIC HEALTH INSURANCE BY WORK EXPERIENCE [C27014_001E, C27014_002E, C27014_003E, C27014...
C27016 HEALTH INSURANCE COVERAGE STATUS BY RATIO OF I... [C27016_001E, C27016_002E, C27016_003E, C27016...
C27017 PRIVATE HEALTH INSURANCE BY RATIO OF INCOME TO... [C27017_001E, C27017_002E, C27017_003E, C27017...
C27018 PUBLIC HEALTH INSURANCE BY RATIO OF INCOME TO ... [C27018_001E, C27018_002E, C27018_003E, C27018...
C27021 HEALTH INSURANCE COVERAGE STATUS BY LIVING AR... [C27021_001E, C27021_002E, C27021_003E, C27021...

665 rows × 2 columns
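To make the naming scheme concrete, here is a small plain-Python sketch that splits variable ids into table, line number, and suffix, and groups them by table. The regular expression and the helpers parse_variable and group_by_table are hypothetical illustrations, not part of cenpy; they assume the id pattern seen in the output above (for example B03002_012E, or B01001A_002MA for race-iterated tables and annotation columns) and may not cover every id the Census API publishes.

```python
import re
from collections import defaultdict

# Assumed shape of most ACS detailed-table variable ids, e.g. "B03002_012E":
# a table id (optionally with a race-iteration letter, as in "B01001A"),
# an underscore, a three-digit line number, and a suffix
# (E = estimate, M = margin of error; a trailing A marks an annotation column).
VARIABLE_ID = re.compile(r"^(?P<table>[BC]\d{5}[A-Z]?)_(?P<line>\d{3})(?P<suffix>[EM]A?)$")


def parse_variable(name):
    """Split an ACS variable id into (table, line, suffix); None for other columns."""
    match = VARIABLE_ID.match(name)
    if match is None:
        return None  # e.g. geography columns such as "TRACT" or "STATE"
    return match.group("table"), int(match.group("line")), match.group("suffix")


def group_by_table(names):
    """Collect variable ids under their table id, preserving input order."""
    tables = defaultdict(list)
    for name in names:
        parsed = parse_variable(name)
        if parsed is not None:
            tables[parsed[0]].append(name)
    return dict(tables)


print(parse_variable("B03002_012E"))  # ('B03002', 12, 'E')
print(group_by_table(["B03002_001E", "B03002_012E", "B01001A_002E", "TRACT"]))
```

Race-iterated tables such as B01001A end up as their own group here; in cenpy, those cross-tabulations live in crosstab_tables rather than tables.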

Still, there are far too many tables to inspect individually. Moreover, tables only provides the main tables, not the cross-tabulations by race, sex, or age, which are exposed in crosstab_tables. So we need an efficient way to filter the set of tables (or variables) down to a specific topic. The filter_tables and filter_variables methods make this simple: you can filter based on table names or on text within the description of the table or variable. For instance, to focus on all tables in the ACS whose descriptions mention "RACE", you can use:

In [28]:
acs.filter_tables('RACE', by='description')
Out[28]:
description columns
table_name
B02001 RACE [B02001_001E, B02001_002E, B02001_003E, B02001...
B02008 WHITE ALONE OR IN COMBINATION WITH ONE OR MORE... [B02008_001E]
B02009 BLACK OR AFRICAN AMERICAN ALONE OR IN COMBINAT... [B02009_001E]
B02010 AMERICAN INDIAN AND ALASKA NATIVE ALONE OR IN ... [B02010_001E]
B02011 ASIAN ALONE OR IN COMBINATION WITH ONE OR MORE... [B02011_001E]
B02012 NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALO... [B02012_001E]
B02013 SOME OTHER RACE ALONE OR IN COMBINATION WITH O... [B02013_001E]
B03002 HISPANIC OR LATINO ORIGIN BY RACE [B03002_001E, B03002_002E, B03002_003E, B03002...
B25006 RACE OF HOUSEHOLDER [B25006_001E, B25006_002E, B25006_003E, B25006...
B98013 TOTAL POPULATION COVERAGE RATE BY WEIGHTING RA... [B98013_001E, B98013_002E, B98013_003E, B98013...
B99021 ALLOCATION OF RACE [B99021_001E, B99021_002E, B99021_003E]
C02003 DETAILED RACE [C02003_001E, C02003_002E, C02003_003E, C02003...
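Filtering by description amounts to matching a pattern against each table's description text. The sketch below shows that idea in plain Python over a handful of complete descriptions copied from the output above; filter_by_description is a hypothetical helper, and cenpy's filter_tables may match differently internally. Since the descriptions are stored in upper case, the pattern is upper case too.

```python
import re

# A few complete table descriptions copied from the acs.tables output above
TABLES = {
    "B01001": "SEX BY AGE",
    "B01002": "MEDIAN AGE BY SEX",
    "B01003": "TOTAL POPULATION",
    "B02001": "RACE",
    "B03002": "HISPANIC OR LATINO ORIGIN BY RACE",
    "B03003": "HISPANIC OR LATINO ORIGIN",
    "B25006": "RACE OF HOUSEHOLDER",
    "B99021": "ALLOCATION OF RACE",
    "B99031": "ALLOCATION OF HISPANIC OR LATINO ORIGIN",
    "C02003": "DETAILED RACE",
}


def filter_by_description(tables, pattern):
    """Return the ids of tables whose description matches a regular expression."""
    regex = re.compile(pattern)
    return sorted(name for name, description in tables.items() if regex.search(description))


print(filter_by_description(TABLES, "RACE"))
print(filter_by_description(TABLES, "HISPANIC"))
```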

To focus on tables that mention Hispanic origin:

In [29]:
acs.filter_tables('HISPANIC', by='description')
Out[29]:
description columns
table_name
B03001 HISPANIC OR LATINO ORIGIN BY SPECIFIC ORIGIN [B03001_001E, B03001_002E, B03001_003E, B03001...
B03002 HISPANIC OR LATINO ORIGIN BY RACE [B03002_001E, B03002_002E, B03002_003E, B03002...
B03003 HISPANIC OR LATINO ORIGIN [B03003_001E, B03003_002E, B03003_003E]
B16006 LANGUAGE SPOKEN AT HOME BY ABILITY TO SPEAK EN... [B16006_001E, B16006_002E, B16006_003E, B16006...
B98013 TOTAL POPULATION COVERAGE RATE BY WEIGHTING RA... [B98013_001E, B98013_002E, B98013_003E, B98013...
B99031 ALLOCATION OF HISPANIC OR LATINO ORIGIN [B99031_001E, B99031_002E, B99031_003E]

B03002 (HISPANIC OR LATINO ORIGIN BY RACE) looks like a good table to focus on, so we can narrow down the variables of interest using filter_variables:

In [30]:
acs.filter_variables('B03002')
Out[30]:
attributes concept group label limit predicateOnly predicateType required values
B03002_021E B03002_021M,B03002_021MA,B03002_021EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino!!Two or mo... 0 NaN int NaN NaN
B03002_020E B03002_020M,B03002_020MA,B03002_020EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino!!Two or mo... 0 NaN int NaN NaN
B03002_001E B03002_001M,B03002_001MA,B03002_001EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total 0 NaN int NaN NaN
B03002_005E B03002_005M,B03002_005MA,B03002_005EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino!!Ameri... 0 NaN int NaN NaN
B03002_004E B03002_004M,B03002_004MA,B03002_004EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino!!Black... 0 NaN int NaN NaN
B03002_003E B03002_003M,B03002_003MA,B03002_003EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino!!White... 0 NaN int NaN NaN
B03002_002E B03002_002M,B03002_002MA,B03002_002EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino 0 NaN int NaN NaN
B03002_009E B03002_009M,B03002_009MA,B03002_009EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino!!Two o... 0 NaN int NaN NaN
B03002_007E B03002_007M,B03002_007MA,B03002_007EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino!!Nativ... 0 NaN int NaN NaN
B03002_008E B03002_008M,B03002_008MA,B03002_008EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino!!Some ... 0 NaN int NaN NaN
B03002_006E B03002_006M,B03002_006MA,B03002_006EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino!!Asian... 0 NaN int NaN NaN
B03002_013E B03002_013M,B03002_013MA,B03002_013EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino!!White alone 0 NaN int NaN NaN
B03002_012E B03002_012M,B03002_012MA,B03002_012EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino 0 NaN int NaN NaN
B03002_011E B03002_011M,B03002_011MA,B03002_011EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino!!Two o... 0 NaN int NaN NaN
B03002_010E B03002_010M,B03002_010MA,B03002_010EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Not Hispanic or Latino!!Two o... 0 NaN int NaN NaN
B03002_017E B03002_017M,B03002_017MA,B03002_017EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino!!Native Ha... 0 NaN int NaN NaN
B03002_016E B03002_016M,B03002_016MA,B03002_016EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino!!Asian alone 0 NaN int NaN NaN
B03002_015E B03002_015M,B03002_015MA,B03002_015EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino!!American ... 0 NaN int NaN NaN
B03002_014E B03002_014M,B03002_014MA,B03002_014EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino!!Black or ... 0 NaN int NaN NaN
B03002_018E B03002_018M,B03002_018MA,B03002_018EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino!!Some othe... 0 NaN int NaN NaN
B03002_019E B03002_019M,B03002_019MA,B03002_019EA HISPANIC OR LATINO ORIGIN BY RACE B03002 Estimate!!Total!!Hispanic or Latino!!Two or mo... 0 NaN int NaN NaN

There, we see that the relevant columns are those measuring the total population, the non-Hispanic population, and the Hispanic population:

In [9]:
hispanic = ['B03002_001', # total population
            'B03002_002', # not Hispanic or Latino
            'B03002_012'  # Hispanic or Latino
           ]
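These three variables are enough because, in B03002, the non-Hispanic and Hispanic subtotals split the total population, so either group's count determines the other's. The sketch below checks that identity and computes each tract's Hispanic share on made-up counts (hypothetical numbers, not real ACS data), keyed by the estimate column names used later in this notebook. The helpers partition_holds and hispanic_share are illustrative, and hispanic_share returns NaN for empty tracts instead of dividing by zero.

```python
import math

# Hypothetical tract counts (made up for illustration, not real ACS data)
tracts = [
    {"B03002_001E": 4000, "B03002_002E": 2500, "B03002_012E": 1500},
    {"B03002_001E": 3000, "B03002_002E": 600, "B03002_012E": 2400},
    {"B03002_001E": 0, "B03002_002E": 0, "B03002_012E": 0},  # unpopulated tract
]


def partition_holds(row):
    """Check that non-Hispanic + Hispanic adds up to the total population."""
    return row["B03002_002E"] + row["B03002_012E"] == row["B03002_001E"]


def hispanic_share(row):
    """Hispanic share of the tract population (0 to 1); NaN for an empty tract."""
    total = row["B03002_001E"]
    return row["B03002_012E"] / total if total > 0 else math.nan


for row in tracts:
    print(partition_holds(row), hispanic_share(row))
```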

With the variables chosen, grabbing the data for a city's census tracts can be done using the from_place method:

In [10]:
phoenix = acs.from_place('Phoenix, AZ', variables=hispanic)
/home/lw17329/Dropbox/dev/cenpy/cenpy/geoparser.py:214: UserWarning: Shape is invalid: 
Ring Self-intersection[-12486597.5213 3939710.1975]
  tell_user('Shape is invalid: \n{}'.format(vexplain))
Matched: Phoenix, AZ to Phoenix city within layer Incorporated Places

With this, we can use contextily to grab a basemap:

In [11]:
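# cenpy geometries are in Web Mercator, the CRS bounds2img expects by default.
# url= and contextily.tile_providers come from older contextily releases;
# newer releases take a source= argument instead.
phoenix_basemap, phoenix_extent = contextily.bounds2img(*phoenix.total_bounds, zoom=10, 
                                                        url=contextily.tile_providers.ST_TONER_LITE)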

And plot the Hispanic share of each tract's population:

In [63]:
f, ax = plt.subplots(1, 1, figsize=(10, 10))
ax.imshow(phoenix_basemap, extent=phoenix_extent, interpolation='sinc')
# share of each tract's population that is Hispanic (0 to 1)
phoenix['pct_hispanic'] = phoenix.eval('B03002_012E / B03002_001E')
phoenix.plot('pct_hispanic', cmap='plasma', ax=ax, alpha=.2)
Out[63]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9608e8edd8>

Estimating Segregation

To compute segregation in Phoenix for the 2017 five-year ACS, the segregation package takes a dataframe plus the names of the columns containing the population of the group under study and the total population. Here, we estimate the dissimilarity index, one of the evenness measures discussed by Massey and Denton, using the segregation.aspatial.Dissim estimator:

In [64]:
seg_phoenix = segregation.aspatial.Dissim(phoenix, 
                                          group_pop_var='B03002_012E', 
                                          total_pop_var='B03002_001E')

Thus, for the 2017 five-year ACS, the Hispanic/non-Hispanic dissimilarity index for Phoenix, measured at the census tract level, is:

In [65]:
seg_phoenix.statistic
Out[65]:
0.5004851624821972
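For reference, the textbook dissimilarity index is D = 1/2 * sum_i |g_i/G - r_i/R|, where g_i is the group population of tract i, r_i = t_i - g_i is everyone else in that tract, and G and R are the citywide totals. Under the usual reading, D is the share of either population that would have to move to another tract for every tract to match the citywide composition, so a value near 0.50 means roughly half. The from-scratch sketch below implements that formula on toy inputs; it illustrates the definition and is not the segregation package's own code.

```python
def dissimilarity(group, total):
    """Dissimilarity index D = 1/2 * sum_i |g_i / G - r_i / R|.

    group[i] is the group population of tract i (g_i) and total[i] its total
    population (t_i); r_i = t_i - g_i is the rest of the tract's population.
    """
    if len(group) != len(total):
        raise ValueError("group and total must have the same length")
    G = sum(group)
    R = sum(total) - G
    if G == 0 or R == 0:
        raise ValueError("both the group and the rest of the population must be present")
    return 0.5 * sum(abs(g / G - (t - g) / R) for g, t in zip(group, total))


print(dissimilarity([10, 0], [10, 10]))  # complete segregation: 1.0
print(dissimilarity([5, 5], [10, 10]))   # perfect evenness: 0.0
```

Tracts with no population contribute nothing to the sum, so they do not change D.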

While this computes the dissimilarity index, it does not conduct inference on that value. The segregation package has a generic testing framework, segregation.inference, that can estimate and re-estimate segregation indices under certain assumptions. Below, we compute the segregation of simulated random maps, in which populations are randomly distributed across the city.

In [66]:
phx_test = segregation.inference.SingleValueTest(seg_phoenix)

Then, we can plot the result to compare the segregation in our random Phoenix maps with the segregation we observed in 2017:

In [67]:
phx_test.plot()

Thus, Phoenix's Hispanic/non-Hispanic dissimilarity index is very different from the values we would expect if populations were randomly distributed across the city.
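The logic of this kind of test can be sketched in plain Python. The null model below is an assumption chosen for simplicity (each tract keeps its total population, and each resident independently belongs to the group with the citywide group share); the segregation package's own null approaches may differ. random_map_test is a hypothetical helper that returns the observed D and a one-sided pseudo p-value, and the resident-by-resident draws are only practical for toy data like the made-up ten-tract city used here.

```python
import random


def dissimilarity(group, total):
    """Dissimilarity index (repeated here so this block runs on its own)."""
    G = sum(group)
    R = sum(total) - G
    return 0.5 * sum(abs(g / G - (t - g) / R) for g, t in zip(group, total))


def random_map_test(group, total, iterations=999, seed=0):
    """Compare the observed D with D computed on randomly generated maps.

    Assumed null: each tract keeps its total population, and each resident
    independently belongs to the group with the citywide group share.
    Returns (observed D, one-sided pseudo p-value).
    """
    rng = random.Random(seed)
    share = sum(group) / sum(total)
    observed = dissimilarity(group, total)
    at_least_as_segregated = 0
    for _ in range(iterations):
        # draw each tract's group count resident by resident (slow, but simple)
        simulated = [sum(rng.random() < share for _ in range(t)) for t in total]
        if dissimilarity(simulated, total) >= observed:
            at_least_as_segregated += 1
    return observed, (at_least_as_segregated + 1) / (iterations + 1)


# Hypothetical city: every tract is either entirely in or entirely out of the group
print(random_map_test([100, 0] * 5, [100] * 10, iterations=199))
```

A small p-value says that random maps almost never look as segregated as the observed one.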

Comparing across time

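cenpy exposes five-year ACS releases back to 2013, so we can get the earliest ACS data for Phoenix available through cenpy using the code below. (Each five-year ACS pools five years of survey responses: the 2013 release covers 2009 to 2013, and the 2017 release covers 2013 to 2017.)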

In [69]:
phoenix_2013 = cenpy.products.ACS(2013).from_place('Phoenix, AZ', variables=hispanic)
phoenix_2013['pct_hispanic'] = phoenix_2013.eval('B03002_012E / B03002_001E')
/home/lw17329/Dropbox/dev/cenpy/cenpy/geoparser.py:214: UserWarning: Shape is invalid: 
Ring Self-intersection[-12486597.5213 3939710.1975]
  tell_user('Shape is invalid: \n{}'.format(vexplain))
Matched: Phoenix, AZ to Phoenix city within layer Incorporated Places

And, we can compare the spatial distributions visually:

In [78]:
f, ax = plt.subplots(1, 3, figsize=(20, 10), sharex=True, sharey=True)
for ax_ in ax:
    ax_.imshow(phoenix_basemap, extent=phoenix_extent, interpolation='sinc')
phoenix_2013.plot('pct_hispanic', cmap='plasma', ax=ax[0], alpha=.4)
phoenix.plot('pct_hispanic', cmap='plasma', ax=ax[1], alpha=.4)
# join the two years on tract GEOID, then compute the relative change in Hispanic share
change = phoenix.merge(phoenix_2013.drop('geometry', axis=1), on='GEOID',
                       suffixes=('_2017', '_2013'))
change = change.eval('pct_change = (pct_hispanic_2017 - pct_hispanic_2013) / pct_hispanic_2013')
change.plot('pct_change', cmap='bwr_r', ax=ax[2], alpha=.4, vmin=-.5, vmax=.5, legend=True)
f.tight_layout()
# the panels share axes, so zooming the first one to Phoenix zooms all three
ax[0].axis(phoenix.total_bounds[[0, 2, 1, 3]])
ax[0].set_title('Hispanic %, 2013', fontsize=20)
ax[1].set_title('Hispanic %, 2017', fontsize=20)
ax[2].set_title('Relative Change', fontsize=20)
Out[78]:
Text(0.5, 1.0, 'Relative Change')
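The right-hand panel relies on two details of the merge: pandas merges are inner joins by default, so tracts missing from either year drop out, and a tract whose 2013 share is zero has no meaningful relative change (pandas would produce inf or NaN there). The plain-Python sketch below mirrors that calculation on toy dictionaries keyed by hypothetical tract ids; relative_change is an illustrative helper that returns NaN for a zero baseline.

```python
import math


def relative_change(share_before, share_after):
    """Relative change (after - before) / before for ids present in both years.

    Like an inner merge, ids missing from either year are dropped; a zero
    baseline share yields NaN instead of a division error.
    """
    common = sorted(share_before.keys() & share_after.keys())
    return {
        geoid: (share_after[geoid] - share_before[geoid]) / share_before[geoid]
        if share_before[geoid] > 0 else math.nan
        for geoid in common
    }


# Hypothetical tract ids and Hispanic shares for two releases
print(relative_change({"a": 0.4, "b": 0.0, "c": 0.5}, {"a": 0.5, "b": 0.1, "d": 0.3}))
```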

To compute the segregation index in 2013, we use the same strategy as before:

In [72]:
seg_phoenix_2013 = segregation.aspatial.Dissim(phoenix_2013, 
                                              group_pop_var='B03002_012E', 
                                              total_pop_var='B03002_001E')
In [76]:
seg_phoenix_2013.statistic
Out[76]:
0.5234336505646645

Now, with two statistics (one for 2013 and one for 2017), we can compare them through simulation using segregation.inference.TwoValueTest:

In [79]:
time_comparison = segregation.inference.TwoValueTest(seg_phoenix, seg_phoenix_2013)

By eye, the two statistics look pretty similar, and the simulation-based inference supports this intuition. The estimated difference suggests that the dissimilarity index dropped slightly, from 0.52 in 2013 to 0.50 in 2017 (a difference of about 0.023). But this drop is within what we'd expect given the uncertainty in estimating the two segregation indices. In the plot, the red line is the estimated difference between the two indices, and the blue histogram shows the distribution of simulated differences, which reflects that uncertainty:

In [75]:
time_comparison.plot()
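One common way to build such a distribution of simulated differences is random labelling: pool the tracts from both samples, randomly re-split them into two samples of the original sizes, and recompute the difference many times. The plain-Python sketch below illustrates that idea; random_label_test is a hypothetical helper, the null model is an assumption, and the segregation package's own null approaches may differ in their details. The toy cities are made up.

```python
import random


def dissimilarity(group, total):
    """Dissimilarity index (repeated here so this block runs on its own)."""
    G = sum(group)
    R = sum(total) - G
    return 0.5 * sum(abs(g / G - (t - g) / R) for g, t in zip(group, total))


def random_label_test(sample_a, sample_b, iterations=999, seed=0):
    """Two-sided random-labelling test for the difference D_a - D_b.

    Each sample is a list of (group, total) tract counts. Assumed null: which
    sample a tract belongs to is arbitrary, so tracts are pooled and randomly
    re-split into two samples of the original sizes.
    Returns (observed difference, pseudo p-value).
    """
    def d(sample):
        return dissimilarity([g for g, _ in sample], [t for _, t in sample])

    rng = random.Random(seed)
    observed = d(sample_a) - d(sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        if abs(d(pooled[:n_a]) - d(pooled[n_a:])) >= abs(observed):
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (iterations + 1)


# Hypothetical cities: one fully segregated, one perfectly even
city_a = [(100, 100), (0, 100)] * 5
city_b = [(50, 100)] * 10
print(random_label_test(city_a, city_b, iterations=199))
```

A large p-value, as in the Phoenix comparison, means the observed difference is typical of arbitrary re-splits of the tracts.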

Compared to another city?

Cenpy works for any place recognized among the Census Bureau's places. If we want to compare segregation between different cities, we can do this with cenpy and segregation as well. For instance, to get data for Austin, Texas from the ACS:

In [80]:
austin = acs.from_place('Austin, TX', variables=hispanic)
/home/lw17329/Dropbox/dev/cenpy/cenpy/geoparser.py:214: UserWarning: Shape is invalid: 
Ring Self-intersection[-10884881.1468 3554135.7868]
  tell_user('Shape is invalid: \n{}'.format(vexplain))
Matched: Austin, TX to Austin city within layer Incorporated Places

Just like before, we can get basemaps using contextily and make nice maps:

In [81]:
austin_basemap, austin_extent = contextily.bounds2img(*austin.total_bounds, zoom=12, url=contextily.tile_providers.ST_TONER_LITE)
In [94]:
f, ax = plt.subplots(1, 2, figsize=(20, 10))
austin['pct_hispanic'] = austin.eval('B03002_012E / B03002_001E')
ax[0].imshow(austin_basemap, extent=austin_extent, interpolation='sinc')
ax[1].imshow(phoenix_basemap, extent=phoenix_extent, interpolation='sinc')
austin.plot('pct_hispanic', cmap='plasma', ax=ax[0], alpha=.4)
phoenix.plot('pct_hispanic', cmap='plasma', ax=ax[1], alpha=.4)
# zoom each panel to its own city, since the basemap tiles extend past the city limits
ax[0].axis(austin.total_bounds[[0, 2, 1, 3]])
ax[1].axis(phoenix.total_bounds[[0, 2, 1, 3]])
ax[0].set_title('Hispanic % (Austin)', fontsize=20)
ax[1].set_title('Hispanic % (Phoenix)', fontsize=20)
Out[94]:
Text(0.5, 1.0, 'Hispanic % (Phoenix)')