#!/usr/bin/env python # coding: utf-8 # ## United States 2020 Census data # # In this tutorial, you will explore various subsets of geographic data in the [United States 2020 Census](https://www.census.gov/programs-surveys/decennial-census/decade/2020/2020-census-main.html). You will learn how the data is structured and how to access it. # # The data consists of two groups: # 1. **Census data by census block boundaries**: This group contains data for each census block, grouped by US counties. Census blocks are the smallest unit made available by the US Census Bureau. This group also includes population totals broken down by census block, available in a second table. # 2. **Census data by cartographic boundaries**: This group contains data that is aggregated by 29 different geographic boundaries, ranging from block groups to regional datasets such as metropolitan divisions and states. This group does not include any demographic information. # # This notebook contains details about all geographic datasets within the 2020 census data. Use the table of contents to jump to any section relevant to you. For a brief introduction, see [Accessing US Census data with the Planetary Computer STAC API](../datasets/us-census/us-census-example.ipynb). 
# # ### Table of Contents # # * [Accessing the Data](#Accessing-the-Data) # * [Import Dependencies](#Import-Dependencies) # * [Census data by census block boundaries](#Census-data-by-census-block-boundaries) # * [Census Block Boundaries](#Census-Block-Boundaries) (8,180,866 features) # * [Census data by cartographic boundaries](#Census-data-by-cartographic-boundaries) # * [American Indian Area Geographies](#American-Indian-Area-Geographies) # * [American Indian/Alaska Native Areas/Hawaiian Home Lands](#American-Indian/Alaska-Native-Areas/Hawaiian-Home-Lands-(AIANNH)) (704 features) # * [American Indian Tribal Subdivisions](#American-Indian-Tribal-Subdivisions-(AITSN)) (484 features) # * [Alaska Native Regional Corporations](#Alaska-Native-Regional-Corporations-(ANRC)) (12 features) # * [Tribal Block Groups](#Tribal-Block-Groups-(TBG)) (934 features) # * [Tribal Census Tracts](#Tribal-Census-Tracts-(TTRACT)) (492 features) # * [Census Block Groups](#Census-Block-Groups-(BG)) (242,305 features) # * [Census Tracts](#Census-Tracts-(TRACT)) (85,190 features) # * [Congressional Districts](#Congressional-Districts:-116th-Congress-(CD116)) (441 features) # * [Consolidated Cities](#Consolidated-Cities-(CONCITY)) (8 features) # * [Counties](#Counties-(COUNTY)) (3,234 features) # * [Counties within Congressional Districts](#Counties-within-Congressional-Districts:-116th-Congress-(COUNTY_within_CD116)) (3,836 features) # * [County Subdivisions](#County-Subdivisions-(COUSUB)) (36,502 features) # * [Divisions](#Divisions-(DIVISION)) (9 features) # * [Metropolitan and Micropolitan Statistical Areas and Related Statistical Areas](#Metropolitan-and-Micropolitan-Statistical-Areas-and-Related-Statistical-Areas) # * [Core Based Statistical Areas](#Core-Based-Statistical-Areas-(CBSA)) (939 features) # * [Combined Statistical Areas](#Combined-Statistical-Areas-(CSA)) (175 features) # * [Metropolitan Divisions](#Metropolitan-Divisions-(METDIV)) (31 features) # * [New England City and 
Town Areas](#New-England-City-and-Town-Areas-(NECTA)) (40 features) # * [New England City and Town Areas Division](#New-England-City-and-Town-Areas-Division(NECTADIV)) (11 features) # * [Combined New England City and Town Areas](#Combined-New-England-City-and-Town-Areas-(CNECTA)) (7 features) # * [Places](#Places-(PLACE)) (32,188 features) # * [Regions](#Regions-(REGION)) (4 features) # * [School Districts](#School-Districts) # * [Elementary](#School-Districts---Elementary-(ELSD)) (1,945 features) # * [Secondary](#School-Districts---Secondary-(SCSD)) (473 features) # * [Unified](#School-Districts---Unified-(UNSD)) (10,867 features) # * [State Legislative Districts](#State-Legislative-Districts) # * [Lower Chamber](#State-Legislative-Districts---Lower-Chamber-(SLDL)) (4,829 features) # * [Upper Chamber](#State-Legislative-Districts---Upper-Chamber-(SLDU)) (1,958 features) # * [States](#States-(STATE)) (56 features) # * [Subbarrios](#Subbarrios-(SUBBARRIO)) (145 features) # * [United States Outline](#United-States-Outline) (1 feature) # * [Voting Districts](#Voting-Districts-(VTD)) (158,320 features) # # ### Accessing the Data # # Like other datasets on the Planetary Computer, US Census datasets are cataloged using [STAC](https://stacspec.org/). Each table, corresponding to a particular level of cartographic aggregation, is available as a STAC item under the `us-census` collection. # In[1]: import pystac_client import planetary_computer catalog = pystac_client.Client.open( "https://planetarycomputer.microsoft.com/api/stac/v1/", modifier=planetary_computer.sign_inplace, ) census = catalog.get_collection("us-census") census # The files themselves are stored as [Apache Parquet](https://parquet.apache.org/) datasets in Azure Blob Storage. They can be loaded with pandas or geopandas, or with dask-geopandas if the files are larger than memory. # # Loading each of the tables will follow the same basic pattern: # # 1. 
Get the Item from the collection with `census.get_item(item_id)` # 2. Use the `href` and `table:storage_options` fields from the `data` asset to load the data with `read_parquet`. # ### Import Dependencies # # We'll import a few libraries to use for accessing and plotting the data. In particular, we'll use [geopandas](https://geopandas.org/) and [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to load the parquet datasets, and [contextily](https://github.com/geopandas/contextily) to add basemaps to the plots. Before getting started, make sure you have these dependencies installed and imported: # In[2]: import geopandas import dask_geopandas import contextily as ctx import planetary_computer # ### Census data by census block boundaries # # The first group of data is organized by [Census blocks](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_5). Census blocks are the smallest geographic grouping available in the current dataset. There are over eight million census blocks, resulting in large datasets. To facilitate parallelism and access to subsets of the data, the Census block-level data are partitioned by state. # # There are two tables available at the Census block level: "geo", containing the boundaries and other data about the block, and "population", containing the population counts in that geometry by various features. # # **geo** # # * GEOID = Concatenation of county FIPS code, census tract code, and census block number. *In pandas and Dask, this is used as the index*. 
# * STATEFP = State FIPS code # * COUNTYFP = County FIPS code # * TRACTCE = Census Tract code # * BLOCKCE = Census Block code # * ALAND = Current land area # * AWATER = Current water area # * INTPTLAT = Current latitude of the internal point # * INTPTLON = Current longitude of the internal point # * geometry = Coordinates for block polygons # In[3]: asset = census.get_item("2020-census-blocks-geo").assets["data"] geo = dask_geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"], calculate_divisions=True, ) geo # **pop** # # The population table contains many columns. Two are important to call out: # # * GEOID = Concatenation of county FIPS code, census tract code, and census block number. *In pandas and Dask, this is used as the index*. # * P0010001 = Total Block Population # # The remainder of the columns provide the Block's population faceted by various features. [This document (pdf)](https://www2.census.gov/programs-surveys/decennial/2010/technical-documentation/complete-tech-docs/summary-file/nsfrd.pdf) describes the meaning of all the additional variables. # In[4]: import dask.dataframe asset = census.get_item("2020-census-blocks-population").assets["data"] pop = dask.dataframe.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"], calculate_divisions=True, ) pop # In[6]: ri = geo.get_partition(39).compute() # The datasets use `GEOID` as their index and are partitioned by state, so we can use the FIPS codes to [efficiently access subsets](https://docs.dask.org/en/latest/dataframe-best-practices.html#use-the-index) of the data. 
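Those FIPS-based lookups work because the block GEOID is a fixed-width concatenation. As an illustration, the helper below splits a block GEOID into the columns listed above, assuming the standard 15-character TIGER layout (2-digit state FIPS, 3-digit county FIPS, 6-digit tract code, 4-digit block number); the example GEOID is made up.

```python
# Split a census-block GEOID into its FIPS components.
# Assumes the standard 15-character layout: 2-digit state FIPS,
# 3-digit county FIPS, 6-digit tract code, 4-digit block number.
def split_block_geoid(geoid: str) -> dict:
    assert len(geoid) == 15, "expected a 15-character block GEOID"
    return {
        "STATEFP": geoid[:2],
        "COUNTYFP": geoid[2:5],
        "TRACTCE": geoid[5:11],
        "BLOCKCE": geoid[11:15],
    }

# A hypothetical Rhode Island (state FIPS 44) block GEOID:
parts = split_block_geoid("440070001001000")
print(parts)  # {'STATEFP': '44', 'COUNTYFP': '007', 'TRACTCE': '000100', 'BLOCKCE': '1000'}
```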
# In[7]: ax = ri.to_crs(epsg=3857).plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Census Blocks: Rhode Island", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # Both the geo and population tables use `GEOID` as a unique identifier, so the geometries and population data can be joined together. Remember that population data are not available for territories, so we'll use an inner join. # In[8]: ri = ( geo.get_partition(39) .compute() .join(pop[["P0010001"]].get_partition(39).compute(), how="inner") ) ri = ri[ri.P0010001 > 10] # Now let's plot the census blocks in Providence County with at least 150 people. # In[9]: providence = ri[(ri.P0010001 >= 150) & (ri.COUNTYFP == "007")] ax = providence.to_crs(epsg=3857).plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Census Blocks with a Population of at Least 150: Providence County, RI", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # You can use this method with census blocks in any county in the US. # # **[Jump to Top](#United-States-2020-Census-data)** # ### Census data by cartographic boundaries # # The second group of data is organized by different cartographic categories. These boundaries cover larger areas than the individual census blocks discussed [above](#Census-data-by-census-block-boundaries). The different categories range from census block groups (consisting of several census blocks) all the way up to a National Boundary file (encompassing the entire USA). # # The files in this second group tend to be smaller in size than the census block data in the first group. Therefore, the files in the second group are not partitioned into multiple files, and each dataset consists of a single parquet file. 
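Whichever group a table comes from, loading follows the same item/asset pattern described earlier: fetch the item, then pass the `data` asset's `href` and its `table:storage_options` field to `read_parquet`. Below is a minimal sketch of that step; the mock object, path, and account name are made up stand-ins for a real signed STAC asset.

```python
# Sketch of the common loading pattern: pull the parquet href and the
# "table:storage_options" field off a STAC asset so both can be handed
# to read_parquet. The mock asset below stands in for the object
# returned by census.get_item(...).assets["data"].
from types import SimpleNamespace


def parquet_kwargs(asset):
    """Return the path and storage options read_parquet needs for a census asset."""
    return {
        "path": asset.href,
        "storage_options": asset.extra_fields["table:storage_options"],
    }


mock_asset = SimpleNamespace(
    href="abfs://us-census/example-table.parquet",  # illustrative, not a real path
    extra_fields={"table:storage_options": {"account_name": "example"}},
)
kwargs = parquet_kwargs(mock_asset)
print(kwargs["path"])
```

With a real asset you would then call `geopandas.read_parquet(kwargs["path"], storage_options=kwargs["storage_options"])`.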
Another difference is that the datasets in this second group include different information from the census block files in the first group and do not contain population statistics. Which additional data is included differs from dataset to dataset. See [Appendix E. in the 2020 TIGER/Line Technical Documentation](https://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2020/TGRSHP2020_TechDoc.pdf) for more details on the available feature classes. # # The following sections are examples of how to access and view each cartographic boundary file in this second group of data. Each example uses the same basic workflow and dependencies as the [Census Block Boundaries](#Census-data-by-census-block-boundaries) examples for the first group of data. Note that before plotting onto a basemap, each dataset needs to be converted to Web Mercator [(EPSG 3857)](https://epsg.io/3857) using the [to_crs](https://geopandas.org/docs/reference/api/geopandas.GeoDataFrame.to_crs.html) function of [GeoPandas](https://geopandas.org/). # # Some of the datasets are grouped together based on their type. Some files may have gaps where no relevant data exists; for example, states with no Tribal Block Groups do not appear in the Tribal Block Group data. The header for each example also includes the relevant abbreviation used for data access and retrieval. # # ### American Indian Area Geographies # # American Indian Area Geographies is the first grouping of cartographic boundary files available. # # #### American Indian/Alaska Native Areas/Hawaiian Home Lands (AIANNH) # # This file contains data for legal and statistical [American Indian/Alaska Native Areas/Hawaiian Home Lands (AIANNH)](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_1) entities published by the US Census Bureau. 
# # The attribute table contains the following information: # # * AIANNHCE = AIANNH census code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = concatenation of AIANNH census code and reservation/statistical area or off-reservation trust land Hawaiian home land indicator # * NAME = Current Area Name # * NAMELSAD = Current name and legal/statistical status for each entity # * LSAD = Current legal/statistical area code # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for AIANNH polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the AIANNH data from the parquet file of the Planetary Computer dataset: # In[10]: asset = census.get_item("2020-cb_2020_us_aiannh_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and apply a basemap. Rather than displaying the whole dataset, show only the Apache Choctaw American Indian Homeland by selecting a subset of the dataframe where the values in the `NAME` column match `"Apache Choctaw"`. # In[11]: ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME == "Apache Choctaw"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Apache Choctaw American Indian Home Land", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # This map shows the shape of the Apache Choctaw American Indian Homeland overlaid on a basemap. To display the entire dataset, remove the part of the code that limits the dataframe to only the Apache Choctaw homeland. To plot a different area, change the dataframe's filter to a different attribute or value. 
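The `to_crs(epsg=3857)` call used before plotting reprojects longitude/latitude coordinates into Web Mercator meters. As an illustration of what that projection computes, here is the spherical Web Mercator forward formula written by hand; in practice you should rely on GeoPandas/pyproj rather than this sketch.

```python
import math

# Radius used by spherical Web Mercator (the WGS 84 semi-major axis, in meters).
R = 6378137.0


def to_web_mercator(lon_deg, lat_deg):
    """Spherical Web Mercator (EPSG:3857) forward projection: degrees -> meters."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The antimeridian at the equator lands at the edge of the projected world,
# roughly x = 20,037,508 m; the origin maps (essentially) to (0, 0).
print(to_web_mercator(180.0, 0.0))
```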
# # **[Jump to Top](#United-States-2020-Census-data)** # # #### American Indian Tribal Subdivisions (AITSN) # # This file contains data on [American Indian Tribal Subdivisions](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_1). These areas are the legally defined subdivisions of American Indian Reservations (AIR), Oklahoma Tribal Statistical Areas (OTSA), and Off-Reservation Trust Land (ORTL). # # The attribute table contains the following information: # # * AIANNHCE = AIANNH census code # * TRSUBCE = Current AITSN census code # * TRSUBNS = ANSI feature code for American Indian Tribal Subdivision # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = concatenation of AIANNH census code and AITSN census code # * NAME = Current Area Name # * NAMELSAD = Current name and legal/statistical AITSN description # * LSAD = Current legal/statistical area code # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for AITSN polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the AITSN data from the parquet file of the Planetary Computer dataset: # In[12]: asset = census.get_item("2020-cb_2020_us_aitsn_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. # In[13]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf.plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "American Indian Tribal Subdivisions", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows all the American Indian Tribal Subdivisions. 
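The ALAND and AWATER attributes that appear throughout these tables are reported in square meters in TIGER/Line products, so a unit conversion is often useful before reporting areas. A toy example with made-up values (pandas stands in for the GeoDataFrames above):

```python
import pandas as pd

# Toy stand-in for the ALAND / AWATER columns; both are in square meters.
# The names and values are illustrative, not real census figures.
df = pd.DataFrame(
    {
        "NAME": ["Area A", "Area B"],
        "ALAND": [2_500_000, 10_000_000],
        "AWATER": [500_000, 0],
    }
)
df["ALAND_KM2"] = df["ALAND"] / 1e6  # m^2 -> km^2
df["WATER_FRAC"] = df["AWATER"] / (df["ALAND"] + df["AWATER"])
print(df[["NAME", "ALAND_KM2", "WATER_FRAC"]])
```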
# # **[Jump to Top](#United-States-2020-Census-data)** # # #### Alaska Native Regional Corporations (ANRC) # # This file contains data on [Alaska Native Regional Corporations](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_1), which are corporations created according to the Alaska Native Claims Settlement Act. # # The attribute table contains the following information: # # * STATEFP = State FIPS code # * ANRCFP = ANRC FIPS code # * ANRCNS = ANSI feature code for Alaska Native Regional Corporation # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = concatenation of state FIPS code and ANRC FIPS Code # * NAME = Current area name # * NAMELSAD = Current name and legal/statistical area description # * LSAD = Legal/statistical area description code # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for ANRC polygon # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the ANRC data from the parquet file of the Planetary Computer dataset: # In[14]: asset = census.get_item("2020-cb_2020_02_anrc_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. To make the data work better with the Mercator Projection, exclude part of the dataset from the plot. To do so, limit your dataframe to rows that do not have `"Aleut"` in the NAME column. 
# In[15]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME != "Aleut"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Alaska Native Regional Corporations (excluding Aleut)", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows all the Alaska Native Regional Corporations except for Aleut. # # **[Jump to Top](#United-States-2020-Census-data)** # # #### Tribal Block Groups (TBG) # # This file includes data on [Tribal Block Groups](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_26), which are subdivisions of Tribal Census Tracts. These block groups can extend over multiple AIRs and ORTLs due to areas not meeting Block Group minimum population thresholds. # # The attribute table contains the following information: # # * AIANNHCE = AIANNH census code # * TTRACTCE = Tribal Census Tract Code # * TBLKGPCE = Tribal Block Group letter # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = concatenation of AIANNH census code, tribal census tract code, and tribal block group letter # * NAMELSAD = Current legal/statistical description and tribal block group letter # * LSAD = Current legal/statistical area code # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Block Group polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the Tribal Block Group data from the parquet file of the Planetary Computer dataset: # In[16]: asset = census.get_item("2020-cb_2020_us_tbg_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. 
For this example, select only data for Tribal Block Group A by filtering by the TBLKGPCE column. Due to minimum block group population thresholds, Tribal Block Group A spans a large portion of the contiguous United States and is not fully connected. # In[17]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.TBLKGPCE == "A"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Tribal Block Group A", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows Tribal Block Group A. # # **[Jump to Top](#United-States-2020-Census-data)** # # #### Tribal Census Tracts (TTRACT) # # This file includes data on [Tribal Census Tracts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_27), which are relatively small statistical subdivisions of AIRs and ORTLs defined by federally recognized tribal government officials in partnership with the Census Bureau. Due to population thresholds, the Tracts may consist of multiple non-contiguous areas. 
# # The attribute table contains the following information: # # * AIANNHCE = AIANNH census code # * TTRACTCE = Tribal Census Tract Code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = concatenation of AIANNH census code and tribal census tract code # * NAME = Tribal Census Tract name # * NAMELSAD = Current legal/statistical description and tribal census tract name # * LSAD = Current legal/statistical area code # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Tribal Census Tract polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the Tribal Census Tract data from the parquet file of the Planetary Computer dataset: # In[18]: asset = census.get_item("2020-cb_2020_us_ttract_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select only data for Tribal Census Tract T002 by filtering by the NAME column. Due to minimum census tract population thresholds, Tribal Census Tract T002 spans a large portion of the contiguous United States and is not fully connected. # In[19]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME == "T002"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Tribal Census Tract T002", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows Tribal Census Tract T002. # # **[Jump to Top](#United-States-2020-Census-data)** # # ### Census Block Groups (BG) # # This file contains data on [Census Block Groups](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_4). 
These groups are the second smallest geographic grouping. They consist of clusters of blocks within the same census tract that share the same first digit of their 4-character census block number. Census Block Groups generally contain between 600 and 3,000 people and typically cover contiguous areas. # # The attribute table contains the following information: # # * STATEFP = State FIPS code # * COUNTYFP = County FIPS code # * TRACTCE = Census tract code # * BLKGRPCE = Block group number # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS, County FIPS, Census tract code, and block group number # * NAME = Block group number # * NAMELSAD = Legal/statistical description and group number # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Block Group polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the Block Group data from the parquet file of the Planetary Computer dataset: # In[20]: asset = census.get_item("2020-cb_2020_us_bg_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select only data for the state of California by filtering by the State FIPS code (`"06"`) in the STATEFP column. # In[21]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.STATEFP == "06"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Census Block Groups: California", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows all the Census Block Groups in California. 
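Because a block group's GEOID concatenates state, county, tract, and the block group number, and that number is the first digit of the 4-character block code, a block's parent block group can be derived from the block GEOID by truncation. A small sketch, assuming the standard 15-character block and 12-character block-group GEOID layouts (the example GEOID is made up):

```python
def block_group_of(block_geoid: str) -> str:
    """Derive a block-group GEOID from a 15-character block GEOID.

    The block-group digit is the first digit of the 4-character block
    code, so the block-group GEOID is simply the first 12 characters
    (state + county + tract + block-group digit).
    """
    assert len(block_geoid) == 15, "expected a 15-character block GEOID"
    return block_geoid[:12]


# A hypothetical California block and its parent block group:
print(block_group_of("060014271001003"))  # -> "060014271001"
```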
# # **[Jump to Top](#United-States-2020-Census-data)** # # ### Census Tracts (TRACT) # # This file contains data on [Census Tracts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_13), which are small, relatively permanent statistical subdivisions of a county or equivalent entity. Tract population size is generally between 1,200 and 8,000 people, with an ideal size of 4,000. Boundaries tend to follow visible and identifiable features and are usually contiguous areas. # # The attribute table contains the following information: # # * STATEFP = State FIPS code # * COUNTYFP = County FIPS code # * TRACTCE = Census tract code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS, County FIPS, and Census tract code # * NAME = Census Tract name (the census tract code converted to an integer) # * NAMELSAD = Legal/statistical description and tract name # * STUSPS = United States Postal Service state abbreviation # * NAMELSADCO = County name # * STATE_NAME = State Name # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Census Tract polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the Census Tract data from the parquet file of the Planetary Computer dataset: # In[22]: asset = census.get_item("2020-cb_2020_us_tract_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select only data for Census Tracts located in Washington, DC by filtering for `"DC"` in the STUSPS column. 
# In[23]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.STUSPS == "DC"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Census Tracts: Washington, DC", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows all the Census Tracts in Washington, DC. # # **[Jump to Top](#United-States-2020-Census-data)** # # ### Congressional Districts: 116th Congress (CD116) # # This file contains data on the [Congressional Districts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_9) for the 116th Congress. # # The attribute table contains the following information: # # * STATEFP = State FIPS Code # * CD116FP = Congressional District FIPS code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS and congressional district FIPS code # * NAMELSAD = Legal/statistical description and name # * LSAD = Legal/statistical classification # * CDSESSN = Congressional Session Code # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Congressional District polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the Congressional District data from the parquet file of the Planetary Computer dataset: # In[24]: asset = census.get_item("2020-cb_2020_us_cd116_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select only data for Maryland's 2nd Congressional District by filtering by the State FIPS code `"24"` and the Congressional District FIPS code `"02"` in the GEOID column. 
# In[25]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.GEOID == "2402"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "2nd Congressional District: Maryland", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows Maryland's 2nd Congressional District. # # **[Jump to Top](#United-States-2020-Census-data)** # # ### Consolidated Cities (CONCITY) # # This file contains data on [Consolidated Cities](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_8). These are areas where one or several other incorporated places in a county or Minor Civil Division are included in a consolidated government but still exist as separate legal entities. # # The attribute table contains the following information: # # * STATEFP = State FIPS Code # * CONCTYFP = Consolidated city FIPS code # * CONCTYNS = Consolidated city GNIS code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS and consolidated city FIPS code # * NAME = Consolidated city name # * NAMELSAD = Name and Legal/statistical description # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Consolidated City polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the Consolidated City data from the parquet file of the Planetary Computer dataset: # In[26]: asset = census.get_item("2020-cb_2020_us_concity_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. 
For this example, select only data for Athens-Clarke County, GA, which is a Consolidated City. Select the data by filtering by the NAME column. # In[27]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME == "Athens-Clarke County"].plot( figsize=(10, 10), alpha=0.5, edgecolor="k" ) ax.set_title( "Consolidated City: Athens-Clarke County, GA", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows Athens-Clarke County, GA. # # **[Jump to Top](#United-States-2020-Census-data)** # # ### Counties (COUNTY) # # This file contains data on [Counties and Equivalent Entities](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_12). These are the primary legal divisions of states. Most states use the term "counties," but other terms such as "Parishes," "Municipios," or "Independent Cities" may be used. # # The attribute table contains the following information: # # * STATEFP = State FIPS Code # * COUNTYFP = County FIPS code # * COUNTYNS = ANSI feature code for the county # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS and county FIPS code # * NAME = County name # * NAMELSAD = Name and Legal/statistical description # * STUSPS = United States Postal Service state abbreviation # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for County polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://geopandas.org/) to read the Counties and Equivalent Entities data from the parquet file of the Planetary Computer dataset: # In[28]: asset = census.get_item("2020-cb_2020_us_county_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, 
storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select only data for counties in Minnesota by filtering by the STATE_NAME column. # In[29]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.STATE_NAME == "Minnesota"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Minnesota: Counties", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows Minnesota Counties. # # **[Jump to Top](#United-States-2020-Census-data)** # # ### Counties within Congressional Districts: 116th Congress (COUNTY_within_CD116) # # This file contains data on Counties within Congressional Districts. # # The attribute PARTFLG identifies whether all or only part of a County is within a Congressional District: # # * N = All of a County is within a Congressional District # * Y = Only part of a County is within a Congressional District # # The attribute table contains the following information: # # * STATEFP = State FIPS code # * COUNTYFP = County FIPS code # * CD116FP = Congressional District FIPS code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS, Congressional District FIPS, and county FIPS code # * PARTFLG = Identifies if all or part of the entity is within the file # * ALAND = Current land area # * geometry = coordinates for polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the Counties within Congressional Districts data from the parquet file of the Planetary Computer dataset: # In[30]: asset = census.get_item("2020-cb_2020_us_county_within_cd116_500k").assets["data"] ddf = geopandas.read_parquet( asset.href,
storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select only polygons where only part of the County is within a Congressional District. Select the relevant data by filtering by `"Y"` in the PARTFLG column. # In[31]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.PARTFLG == "Y"].plot(figsize=(20, 10), alpha=0.5, edgecolor="k") ax.set_title( "Counties only partially within a Congressional District", fontdict={"fontsize": "30", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows Counties partially within Congressional Districts. # # **[Jump to Top](#United-States-2020-Census-data)** # # ### County Subdivisions (COUSUB) # # This file contains [County Subdivisions](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_11), which are the primary divisions of counties and equivalent entities. These divisions vary from state to state and include Barrios, Purchases, Townships, and other types of legal and statistical entities.
# # The attribute table contains the following information: # # * STATEFP = State FIPS code # * COUNTYFP = County FIPS code # * COUSUBFP = Subdivision FIPS code # * COUSUBNS = ANSI feature code for the subdivision # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS, county FIPS, and county subdivision FIPS # * NAME = Subdivision name # * NAMELSAD = Subdivision name and legal/statistical description # * STUSPS = FIPS State Postal Code # * NAMELSADCO = County name # * STATE_NAME = State Name # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for County Subdivision polygons # # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the County Subdivisions data from the parquet file of the Planetary Computer dataset: # In[32]: asset = census.get_item("2020-cb_2020_us_cousub_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, plot all the subdivisions, townships in this case, in Bergen County, NJ. Select the relevant data by filtering by the NAMELSADCO column. # In[33]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAMELSADCO == "Bergen County"].plot( figsize=(10, 10), alpha=0.5, edgecolor="k" ) ax.set_title( "County Subdivisions: Bergen County, NJ", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows County Subdivisions in Bergen County, NJ.
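The GEOID fields in these tables are fixed-width string concatenations of FIPS codes, so zero-padding matters when building or comparing them. Below is a minimal sketch for the county-subdivision case; the FIPS values are illustrative, not taken from the dataset:

```python
def cousub_geoid(statefp: int, countyfp: int, cousubfp: int) -> str:
    """Build a county-subdivision GEOID: 2-digit state FIPS +
    3-digit county FIPS + 5-digit subdivision FIPS, all zero-padded."""
    return f"{statefp:02d}{countyfp:03d}{cousubfp:05d}"

# New Jersey (34) + Bergen County (003) + an illustrative subdivision code
print(cousub_geoid(34, 3, 77150))  # -> 3400377150
```

Because GEOID is stored as a string, a lookup such as `ddf[ddf.GEOID == cousub_geoid(34, 3, 77150)]` then reduces to plain string equality.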
# # **[Jump to Top](#United-States-2020-Census-data)** # # ### Divisions (DIVISION) # # This file contains data on [Divisions](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_10) of the US. This file is similar to the Regions file but contains more divisions and encompasses several states per division. # # The attribute table contains the following information: # # * DIVISIONCE = Number assigned to each division # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = DIVISIONCE # * NAME = Name of division # * NAMELSAD = Division name and legal/statistical description # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Division polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the Division data from the parquet file of the Planetary Computer dataset: # In[34]: asset = census.get_item("2020-cb_2020_us_division_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select only data from the Mountain division by filtering by the NAME column. # In[35]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME == "Mountain"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Divisions: Mountain Region", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows the Mountain Region Division. 
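The ALAND and AWATER attributes in these tables are published in square meters. A quick sketch of converting them to square kilometers and computing a water share; the values below are made up for illustration, not taken from the dataset:

```python
# Illustrative ALAND/AWATER values in square meters (not real census figures)
aland = 250_487_000_000
awater = 40_175_000_000

aland_km2 = aland / 1e6                  # m^2 -> km^2
water_share = awater / (aland + awater)  # fraction of total area that is water
print(f"{aland_km2:,.0f} km^2 land, {water_share:.1%} water")
```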
# # **[Jump to Top](#United-States-2020-Census-data)** # # ### Metropolitan and Micropolitan Statistical Areas and Related Statistical Areas # # [Metropolitan and Micropolitan Statistical Areas and Related Statistical Areas](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_7) is the second grouping of datasets within the census data by cartographic boundaries group. A metropolitan or micropolitan statistical area contains a core area with a substantial population, together with adjacent communities having a high degree of economic and social integration with that core. This grouping contains six datasets. # # #### Core Based Statistical Areas (CBSAs) # # This file contains data on [Core Based Statistical Areas (CBSAs)](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_7). This encompasses all metropolitan and micropolitan statistical areas. # # The attribute table contains the following information: # # * CSAFP = Combined statistical area code (if applicable) # * CBSAFP = Metropolitan statistical area/micropolitan statistical area code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = CBSAFP # * NAME = Metropolitan statistical area/micropolitan statistical area name # * NAMELSAD = CBSA name and legal/statistical description # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for CBSA polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the CBSA data from the parquet file of the Planetary Computer dataset: # In[36]: asset = census.get_item("2020-cb_2020_us_cbsa_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from
this parquet file and overlay it on a basemap. For this example, select Kahului-Wailuku-Lahaina, HI Metro Area by filtering by the NAME column. # In[37]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME == "Kahului-Wailuku-Lahaina, HI"].plot( figsize=(10, 10), alpha=0.5, edgecolor="k" ) ax.set_title( "Core Based Statistical Area: Kahului-Wailuku-Lahaina, HI Metro Area", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows the Kahului-Wailuku-Lahaina, HI Metro Area, Core Based Statistical Area. # # **[Jump to Top](#United-States-2020-Census-data)** # # #### Combined Statistical Areas (CSA) # # This file contains data on [Combined Statistical Areas](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_7), which are areas that consist of two or more adjacent CBSAs that have significant employment interchanges. # # The attribute table contains the following information: # # * CSAFP = Combined statistical area code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = CSAFP # * NAME = CSA Name # * NAMELSAD = CSA name and legal/statistical description # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for CSA polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the CSA data from the parquet file of the Planetary Computer dataset: # In[38]: asset = census.get_item("2020-cb_2020_us_csa_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. 
For this example, select the San Jose-San Francisco-Oakland CSA by filtering by the NAME column. # In[39]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME == "San Jose-San Francisco-Oakland, CA"].plot( figsize=(10, 10), alpha=0.5, edgecolor="k" ) ax.set_title( "Combined Statistical Area: San Jose-San Francisco-Oakland, CA", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows the San Jose-San Francisco-Oakland, CA Combined Statistical Area. # # **[Jump to Top](#United-States-2020-Census-data)** # # #### Metropolitan Divisions (METDIV) # # This file contains data on [Metropolitan Divisions](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_7). These areas are groupings of counties or equivalent entities within a metropolitan statistical area with a core of 2.5 million inhabitants and one or more main counties that represent employment centers, plus adjacent counties with commuting ties. 
# # The attribute table contains the following information: # # * CSAFP = Combined statistical area code # * CBSAFP = Metropolitan statistical area/micropolitan statistical area code # * METDIVFP = Metropolitan division code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of CBSAFP and METDIVFP # * NAME = Metropolitan division name # * NAMELSAD = MetDiv name and legal/statistical description # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Metropolitan Division polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the Metropolitan Division data from the parquet file of the Planetary Computer dataset: # In[40]: asset = census.get_item("2020-cb_2020_us_metdiv_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select the Chicago-Naperville-Evanston, IL Metropolitan Division by filtering by the NAME column. # In[41]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME == "Chicago-Naperville-Evanston, IL"].plot( figsize=(10, 10), alpha=0.5, edgecolor="k" ) ax.set_title( "Metropolitan Divisions: Chicago-Naperville-Evanston, IL", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows the Chicago-Naperville-Evanston, IL Metropolitan Division.
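Every plotting cell in this notebook reprojects from EPSG:4326 (longitude/latitude) to EPSG:3857 (Web Mercator) before adding the contextily basemap. `to_crs` does this for you; the sketch below only illustrates the underlying spherical Web Mercator math and is not a substitute for it:

```python
import math

R = 6378137.0  # Web Mercator sphere radius in meters

def lonlat_to_webmercator(lon_deg, lat_deg):
    """Project EPSG:4326 degrees to EPSG:3857 meters (spherical formulas)."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Chicago, roughly the center of the Metropolitan Division plotted above
x, y = lonlat_to_webmercator(-87.6298, 41.8781)
print(round(x), round(y))
```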
# # **[Jump to Top](#United-States-2020-Census-data)** # # #### New England City and Town Areas (NECTA) # # This file contains [New England City and Town Areas](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_7), which encompass metropolitan and micropolitan statistical areas and urban clusters in New England. # # The attribute table contains the following information: # # * CNECTAFP = Combined NECTA code # * NECTAFP = NECTA code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = NECTAFP # * NAME = NECTA name # * NAMELSAD = NECTA name and legal/statistical description # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for New England City and Town Area polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the New England City and Town Areas data from the parquet file of the Planetary Computer dataset: # In[42]: asset = census.get_item("2020-cb_2020_us_necta_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot all polygons from this parquet file and overlay all of the New England City and Town Areas on a basemap. # In[43]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf.plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "New England City and Town Areas", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows New England City and Town Areas. 
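The examples in this notebook filter with exact comparisons such as `ddf.NAME == "..."`, which return nothing if the stored name differs by a suffix or hyphenation. Substring matching is a more forgiving way to locate a feature; here is a pandas sketch on hand-typed names (illustrative, not read from the dataset):

```python
import pandas as pd

names = pd.Series([
    "Boston-Cambridge-Nashua, MA-NH",
    "Portland-Lewiston-South Portland, ME",
    "Hartford-West Hartford, CT",
])

# Case-insensitive substring match instead of a brittle exact comparison
hits = names[names.str.contains("portland", case=False)]
print(hits.tolist())  # -> ['Portland-Lewiston-South Portland, ME']
```

The same mask works on a GeoDataFrame column, e.g. `ddf[ddf.NAME.str.contains("Portland")]`.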
# # **[Jump to Top](#United-States-2020-Census-data)** # # #### New England City and Town Area Division (NECTADIV) # # This file contains [New England City and Town Areas Divisions](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_7), which subdivide NECTAs containing a single core of 2.5 million or more inhabitants into smaller groupings of cities and towns. Each division must have a total population of 100,000 or more. # # The attribute table contains the following information: # # * CNECTAFP = Combined NECTA code # * NECTAFP = NECTA code # * NCTADVFP = NECTA Division code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of NECTA code and NECTA Division code # * NAME = NECTA Division name # * NAMELSAD = NECTA Division name and legal/statistical description # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for New England City and Town Area Division polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the New England City and Town Areas Divisions data from the parquet file of the Planetary Computer dataset: # In[44]: asset = census.get_item("2020-cb_2020_us_nectadiv_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot all polygons from this parquet file and overlay all of the New England City and Town Area Divisions on a basemap.
# In[45]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf.plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "New England City and Town Area Divisions", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows New England City and Town Area Divisions. # # **[Jump to Top](#United-States-2020-Census-data)** # # #### Combined New England City and Town Areas (CNECTA) # # This file contains data on [Combined New England City and Town Areas](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_7), consisting of two or more adjacent NECTAs that have significant employment interchanges. # # The attribute table contains the following information: # # * CNECTAFP = Combined NECTA code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = CNECTAFP # * NAME = Combined NECTA name # * NAMELSAD = Combined NECTA name and legal/statistical description # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Combined New England City and Town Area polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the Combined New England City and Town Areas data from the parquet file of the Planetary Computer dataset: # In[46]: asset = census.get_item("2020-cb_2020_us_cnecta_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot all polygons from this parquet file and overlay all of the Combined New England City and Town Areas on a basemap.
# In[47]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf.plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Combined New England City and Town Areas", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows Combined New England City and Town Areas. # # **[Jump to Top](#United-States-2020-Census-data)** # # ### Places (PLACE) # # This file contains [Places](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_14) which are Incorporated Places (legal entities) and Census Designated Places (CDPs, statistical entities). An incorporated place usually is a city, town, village, or borough but can have other legal descriptions. CDPs are settled concentrations of population that are identifiable by name but are not legally incorporated. # # The attribute table contains the following information: # # * STATEFP = State FIPS code # * PLACEFP = Place FIPS code # * PLACENS = Place GNIS code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS code and Place FIPS code # * NAME = Place name # * NAMELSAD = Place name and legal/statistical description # * STUSPS = FIPS Postal code # * STATE_NAME = State name # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Place polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the Places data from the parquet file of the Planetary Computer dataset: # In[48]: asset = census.get_item("2020-cb_2020_us_place_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and 
overlay it on a basemap. For this example, select all the Places in Washington State by filtering by `"WA"` in the STUSPS column. # In[49]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.STUSPS == "WA"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Places: Washington State", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows Places in Washington State. # # **[Jump to Top](#United-States-2020-Census-data)** # # ### Regions (REGION) # # This file contains [Regions](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_10) of the US and encompasses several states per region. # # The attribute table contains the following information: # # * REGIONCE = Number assigned to each Region # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = REGIONCE # * NAME = Name of region # * NAMELSAD = Region name and legal/statistical description # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Region polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the Regions data from the parquet file of the Planetary Computer dataset: # In[50]: asset = census.get_item("2020-cb_2020_us_region_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select the entire South Region by filtering by the NAME column.
# In[51]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME == "South"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "South Region", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows the South Region. # # **[Jump to Top](#United-States-2020-Census-data)** # # ### School Districts # # [School Districts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_23) is the third grouping of datasets within the census data by cartographic boundaries group. This dataset grouping includes district boundaries for Elementary School Districts, Secondary School Districts, and Unified School Districts. # # #### School Districts - Elementary (ELSD) # # This file contains [Elementary School Districts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_23), referring to districts with elementary schools. # # The attribute table contains the following information: # # * STATEFP = State FIPS code # * ELSDLEA = Elementary School District local education agency code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS code and ELSDLEA code # * NAME = Elementary School District name # * STUSPS = FIPS Postal code # * STATE_NAME = State name # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Elementary School District polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the Elementary School Districts data from the parquet file of the Planetary Computer dataset: # In[52]: asset = census.get_item("2020-cb_2020_us_elsd_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, 
storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select all the Elementary School Districts in Montana by filtering by `"MT"` in the STUSPS column. # In[53]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.STUSPS == "MT"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Montana Elementary School Districts", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows the Montana Elementary School Districts. # # **[Jump to Top](#United-States-2020-Census-data)** # # #### School Districts - Secondary (SCSD) # # This file contains [Secondary School Districts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_23), referring to districts with secondary schools. # # The attribute table contains the following information: # # * STATEFP = State FIPS code # * SCSDLEA = Secondary School District local education agency code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS code and SCSDLEA code # * NAME = Secondary School District name # * STUSPS = FIPS Postal code
# * STATE_NAME = State name # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = coordinates for Secondary School District polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the Secondary School Districts data from the parquet file of the Planetary Computer dataset: # In[54]: asset = census.get_item("2020-cb_2020_us_scsd_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select all the Secondary School Districts in Arizona by filtering by `"AZ"` in the STUSPS column. # In[55]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.STUSPS == "AZ"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k") ax.set_title( "Arizona Secondary School Districts", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows the Arizona Secondary School Districts. # # **[Jump to Top](#United-States-2020-Census-data)** # # #### School Districts - Unified (UNSD) # # This file contains [Unified School Districts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_23), referring to districts that provide education to children of all school ages. Unified school districts can have both secondary and elementary schools.
# # The attribute table contains the following information: # # * STATEFP = State FIPS code # * UNSDLEA = Unified School District local education agency code # * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID # * GEOID = Concatenation of State FIPS code and UNSDLEA code # * NAME = Unified School District name # * STUSPS = FIPS Postal code # * STATE_NAME = State name # * LSAD = Legal/statistical classification # * ALAND = Current land area # * AWATER = Current water area # * geometry = Coordinates for Unified School District polygons # # Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [Dask-GeoPandas](https://github.com/geopandas/dask-geopandas) to read the Unified School Districts data from the parquet file of the Planetary Computer dataset: # In[56]: asset = census.get_item("2020-cb_2020_us_unsd_500k").assets["data"] ddf = geopandas.read_parquet( asset.href, storage_options=asset.extra_fields["table:storage_options"] ) ddf.head() # Next, plot the data from this parquet file and overlay it on a basemap. For this example, select the entire New York City Unified School District, which encompasses the five counties of NYC. Select the relevant data by filtering by the NAME column. # In[57]: ddf.crs = 4326 ddf = ddf.to_crs(epsg=3857) ax = ddf[ddf.NAME == "New York City Department Of Education"].plot( figsize=(10, 10), alpha=0.5, edgecolor="k" ) ax.set_title( "New York City Unified School District", fontdict={"fontsize": "20", "fontweight": "2"}, ) ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap) ax.set_axis_off() # The map created shows the New York City Unified School District. 
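Several school-district examples above filter on the STUSPS postal code. Before plotting, it can help to see how many districts a file holds per state; here is a `value_counts` sketch on a toy frame (real rows would come from the parquet asset shown above):

```python
import pandas as pd

# Toy stand-in for a school-district attribute table
df = pd.DataFrame({
    "NAME": ["District A", "District B", "District C", "District D"],
    "STUSPS": ["MT", "MT", "AZ", "NY"],
})

# Number of districts per state postal code
per_state = df["STUSPS"].value_counts().to_dict()
print(per_state)
```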
#
# **[Jump to Top](#United-States-2020-Census-data)**
#
# ### State Legislative Districts
#
# [State Legislative Districts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_24) is the fourth grouping of datasets within the census data by cartographic boundaries group.
# This dataset grouping includes State Legislative Districts for both the Upper and Lower State Chambers. These are areas in which voters elect a person to represent them in state or equivalent entity legislatures. Most states have both upper and lower chambers, the exceptions being Nebraska, which has a unicameral legislature, and Washington, DC, which has a single council. As a result, there is no lower house data for Nebraska and DC.
#
# #### State Legislative Districts - Lower Chamber (SLDL)
#
# This file contains [Lower Chamber State Legislative Districts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_24).
#
# The attribute table contains the following information:
#
# * STATEFP = State FIPS code
# * SLDLST = State Legislative District Lower Chamber code
# * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID
# * GEOID = Concatenation of State FIPS code and SLDLST
# * NAME = District name
# * NAMELSAD = District name and legal/statistical description
# * STUSPS = USPS state abbreviation
# * STATE_NAME = State name
# * LSAD = Legal/statistical classification
# * LSY = Legislative Session Year
# * ALAND = Current land area
# * AWATER = Current water area
# * geometry = coordinates for Lower Chamber polygons
#
# Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://github.com/geopandas/geopandas) to read the Lower Chamber State Legislative Districts data from the parquet file of the Planetary Computer dataset:

# In[58]:

asset = census.get_item("2020-cb_2020_us_sldl_500k").assets["data"]
ddf = geopandas.read_parquet(
    asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
ddf.head()

# Next, plot the data from this parquet file and overlay it on a basemap. For this example, plot all the Lower Chamber State Legislative Districts in Texas by filtering by `"TX"` in the STUSPS column.

# In[59]:

ddf.crs = 4326
ddf = ddf.to_crs(epsg=3857)
ax = ddf[ddf.STUSPS == "TX"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k")
ax.set_title(
    "State Legislative Districts: Texas, Lower Chamber",
    fontdict={"fontsize": "20", "fontweight": "2"},
)
ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap)
ax.set_axis_off()

# The map created shows the Texas Lower Chamber State Legislative Districts.
#
# **[Jump to Top](#United-States-2020-Census-data)**
#
# #### State Legislative Districts - Upper Chamber (SLDU)
#
# This file contains [Upper Chamber State Legislative Districts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_24).
#
# The attribute table contains the following information:
#
# * STATEFP = State FIPS code
# * SLDUST = State Legislative District Upper Chamber code
# * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID
# * GEOID = Concatenation of State FIPS code and SLDUST
# * NAME = District name
# * NAMELSAD = District name and legal/statistical description
# * STUSPS = USPS state abbreviation
# * STATE_NAME = State name
# * LSAD = Legal/statistical classification
# * LSY = Legislative Session Year
# * ALAND = Current land area
# * AWATER = Current water area
# * geometry = coordinates for Upper Chamber polygons
#
# Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://github.com/geopandas/geopandas) to read the Upper Chamber State Legislative Districts data from the parquet file of the Planetary Computer dataset:

# In[60]:

asset = census.get_item("2020-cb_2020_us_sldu_500k").assets["data"]
ddf = geopandas.read_parquet(
    asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
ddf.head()

# Next, plot the data from this parquet file and overlay it on a basemap. For this example, plot all the Upper Chamber State Legislative Districts in Michigan by filtering by `"MI"` in the STUSPS column.

# In[61]:

ddf.crs = 4326
ddf = ddf.to_crs(epsg=3857)
ax = ddf[ddf.STUSPS == "MI"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k")
ax.set_title(
    "State Legislative Districts: Michigan, Upper Chamber",
    fontdict={"fontsize": "20", "fontweight": "2"},
)
ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap)
ax.set_axis_off()

# The map created shows the Michigan Upper Chamber State Legislative Districts.
#
# **[Jump to Top](#United-States-2020-Census-data)**
#
# ### States (STATE)
#
# This file contains the [US States and State Equivalent Entities](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_25). Within Census Bureau datasets, the District of Columbia, Puerto Rico, and the Island Areas (American Samoa, the Commonwealth of the Northern Mariana Islands, Guam, and the US Virgin Islands) are treated as statistical equivalents of states alongside the 50 US states.
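# For a state row, GEOID is simply the two-digit STATEFP code, so picking out the statistical equivalents is a matter of knowing their FIPS codes. The sketch below uses the standard ANSI/FIPS state codes; they are listed from general knowledge rather than read from the dataset, so verify them against the STATEFP column before relying on them:

```python
# Standard two-digit FIPS codes for the state equivalents listed above.
# These are the usual ANSI/FIPS assignments; confirm against the data.
STATE_EQUIVALENTS = {
    "11": "District of Columbia",
    "60": "American Samoa",
    "66": "Guam",
    "69": "Commonwealth of the Northern Mariana Islands",
    "72": "Puerto Rico",
    "78": "US Virgin Islands",
}


def is_state_equivalent(statefp: str) -> bool:
    """Return True if the two-digit STATEFP names a state equivalent."""
    return statefp in STATE_EQUIVALENTS


print(is_state_equivalent("72"))  # True (Puerto Rico)
```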
#
# The attribute table contains the following information:
#
# * STATEFP = State FIPS code
# * STATENS = State ANSI feature code
# * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID
# * GEOID = STATEFP
# * STUSPS = USPS state abbreviation
# * NAME = State name
# * LSAD = Legal/statistical classification
# * ALAND = Current land area
# * AWATER = Current water area
# * geometry = coordinates for State polygons
#
# Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://github.com/geopandas/geopandas) to read the US States and State Equivalent Entities data from the parquet file of the Planetary Computer dataset:

# In[62]:

asset = census.get_item("2020-cb_2020_us_state_500k").assets["data"]
ddf = geopandas.read_parquet(
    asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
ddf.head()

# Next, plot the data from this parquet file and overlay it on a basemap. For this example, plot only the contiguous US, Puerto Rico, and the US Virgin Islands. To exclude parts of the dataset from plotting, use `~ddf.STATEFP.isin()` to exclude the STATEFP codes for Alaska, Hawaii, Guam, American Samoa, and the Commonwealth of the Northern Mariana Islands.

# In[63]:

ddf.crs = 4326
ddf = ddf.to_crs(epsg=3857)
ax = ddf[~ddf.STATEFP.isin(["02", "15", "60", "66", "69"])].plot(
    figsize=(10, 10), alpha=0.5, edgecolor="k"
)
ax.set_title(
    "States: Contiguous US, Puerto Rico, & USVI",
    fontdict={"fontsize": "20", "fontweight": "2"},
)
ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap)
ax.set_axis_off()

# The map created shows the contiguous US, Puerto Rico, and the US Virgin Islands.
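# The `~ddf.STATEFP.isin([...])` filter above is plain pandas boolean masking, so it can be tried on a tiny synthetic frame before touching the full dataset. A minimal sketch, with made-up rows standing in for the states table:

```python
import pandas as pd

# Synthetic stand-in for the states table (codes only, no geometry).
df = pd.DataFrame(
    {
        "STATEFP": ["02", "06", "15", "48"],
        "NAME": ["Alaska", "California", "Hawaii", "Texas"],
    }
)

# isin() builds a boolean mask; ~ negates it, keeping every row whose
# STATEFP is NOT in the exclusion list (here Alaska and Hawaii).
contiguous = df[~df.STATEFP.isin(["02", "15"])]
print(list(contiguous.NAME))  # ['California', 'Texas']
```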
#
# **[Jump to Top](#United-States-2020-Census-data)**
#
# ### Subbarrios (SUBBARRIO)
#
# This file contains [Subbarrios](https://www.census.gov/programs-surveys/geography/about/glossary.html#pr), which are legally defined subdivisions of Minor Civil Divisions in Puerto Rico. They don't exist within every Minor Civil Division and don't always cover the entire Minor Civil Division where they do exist.
#
# The attribute table contains the following information:
#
# * STATEFP = State FIPS code
# * COUNTYFP = County FIPS code
# * COUSUBFP = County Subdivision FIPS code
# * SUBMCDFP = Subbarrio FIPS code
# * SUBMCDNS = Subbarrio ANSI feature code
# * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID
# * GEOID = Concatenation of State FIPS code, County FIPS code, County Subdivision FIPS code, and Subbarrio FIPS code
# * NAME = Subbarrio name
# * NAMELSAD = Subbarrio name and legal/statistical description
# * LSAD = Legal/statistical classification
# * ALAND = Current land area
# * AWATER = Current water area
# * geometry = coordinates for Subbarrio polygons
#
# Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://github.com/geopandas/geopandas) to read the Subbarrios data from the parquet file of the Planetary Computer dataset:

# In[64]:

asset = census.get_item("2020-cb_2020_72_subbarrio_500k").assets["data"]
ddf = geopandas.read_parquet(
    asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
ddf.head()

# Next, plot the data from this parquet file and overlay it on a basemap. For this example, plot all the Subbarrios in the San Juan Municipio (the county equivalent in Puerto Rico). Select the relevant data by filtering by the COUNTYFP column.
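# Since the Subbarrio GEOID is a fixed-width concatenation (two-digit state + three-digit county + five-digit county subdivision + five-digit subbarrio, assuming the usual FIPS field widths), it can be split back into its components by position. A minimal sketch using a made-up GEOID, not a real subbarrio:

```python
def split_subbarrio_geoid(geoid: str) -> dict:
    """Split a 15-character subbarrio GEOID into its FIPS components.

    Assumes the usual fixed widths: STATEFP (2) + COUNTYFP (3)
    + COUSUBFP (5) + SUBMCDFP (5); verify against the dataset.
    """
    assert len(geoid) == 15, "unexpected GEOID length"
    return {
        "STATEFP": geoid[:2],
        "COUNTYFP": geoid[2:5],
        "COUSUBFP": geoid[5:10],
        "SUBMCDFP": geoid[10:15],
    }


# Hypothetical GEOID for illustration only; "127" is the San Juan
# Municipio county code used in the plot below.
parts = split_subbarrio_geoid("721271234567890")
print(parts["COUNTYFP"])  # 127
```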
# In[65]:

ddf.crs = 4326
ddf = ddf.to_crs(epsg=3857)
ax = ddf[ddf.COUNTYFP == "127"].plot(figsize=(10, 10), alpha=0.5, edgecolor="k")
ax.set_title(
    "Subbarrios: San Juan, Puerto Rico",
    fontdict={"fontsize": "20", "fontweight": "2"},
)
ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap)
ax.set_axis_off()

# The map created shows the Subbarrios in the San Juan Municipio.
#
# **[Jump to Top](#United-States-2020-Census-data)**
#
# ### United States Outline
#
# This file contains the [United States Outline](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_30), covering all 50 US states plus the District of Columbia, Puerto Rico, and the Island Areas (American Samoa, the Commonwealth of the Northern Mariana Islands, Guam, and the US Virgin Islands). There is only one feature within this dataset.
#
# The attribute table for this dataset only contains the AFFGEOID, GEOID, NAME, and coordinates for the US polygon.
#
# Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://github.com/geopandas/geopandas) to read the United States Outline data from the parquet file of the Planetary Computer dataset:

# In[66]:

asset = census.get_item("2020-cb_2020_us_nation_5m").assets["data"]
ddf = geopandas.read_parquet(
    asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
ddf.head()

# Next, plot all the data from this parquet file and overlay it on a basemap. Since this dataset contains only one feature, there are no options to select or exclude specific parts based on attributes.
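# The nation file ships as a single feature, which is conceptually what you would get by dissolving the state polygons into one geometry. A minimal sketch of that union step using shapely, with two synthetic unit squares standing in for real census geometry:

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Two adjacent unit squares standing in for neighboring state polygons.
left = box(0, 0, 1, 1)
right = box(1, 0, 2, 1)

# unary_union dissolves the shared edge into a single polygon -- the same
# idea GeoDataFrame.dissolve() applies per group of rows.
outline = unary_union([left, right])
print(outline.geom_type, outline.area)  # Polygon 2.0
```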
# In[67]:

ddf.crs = 4326
ddf = ddf.to_crs(epsg=3857)
ax = ddf.plot(figsize=(30, 60), alpha=0.5, edgecolor="k")
ax.set_title(
    "United States and US Overseas Territories",
    fontdict={"fontsize": "20", "fontweight": "2"},
)
ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap)
ax.set_axis_off()

# The map created shows the United States and US Overseas Territories.
#
# **[Jump to Top](#United-States-2020-Census-data)**
#
# ### Voting Districts (VTD)
#
# This file contains all [US Voting Districts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_31), which are geographic features established by state, local, and tribal governments to conduct elections.
#
# The attribute table contains the following information:
#
# * STATEFP20 = State FIPS code
# * COUNTYFP20 = County FIPS code
# * VTDST20 = Voting district code
# * AFFGEOID = American FactFinder summary level code + geovariant code + "00US" + GEOID
# * GEOID = Concatenation of State FIPS code, County FIPS code, and Voting District code
# * VTDI20 = Voting district indicator
# * NAME20 = Voting district name
# * NAMELSAD20 = Voting district name and legal/statistical description
# * LSAD20 = Legal/statistical classification
# * ALAND20 = Current land area
# * AWATER20 = Current water area
# * geometry = coordinates for Voting District polygons
#
# Use the [read_parquet](https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html) function of [GeoPandas](https://github.com/geopandas/geopandas) to read the Voting Districts data from the parquet file of the Planetary Computer dataset:

# In[68]:

asset = census.get_item("2020-cb_2020_us_vtd_500k").assets["data"]
ddf = geopandas.read_parquet(
    asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
ddf.head()

# Next, plot the data from this parquet file and overlay it on a basemap.
# For this example, plot all the Voting Districts in Salt Lake City, UT by filtering by Voting District names that begin with `"Salt Lake"` in the NAME20 column.

# In[69]:

ddf.crs = 4326
ddf = ddf.to_crs(epsg=3857)
ax = ddf[ddf.NAME20.str.startswith("Salt Lake")].plot(
    figsize=(10, 10), alpha=0.5, edgecolor="k"
)
ax.set_title(
    "Salt Lake City Voting Districts",
    fontdict={"fontsize": "20", "fontweight": "2"},
)
ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap)
ax.set_axis_off()

# The map created shows Salt Lake City Voting Districts.
#
# **[Jump to Top](#United-States-2020-Census-data)**
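# The `str.startswith` filter used for Salt Lake City above is also plain pandas and easy to test in isolation. A minimal sketch with invented district names, including a missing one to show why `na=False` can be useful:

```python
import pandas as pd

# Synthetic stand-in for the voting-district table (names only, one missing).
df = pd.DataFrame({"NAME20": ["Salt Lake 001", "Salt Lake 002", "Provo 014", None]})

# na=False treats missing names as non-matches instead of propagating NaN,
# which would otherwise make the mask unusable for row selection.
mask = df.NAME20.str.startswith("Salt Lake", na=False)
print(mask.sum())  # 2
```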