The Planetary Computer provides tabular data in the Apache Parquet file format, a standardized, high-performance columnar storage format.
When working from Python, there are several options for reading Parquet datasets. The right choice depends on the size and kind of data you're reading. For geospatial data, where one or more columns contain vector geometries, we recommend geopandas for small datasets and dask-geopandas for large datasets. For non-geospatial tabular data, we recommend pandas for small datasets and Dask for large datasets.
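The recommendation above amounts to a two-way decision table. A minimal sketch follows; the function name `recommend_reader` and its two boolean inputs are hypothetical, and what counts as "large" is left to the caller.

```python
def recommend_reader(has_geometry: bool, is_large: bool) -> str:
    """Return the library this quickstart recommends for a Parquet dataset.

    Both flags are supplied by the caller; "large" is a judgment call
    (for example, data that does not fit comfortably in memory).
    """
    if has_geometry:
        return "dask-geopandas" if is_large else "geopandas"
    return "dask" if is_large else "pandas"


print(recommend_reader(has_geometry=False, is_large=True))  # prints dask
```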
Regardless of which library you're using to read the data, we recommend using STAC to discover which datasets are available, and which options should be provided when reading the data.
In this example we'll work with data from the US Forest Service's Forest Inventory and Analysis dataset. This includes a collection of tables providing information about forest health and location in the United States.
import pystac_client
import planetary_computer
catalog = pystac_client.Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1",
modifier=planetary_computer.sign_inplace,
)
fia = catalog.get_collection("fia")
fia
id: fia |
title: Forest Inventory and Analysis |
description: Status and trends on U.S. forest location, health, growth, mortality, and production, from the U.S. Forest Service's [Forest Inventory and Analysis](https://www.fia.fs.fed.us/) (FIA) program. The Forest Inventory and Analysis (FIA) dataset is a nationwide survey of the forest assets of the United States. The FIA research program has been in existence since 1928. FIA's primary objective is to determine the extent, condition, volume, growth, and use of trees on the nation's forest land. Domain: continental U.S., 1928-2018 Resolution: plot-level (irregular polygon) This dataset was curated and brought to Azure by [CarbonPlan](https://carbonplan.org/). |
providers:
|
type: Collection |
item_assets: {'data': {'type': 'application/x-parquet', 'roles': ['data'], 'title': 'Dataset root', 'table:storage_options': {'account_name': 'cpdataeuwest'}}} |
table:tables: [{'name': 'Survey Table', 'description': 'Survey table. This table contains one record for each year an inventory is conducted in a State for annual inventory or one record for each periodic inventory. * SURVEY.CN = PLOT.SRV_CN links the unique inventory record for a State and year to the plot records.', 'msft:item_name': 'survey'}, {'name': 'County Table', 'description': 'County table. This table contains survey unit codes and is also a reference table for the county codes and names. * COUNTY.CN = PLOT.CTY_CN links the unique county record to the plot record.', 'msft:item_name': 'county'}, {'name': 'Plot Table', 'description': 'Plot table. This table provides information relevant to the entire 1-acre field plot. This table links to most other tables, and the linkage is made using PLOT.CN = TABLE_NAME.PLT_CN (TABLE_NAME is the name of any table containing the column name PLT_CN). Below are some examples of linking PLOT to other tables. * PLOT.CN = COND.PLT_CN links the unique plot record to the condition class record(s). * PLOT.CN = SUBPLOT.PLT_CN links the unique plot record to the subplot records. * PLOT.CN = TREE.PLT_CN links the unique plot record to the tree records. * PLOT.CN = SEEDLING.PLT_CN links the unique plot record to the seedling records.', 'msft:item_name': 'plot'}, {'name': 'Condition Table', 'description': 'Condition table. This table provides information on the discrete combination of landscape attributes that define the condition (a condition will have the same land class, reserved status, owner group, forest type, stand-size class, regeneration status, and stand density). * PLOT.CN = COND.PLT_CN links the condition class record(s) to the plot table. * COND.PLT_CN = SITETREE.PLT_CN and COND.CONDID = SITETREE.CONDID links the condition class record to the site tree data. 
* COND.PLT_CN = TREE.PLT_CN and COND.CONDID = TREE.CONDID links the condition class record to the tree data.', 'msft:item_name': 'cond'}, {'name': 'Subplot Table', 'description': 'Subplot table. This table describes the features of a single subplot. There are multiple subplots per 1-acre field plot and there can be multiple conditions sampled on each subplot. * PLOT.CN = SUBPLOT.PLT_CN links the unique plot record to the subplot records. * SUBPLOT.PLT_CN = COND.PLT_CN and SUBPLOT.MACRCOND = COND.CONDID links the macroplot conditions to the condition class record. * SUBPLOT.PLT_CN = COND.PLT_CN and SUBPLOT.SUBPCOND = COND.CONDID links the subplot conditions to the condition class record. * SUBPLOT.PLT_CN = COND.PLT_CN and SUBPLOT.MICRCOND = COND.CONDID links the microplot conditions to the condition class record.', 'msft:item_name': 'subplot'}, {'name': 'Subplot Condition Table', 'description': 'Subplot condition table. This table contains information about the proportion of a subplot in a condition. * PLOT.CN = SUBP_COND.PLT_CN links the subplot condition class record to the plot table. * SUBP_COND.PLT_CN = COND.PLT_CN and SUBP_COND.CONDID = COND.CONDID links the condition class records found on the four subplots to the subplot description.', 'msft:item_name': 'subp_cond'}, {'name': 'boundary', 'description': 'Boundary table. This table provides a description of the demarcation line between two conditions that occur on a single subplot', 'msft:item_name': 'boundary'}, {'name': 'Subplot Condition Change Matrix', 'description': 'Subplot condition change matrix table. This table contains information about the mix of current and previous conditions that occupy the same area on the subplot. * PLOT.CN = SUBP_COND_CHNG_MTRX.PLT_CN links the subplot condition change matrix records to the unique plot record. 
* PLOT.PREV_PLT_CN = SUBP_COND_CHNG_MTRX.PREV_PLT_CN links the subplot condition change matrix records to the unique previous plot record.', 'msft:item_name': 'subp_cond_chng_mtrx'}, {'name': 'Tree Table', 'description': 'Tree table. This table provides information for each tree 1 inch in diameter and larger found on a microplot, subplot, or core optional macroplot. * PLOT.CN = TREE.PLT_CN links the tree records to the unique plot record. * COND.PLT_CN = TREE.PLT_CN and COND.CONDID = TREE.CONDID links the tree records to the unique condition record.', 'msft:item_name': 'tree'}, {'name': 'Tree Woodland Stems Table', 'description': 'Tree woodland stems table. This table stores data for the individual stems of a woodland species tree. Individual woodland stem diameter measurements contribute to the calculation of the diameter stored on the parent TREE table record. * TREE.CN = TREE_WOODLAND_STEMS.TRE_CN links a woodland stems record to the corresponding unique tree record.', 'msft:item_name': 'tree_woodland_stems'}, {'name': 'Tree Regional Biomass Table', 'description': 'Tree regional biomass table. This table contains biomass estimates computed using equations and methodology that varies by FIA work unit. This table retains valuable information for generating biomass estimates that match earlier published reports. * TREE.CN = TREE_REGIONAL_BIOMASS.TRE_CN links a tree regional biomass record to the corresponding unique tree.', 'msft:item_name': 'tree_regional_biomass'}, {'name': 'Tree Net Growth, Removal, and Mortality Component Table', 'description': 'Tree net growth, removal, and mortality component table. This table stores information used to compute net growth, removals, and mortality estimates for remeasurement trees. Each remeasurement tree has a single record in this table. 
* TREE_GRM_COMPONENT.TRE_CN = TREE.TRE_CN links the records in this table to the corresponding tree record in the TREE table.', 'msft:item_name': 'tree_grm_component'}, {'name': 'Tree Net Growth, Removal, and Mortality Midpoint Table', 'description': 'Tree net growth, removal, and mortality midpoint table. This table contains information about a remeasured tree at the midpoint of the remeasurement period. It does not contain a record for every tree. Midpoint estimates are computed for trees that experience mortality, removal, or land use diversion or reversion. The information in this table is used to compute net growth, removal, and mortality estimates on remeasurement trees. * TREE_GRM_MIDPT.TRE_CN = TREE.TRE_CN links the records in this table to the corresponding tree record in the TREE table.', 'msft:item_name': 'tree_grm_midpt'}, {'name': 'Tree Net Growth, Removal, and Mortality Begin Table', 'description': 'Tree net growth, removal, and mortality begin table. This table contains information for remeasured trees where values have been calculated for the beginning of the remeasurement period. Only those trees where information was recalculated for time 1 (T1) are included. The information in this table is used to produce net growth, removal and mortality estimates on remeasured trees. * TREE_GRM_BEGIN.TRE_CN = TREE.TRE_CN links the records in this table to the corresponding tree record in the TREE table.', 'msft:item_name': 'tree_grm_begin'}, {'name': 'Tree Net Growth, Removal, and Mortality Estimation Table', 'description': 'Tree net growth, removal, and mortality estimation table. This table contains information used to produce estimates of growth, removals and mortality. * PLOT.CN = TREE_GRM_ESTN.PLT_CN links the tree GRM estimation records to the unique plot record. * TREE.CN = TREE_GRM_ESTN.TRE_CN links the tree GRM estimation records to the unique tree record.', 'msft:item_name': 'tree_grm_estn'}, {'name': 'Seedling Table', 'description': 'Seedling table. 
This table provides a count of the number of live trees of a species found on a microplot that are less than 1 inch in diameter but at least 6 inches in length for conifer species or at least 12 inches in length for hardwood species. * PLOT.CN = SEEDLING.PLT_CN links the seedling records to the unique plot record. * COND.PLT_CN = SEEDLING.PLT_CN and COND.CONDID = SEEDLING.CONDID links the condition record to the seedling record.', 'msft:item_name': 'seedling'}, {'name': 'Site Tree Table', 'description': 'Site tree table. This table provides information on the site tree(s) collected in order to calculate site index and/or site productivity information for a condition. * PLOT.CN = SITETREE.PLT_CN links the site tree records to the unique plot record. * SITETREE.PLT_CN = COND.PLT_CN and SITETREE.CONDID = COND.CONDID links the site tree record(s) to the unique condition class record.', 'msft:item_name': 'sitetree'}, {'name': 'Invasive Subplot Species Table', 'description': 'Invasive subplot species table. This table provides percent cover data of invasive species identified on the subplot. * PLOT.CN = INVASIVE_SUBPLOT_SPP.PLT_CN links the invasive subplot species record(s) to the unique plot record. * SUBP_COND.PLT_CN = INVASIVE_SUBPLOT_SPP.PLT_CN and SUBP_COND.CONDID = INVASIVE_SUBPLOT_SPP.CONDID and SUBP_COND.SUBP = INVASIVE_SUBPLOT_SPP.SUBP links the invasive subplot species record(s) to the unique subplot condition record. * INVASIVE_SUBPLOT_SPP.VEG_SPCD = REF_PLANT_DICTIONARY.SYMBOL links the invasive vegetation subplot NRCS species code to the plant dictionary reference species code.', 'msft:item_name': 'invasive_subplot_spp'}, {'name': 'Phase 2 Vegetation Subplot Species Table', 'description': 'Phase 2 Vegetation subplot species table. This table provides percent cover data of vegetation species identified on the subplot. * PLOT.CN = P2VEG_SUBPLOT_SPP.PLT_CN links the vegetation subplot species record(s) to the unique plot record. 
* SUBP_COND.PLT_CN = P2VEG_SUBPLOT_SPP.PLT_CN and SUBP_COND.CONDID = P2VEG_SUBPLOT_SPP.CONDID and SUBP_COND.SUBP = P2VEG_SUBPLOT_SPP.SUBP links the vegetation subplot species record(s) to the unique subplot condition record. * P2VEG_SUBPLOT_SPP.VEG_SPCD = REF_PLANT_DICTIONARY.SYMBOL links the P2 vegetation subplot NRCS species code to the plant dictionary reference species code.', 'msft:item_name': 'p2veg_subplot_spp'}, {'name': 'Phase 2 Vegetation Subplot Structure Table', 'description': 'Phase 2 Vegetation subplot structure table. This table provides percent cover by layer by growth habit. * PLOT.CN = P2VEG_SUBP_STRUCTURE. PLT_CN links the subplot structure record(s) to the unique plot record. * SUBP_COND.PLT_CN = P2VEG_SUBP_STRUCTURE.PLT_CN and SUBP_COND.CONDID = P2VEG_SUBP_STRUCTURE.CONDID and SUBP_COND.SUBP = P2VEG_SUBP_STRUCTURE.SUBP links the vegetation subplot structure record(s) to the unique subplot condition record.', 'msft:item_name': 'p2veg_subp_structure'}, {'name': 'Down Woody Material Visit Table', 'description': 'Down woody material visit table. This table provides general information on down woody material indicator visit, such as the date of the DWM survey. * PLOT.CN = DWM_VISIT.PLT_CN links the down woody material indicator visit record to the unique plot record.', 'msft:item_name': 'dwm_visit'}, {'name': 'Down Woody Material Coarse Woody Debris Table', 'description': 'Down woody material coarse woody debris table. This table provides information for each piece of coarse woody debris measured along the transects. * PLOT.CN = DWM_COARSE_WOODY_DEBRIS.PLT_CN links the down woody material coarse woody debris records to the unique plot record. 
* COND.PLT_CN = DWM_COARSE_WOODY_DEBRIS.PLT_CN and COND.CONDID= DWM_COARSE_WOODY_DEBRIS.CONDID links the coarse woody debris records to the unique condition record.', 'msft:item_name': 'dwm_coarse_woody_debris'}, {'name': 'Down Woody Material Duff, Litter, Fuel Table', 'description': 'Down woody material duff, litter, fuel table. This table provides information on the duff, litter, fuelbed depths measured at a point on the transects. * PLOT.CN = DWM_DUFF_LITTER_FUEL.PLT_CN links the duff, litter, fuelbed records to the unique plot record. * COND.PLT_CN = DWM_DUFF_LITTER_FUEL.PLT_CN and COND.CONDID= DWM_DUFF_LITTER_FUEL.CONDID links the duff, litter, fuel records to the unique condition record.', 'msft:item_name': 'dwm_duff_litter_fuel'}, {'name': 'Down Woody Material Fine Woody Debris Table', 'description': 'Down woody material fine woody debris table. This table provides information on the fine woody debris measured along a segment of the transects. * PLOT.CN = DWM_FINE_WOODY_DEBRIS.PLT_CN links the fine woody debris records to the unique plot record. * COND.PLT_CN = DWM_FINE_WOODY_DEBRIS.PLT_CN and COND.CONDID= DWM_FINE_WOODY_DEBRIS.CONDID links the fine woody debris records to the unique condition record.', 'msft:item_name': 'dwm_fine_woody_debris'}, {'name': 'Down Woody Material Microplot Fuel Table', 'description': 'Down woody material microplot fuel table. This table provides information on the fuel loads (shrubs and herbs) measured on the microplot. * PLOT.CN = DWM_MICROPLOT_FUEL.PLT_CN links the microplot fuel records to the unique plot record.', 'msft:item_name': 'dwm_microplot_fuel'}, {'name': 'Down Woody Material Residual Pile Table', 'description': 'Down woody material residual pile table. This table provides information on the wood piles measured on the subplot. * PLOT.CN = DWM_RESIDUAL_PILE.PLT_CN links the wood piles records to the unique plot record. 
* COND.PLT_CN = DWM_RESIDUAL_PILE.PLT_CN and COND.CONDID= DWM_RESIDUAL_PILE.CONDID links the wood piles records to the unique condition record.', 'msft:item_name': 'dwm_residual_pile'}, {'name': 'Down Woody Material Transect Segment Table', 'description': 'Down woody material transect segment table. This table describes the down woody material transect segment lengths by condition class. * PLOT.CN = DWM_TRANSECT_SEGMENT.PLT_CN links the down woody material transect length records to the unique plot record. * COND.PLT_CN = DWM_TRANSECT_SEGMENT.PLT_CN and COND.CONDID= DWM_TRANSECT_SEGMENT.CONDID links the down woody material transect segment records to the unique condition record.', 'msft:item_name': 'dwm_transect_segment'}, {'name': 'Condition Down Woody Material Calculation Table', 'description': 'Condition down woody material calculation table. This table contains calculated values and condition-level estimates for down woody attributes by plot number (PLOT), condition class number (CONDID), and evaluation identifier (EVALID). * PLOT.CN = COND_DWM_CALC.PLT_CN links the down woody material calculation records to the unique plot record. * COND.CN = COND_DWM_CALC.CND_CN links the down woody material calculation records to the unique condition record. * POP_STRATUM. CN = COND_DWM_CALC.STRATUM_CN links the down woody material calculation records to the unique population stratum record.', 'msft:item_name': 'cond_dwm_calc'}, {'name': 'Plot Regeneration Table', 'description': 'Plot regeneration table. This table contains the information for the four subplots describing the amount of animal browse pressure exerted on the regeneration of trees. * PLOT.CN = PLOT_REGEN.PLT_CN links the unique plot record to the unique plot regeneration record.', 'msft:item_name': 'plot_regen'}, {'name': 'Subplot Regeneration Table', 'description': 'Subplot regeneration table. 
This table provides information on the subplot survey status and the site survey limitations, if any, for the tree regeneration study. * PLOT.CN = SUBPLOT_REGEN.PLT_CN links the unique plot record to the subplot regeneration records. * SUBPLOT.PLT_CN = SUBPLOT_REGEN.PLT_CN and SUBPLOT.SUBP = SUBPLOT_REGEN.SUBP links the subplot record to the subplot regeneration record.', 'msft:item_name': 'subplot_regen'}, {'name': 'Seedling Regeneration Table', 'description': 'Seedling regeneration. This table contains provides information on the seedling count by condition, species, source, and length class for the tree regeneration study. * PLOT.CN = SEEDLING_REGEN.PLT_CN links the unique plot record to the seedling regeneration records. * COND.PLT_CN = SEEDLING_REGEN.PLT_CN and COND.CONDID = SEEDLING_REGEN.CONDID links the regeneration seedling records to the unique condition record.', 'msft:item_name': 'seedling_regen'}, {'name': 'Population Estimation Unit Table', 'description': 'Population estimation unit table. This table contains information about estimation units. An estimation unit is a geographic area that can be drawn on a map. It has a known area, and the sampling intensity must be the same within a stratum within an estimation unit. Generally, estimation units are contiguous areas, but exceptions are made when certain ownerships, usually National Forests, are sampled at different intensities. One record in the POP_ESTN_UNIT table corresponds to a single estimation unit. POP_ESTN_UNIT.CN = POP_STRATUM.ESTN_UNIT_CN links the unique stratified geographical area (ESTN_UNIT) to the strata (STRATUMCD) that are assigned to each ESTN_UNIT.', 'msft:item_name': 'pop_estn_unit'}, {'name': 'Population Evaluation Table', 'description': 'Population evaluation table. This table provides information about evaluations. 
An evaluation is the combination of a set of plots (the sample) and a set of Phase 1 data (obtained through remote sensing, called a stratification) that can be used to produce population estimates for a State (an evaluation may be created to produce population estimates for a region other than a State, such as the Black Hills National Forest). A record in the POP_EVAL table identifies one evaluation and provides some descriptive information about how the evaluation may be used. * POP_ESTN_UNIT.EVAL_CN = POP_EVAL.CN links the unique evaluation identifier (EVALID) in the POP_EVAL table to the unique geographical areas (ESTN_UNIT) that are stratified. Within a population evaluation (EVALID) there can be multiple population estimation units, or geographic areas across which there are a number of values being estimated (e.g., estimation of volume across counties for a given State).', 'msft:item_name': 'pop_eval'}, {'name': 'Population Evaluation Attribute Table', 'description': 'Population evaluation attribute table. This table provides information as to which population estimates can be provided by an evaluation. If an evaluation can produce only 22 of all the population estimates in the REF_POP_ATTRIBUTE table, there will be 22 records in the POP_EVAL_ATTRIBUTE table (one per population estimate) for that evaluation. * POP_EVAL.CN = POP_EVAL_ATTRIBUTE.EVAL_CN links the unique evaluation identifier to the list of population estimates that can be derived for that evaluation.', 'msft:item_name': 'pop_eval_attribute'}, {'name': 'Population Evaluation Group Table', 'description': 'Population evaluation group table. This table lists and describes the evaluation groups. One record in the POP_EVAL_GRP table can be linked to all the evaluations that were used in generating estimates for a State inventory report. 
* POP_EVAL_GRP.CN = POP_EVAL_TYP.EVAL_GRP_CN links the evaluation group record to the evaluation type record.', 'msft:item_name': 'pop_eval_grp'}, {'name': 'Population Evaluation Type Table', 'description': 'Population evaluation type table. This table provides information on the type of evaluations that were used to generate a set of tables for an inventory report. In a typical State inventory report, one evaluation is used to generate an estimate of the total land area; a second evaluation is used to generate current estimates of volume, numbers of trees and biomass; and a third evaluation is used for estimating growth, removals and mortality. * POP_EVAL_TYP.EVAL_CN = POP_EVAL.CN links the evaluation type record to the evaluation record. * POP_EVAL_TYP.EVAL_GRP_CN = POP_EVAL_GRP.CN links the evaluation type record to the evaluation group record. * POP_EVAL_TYP.EVAL_TYP = REF_POP_EVAL_TYP_DESCR.EVAL_TYP links an evaluation type record to an evaluation type description reference record.', 'msft:item_name': 'pop_eval_typ'}, {'name': 'Population Plot Stratum Assignment Table', 'description': "Population plot stratum assignment table. This table provides a way to assign stratum information to a plot. Stratum information is assigned to a plot by overlaying the plot's location on the Phase 1 imagery. Plots are linked to their appropriate stratum for an evaluation via the POP_PLOT_STRATUM_ASSGN table. * POP_PLOT_STRATUM_ASSGN.PLT_CN = PLOT.CN links the stratum assigned to the plot record.", 'msft:item_name': 'pop_plot_stratum_assgn'}, {'name': 'Population Stratum Table', 'description': 'Population stratum table. This table provides information about individual strata. The area within an estimation unit is divided into strata. The area for each stratum can be calculated by determining the proportion of Phase 1 pixels/plots in each stratum and multiplying that proportion by the total area in the estimation unit. 
Information for a single stratum is stored in a single record of the POP_STRATUM table. * POP_STRATUM.CN = POP_PLOT_STRATUM_ASSGN.STRATUM_CN links the defined stratum to each plot.', 'msft:item_name': 'pop_stratum'}, {'name': 'Plot Geometry Table', 'description': 'Plot geometry table. This table contains geometric attributes associated with the plot location, such as the hydrological unit and roadless codes. * PLOTGEOM.CN = PLOT.CN links the unique plot record between the two tables.', 'msft:item_name': 'plotgeom'}, {'name': 'Plot Snapshot Table', 'description': 'Plot snapshot table. This table combines the information in the PLOT table with information in the PLOT_EVAL_GRP and POP_STRATUM tables to provide a snapshot of the plot records with their associated expansion and adjustment factors. * PLOTSNAP.CN = PLOT.CN links the unique plot record between the two tables.', 'msft:item_name': 'plotsnap'}] |
msft:container: cpdata |
msft:storage_account: cpdataeuwest |
msft:short_description: Status and trends on U.S. forest location, health, growth, mortality, and production, from the U.S. Forest Service's Forest Inventory and Analysis (FIA) program |
msft:region: westeurope |
https://stac-extensions.github.io/item-assets/v1.0.0/schema.json |
https://stac-extensions.github.io/table/v1.2.0/schema.json |
id: tree_woodland_stems |
bbox: [-179.14734, -14.53, 179.77847, 71.352561] |
datetime: 2020-06-01T00:00:00Z |
table:columns: [{'name': 'CN', 'type': 'int64', 'description': 'Sequence number'}, {'name': 'PLT_CN', 'type': 'int64', 'description': 'Plot sequence number'}, {'name': 'INVYR', 'type': 'int64', 'description': 'Inventory year'}, {'name': 'STATECD', 'type': 'int64', 'description': 'State code'}, {'name': 'UNITCD', 'type': 'int64', 'description': 'Survey unit code'}, {'name': 'COUNTYCD', 'type': 'int64', 'description': 'County code'}, {'name': 'PLOT', 'type': 'int64', 'description': 'Plot number'}, {'name': 'SUBP', 'type': 'int64', 'description': 'Subplot number'}, {'name': 'TREE', 'type': 'int64', 'description': 'Woodland tree number'}, {'name': 'TRE_CN', 'type': 'int64', 'description': 'Tree sequence number'}, {'name': 'DIA', 'type': 'double', 'description': 'Woodland stem diameter'}, {'name': 'STATUSCD', 'type': 'int64', 'description': 'Woodland stem status code'}, {'name': 'STEM_NBR', 'type': 'int64', 'description': 'Woodland stem number'}, {'name': 'CYCLE', 'type': 'int64', 'description': 'Inventory cycle number'}, {'name': 'SUBCYCLE', 'type': 'int64', 'description': 'Inventory subcycle number'}, {'name': 'CREATED_BY', 'type': 'double', 'description': 'Created by'}, {'name': 'CREATED_DATE', 'type': 'byte_array', 'description': 'Created date'}, {'name': 'CREATED_IN_INSTANCE', 'type': 'int64', 'description': 'Created in instance'}, {'name': 'MODIFIED_BY', 'type': 'double', 'description': 'Modified by'}, {'name': 'MODIFIED_DATE', 'type': 'byte_array', 'description': 'Modified date'}, {'name': 'MODIFIED_IN_INSTANCE', 'type': 'double', 'description': 'Modified in instance'}] |
https://stac-extensions.github.io/table/v1.2.0/schema.json |
href: abfs://cpdata/raw/fia/tree_woodland_stems.parquet |
type: application/x-parquet |
title: Dataset root |
roles: ['data'] |
owner: tree_woodland_stems |
table:storage_options: {'account_name': 'cpdataeuwest', 'credential': 'st=2023-11-06T12%3A37%3A27Z&se=2023-11-14T12%3A37%3A27Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-11-07T12%3A37%3A26Z&ske=2023-11-14T12%3A37%3A26Z&sks=b&skv=2021-06-08&sig=9BmPmCxJ42hl3pdH/X9uNBV2gfZPnXtqpMit5NtmMO4%3D'} |
rel: collection |
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/fia |
type: application/json |
rel: parent |
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/fia |
type: application/json |
rel: root |
href: https://planetarycomputer.microsoft.com/api/stac/v1 |
type: application/json |
title: Microsoft Planetary Computer STAC API |
rel: self |
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/fia/items/tree_woodland_stems |
type: application/geo+json |
rel: items |
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/fia/items |
type: application/geo+json |
rel: parent |
href: https://planetarycomputer.microsoft.com/api/stac/v1/ |
type: application/json |
rel: root |
href: https://planetarycomputer.microsoft.com/api/stac/v1 |
type: application/json |
title: Microsoft Planetary Computer STAC API |
rel: self |
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/fia |
type: application/json |
rel: license |
href: https://www.fs.usda.gov/rds/archive/datauseinfo/open |
type: text/html |
title: USDA Open Access Data Use Agreement |
rel: describedby |
href: https://planetarycomputer.microsoft.com/dataset/fia |
type: text/html |
title: Human readable dataset overview and reference |
href: https://www.fia.fs.fed.us/library/database-documentation/current/ver80/FIADB%20User%20Guide%20P2_8-0.pdf |
type: application/pdf |
title: Database Description and User Guide |
roles: ['metadata'] |
owner: fia |
href: https://ai4edatasetspublicassets.blob.core.windows.net/assets/pc_thumbnails/fia.png |
type: image/gif |
title: Forest Inventory and Analysis |
owner: fia |
href: abfs://items/fia.parquet |
type: application/x-parquet |
title: GeoParquet STAC items |
description: Snapshot of the collection's STAC items exported to GeoParquet format. |
roles: ['stac-items'] |
owner: fia |
msft:partition_info: {'is_partitioned': False} |
table:storage_options: {'account_name': 'pcstacitems', 'credential': 'st=2023-11-06T12%3A37%3A26Z&se=2023-11-14T12%3A37%3A27Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-11-07T12%3A37%3A25Z&ske=2023-11-14T12%3A37%3A25Z&sks=b&skv=2021-06-08&sig=Jro/tMUnVUMjGSWkF72oiRb6lKlgmiKDIlw%2B4U5Lxs4%3D'} |
The FIA Collection has a number of items, each of which represents a different table stored in Parquet format.
list(fia.get_items())  # get_all_items() is deprecated in recent pystac releases
[<Item id=tree_woodland_stems>, <Item id=tree_regional_biomass>, <Item id=tree_grm_midpt>, <Item id=tree_grm_estn>, <Item id=tree_grm_component>, <Item id=tree_grm_begin>, <Item id=tree>, <Item id=survey>, <Item id=subplot_regen>, <Item id=subplot>, <Item id=subp_cond_chng_mtrx>, <Item id=subp_cond>, <Item id=sitetree>, <Item id=seedling_regen>, <Item id=seedling>, <Item id=pop_stratum>, <Item id=pop_plot_stratum_assgn>, <Item id=pop_eval_typ>, <Item id=pop_eval_grp>, <Item id=pop_eval_attribute>, <Item id=pop_eval>, <Item id=pop_estn_unit>, <Item id=plotsnap>, <Item id=plot_regen>, <Item id=plotgeom>, <Item id=plot>, <Item id=p2veg_subp_structure>, <Item id=p2veg_subplot_spp>, <Item id=invasive_subplot_spp>, <Item id=dwm_visit>, <Item id=dwm_transect_segment>, <Item id=dwm_residual_pile>, <Item id=dwm_microplot_fuel>, <Item id=dwm_fine_woody_debris>, <Item id=dwm_duff_litter_fuel>, <Item id=dwm_coarse_woody_debris>, <Item id=county>, <Item id=cond_dwm_calc>, <Item id=cond>, <Item id=boundary>]
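Each item also records its schema under the `table:columns` property, which can help when deciding which columns to read. A minimal sketch that works on a plain dictionary shaped like the item properties in the output above; the helper name `columns_of_type` is hypothetical, and with a real item you would pass `item.properties` instead of the shortened example.

```python
def columns_of_type(properties: dict, parquet_type: str) -> list[str]:
    """List the column names whose declared type equals parquet_type."""
    return [
        column["name"]
        for column in properties.get("table:columns", [])
        if column.get("type") == parquet_type
    ]


# A shortened copy of the tree_woodland_stems metadata, for illustration only.
example_properties = {
    "table:columns": [
        {"name": "CN", "type": "int64", "description": "Sequence number"},
        {"name": "DIA", "type": "double", "description": "Woodland stem diameter"},
        {"name": "CREATED_DATE", "type": "byte_array", "description": "Created date"},
    ]
}

print(columns_of_type(example_properties, "double"))  # prints ['DIA']
```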
To load a single table, get its item and extract the href from the data asset. The "boundary" table, which provides information about subplots, is relatively small and doesn't contain a geospatial geometry column, so it can be read with pandas.
import pandas as pd
import planetary_computer
boundary = fia.get_item(id="boundary")
asset = boundary.assets["data"]
df = pd.read_parquet(
asset.href,
storage_options=asset.extra_fields["table:storage_options"],
columns=["CN", "AZMLEFT", "AZMCORN"],
)
df.head()
__null_dask_index__ | CN | AZMLEFT | AZMCORN |
---|---|---|---|
0 | 204719190010854 | 259 | 0 |
1 | 204719188010854 | 33 | 0 |
2 | 204719189010854 | 52 | 0 |
3 | 204719192010854 | 322 | 0 |
4 | 204719191010854 | 325 | 0 |
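The `credential` in `table:storage_options` is a short-lived SAS token, as the `st` and `se` fields in the collection output suggest. A minimal sketch for checking when a token expires, assuming the credential is a URL-encoded query string with an `se` (signed expiry) field in ISO 8601 form, as in that output. The helper name `sas_expiry` is hypothetical and the token below is a made-up placeholder without a signature.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def sas_expiry(credential: str) -> datetime:
    """Return the expiry time encoded in the `se` field of a SAS token."""
    fields = parse_qs(credential)  # also decodes %3A back to ":"
    expiry = fields["se"][0]
    # Swap the trailing "Z" for an explicit offset so fromisoformat
    # also works on older Python versions.
    return datetime.fromisoformat(expiry.replace("Z", "+00:00"))


# Placeholder token with only the fields this sketch needs; not a real signature.
token = "st=2023-11-06T12%3A37%3A27Z&se=2023-11-14T12%3A37%3A27Z&sp=rl"
expires = sas_expiry(token)
print(expires.isoformat())
print("expired:", expires < datetime.now(timezone.utc))
```

If a read starts failing with an authorization error in a long-running session, fetching the item again from the catalog should yield freshly signed storage options.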
There are a few important pieces to highlight.
The STAC metadata supplied both the href and the storage_options. All we needed to know was the ID of the Collection and Item.
Only the columns we needed were read, selected with the columns keyword.
Larger datasets can be read using Dask. For example, the cpdata/raw/fia/tree.parquet folder contains about 160 individual Parquet files, totalling about 22 million rows. In this case, pass the path to the directory to dask.dataframe.read_parquet.
import dask.dataframe as dd
tree = fia.get_item(id="tree")
asset = tree.assets["data"]
df = dd.read_parquet(
asset.href,
storage_options=asset.extra_fields["table:storage_options"],
columns=["SPCD", "CARBON_BG", "CARBON_AG"],
engine="pyarrow",
)
df
Dask DataFrame structure (npartitions=160):
SPCD | CARBON_BG | CARBON_AG |
---|---|---|
int64 | float64 | float64 |
... | ... | ... |
That lazily loads the data into a Dask DataFrame. We can operate on the DataFrame with pandas-like methods, and call .compute() to get the result. In this case, we'll compute the average amount of carbon sequestered above and below ground for each tree, grouped by species type. To cut down on execution time we'll select just the first partition.
result = df.get_partition(0).groupby("SPCD").mean().compute() # group by species
result
SPCD | CARBON_BG | CARBON_AG |
---|---|---|
43 | 37.864937 | 165.753430 |
67 | 3.549734 | 14.679764 |
68 | 9.071253 | 39.108406 |
107 | 19.321549 | 84.096184 |
110 | 29.964395 | 130.956288 |
... | ... | ... |
973 | 4.632913 | 22.658887 |
975 | 38.988846 | 202.220124 |
976 | 25.385733 | 130.583668 |
993 | 12.570365 | 64.712301 |
999 | 5.072137 | 25.477134 |
100 rows × 2 columns
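When you drop `get_partition(0)` and aggregate over every partition, Dask combines the partial results for you. The sketch below is not Dask's implementation. It is a plain-Python illustration, with made-up numbers and hypothetical helpers `partial_sums` and `combine_group_means`, of why a grouped mean has to be assembled from per-partition sums and counts rather than by averaging per-partition means.

```python
def partial_sums(partition):
    """Return {key: (sum, count)} for one partition of (key, value) pairs."""
    partial = {}
    for key, value in partition:
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def combine_group_means(partitions):
    """Merge per-partition (sum, count) pairs, then divide once per key."""
    merged = {}
    for partition in partitions:
        for key, (total, count) in partial_sums(partition).items():
            prev_total, prev_count = merged.get(key, (0.0, 0))
            merged[key] = (prev_total + total, prev_count + count)
    return {key: total / count for key, (total, count) in merged.items()}


# Two made-up partitions of (species code, carbon) pairs.
partitions = [
    [(43, 10.0), (43, 20.0), (67, 4.0)],
    [(43, 60.0), (67, 6.0)],
]

print(combine_group_means(partitions))  # prints {43: 30.0, 67: 5.0}
```

Averaging the two per-partition means for species 43 (15.0 and 60.0) would give 37.5 instead of 30.0, because the partitions hold different numbers of rows.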
The us-census collection has some items that include a geometry column, and so can be loaded with geopandas. All Parquet datasets hosted by the Planetary Computer with one or more geospatial columns use the GeoParquet standard for encoding the geospatial metadata.
import geopandas
item = catalog.get_collection("us-census").get_item("2020-cb_2020_us_state_500k")
asset = item.assets["data"]
df = geopandas.read_parquet(
asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
df.head()
 | STATEFP | STATENS | AFFGEOID | GEOID | STUSPS | NAME | LSAD | ALAND | AWATER | geometry |
---|---|---|---|---|---|---|---|---|---|---|
0 | 66 | 1802705 | 0400000US66 | 66 | GU | Guam | 00 | 543555847 | 934337453 | MULTIPOLYGON (((144.64538 13.23627, 144.64716 ... |
1 | 48 | 1779801 | 0400000US48 | 48 | TX | Texas | 00 | 676680588914 | 18979352230 | MULTIPOLYGON (((-94.71830 29.72885, -94.71721 ... |
2 | 55 | 1779806 | 0400000US55 | 55 | WI | Wisconsin | 00 | 140292246684 | 29343721650 | MULTIPOLYGON (((-86.95617 45.35549, -86.95463 ... |
3 | 44 | 1219835 | 0400000US44 | 44 | RI | Rhode Island | 00 | 2677759219 | 1323691129 | MULTIPOLYGON (((-71.28802 41.64558, -71.28647 ... |
4 | 36 | 1779796 | 0400000US36 | 36 | NY | New York | 00 | 122049520861 | 19256750161 | MULTIPOLYGON (((-72.03683 41.24984, -72.03496 ... |
With this, we can visualize the boundaries of the contiguous United States by dropping Alaska, Hawaii, and the territories.
import contextily
drop = ["GU", "AK", "MP", "HI", "VI", "PR", "AS"]
ax = df[~df.STUSPS.isin(drop)].plot()
contextily.add_basemap(
ax, crs=df.crs.to_string(), source=contextily.providers.Esri.NatGeoWorldMap
)
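GeoParquet records which columns hold geometries in a JSON document stored under the `geo` key of the Parquet file metadata, which is what lets a reader such as geopandas reconstruct the geometry column. A minimal sketch that parses such a document; the JSON string is a hand-written example following the GeoParquet layout rather than the metadata of the us-census file, and the helper name `geometry_columns` is hypothetical.

```python
import json


def geometry_columns(geo_metadata: str) -> dict:
    """Map each geometry column name to its encoding, per the `geo` metadata."""
    geo = json.loads(geo_metadata)
    return {name: spec.get("encoding") for name, spec in geo["columns"].items()}


# Hand-written example following the GeoParquet layout (illustrative values).
example_geo = json.dumps(
    {
        "version": "1.0.0",
        "primary_column": "geometry",
        "columns": {
            "geometry": {"encoding": "WKB", "geometry_types": ["MultiPolygon"]}
        },
    }
)

print(json.loads(example_geo)["primary_column"])  # prints geometry
print(geometry_columns(example_geo))  # prints {'geometry': 'WKB'}
```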
This quickstart briefly introduced how to access tabular data on the Planetary Computer. For more, see