Pandas is a core scientific Python library to work with "Panel Data" (PanDas). Basically if you have a spreadsheet or database you should be using Pandas. Pandas has many input/output (I/O) functions, and two core data structures - the "Series" and "DataFrame". Geopandas extends Pandas to work efficently with collections of geographic Vector data - geometric shapes that are georeferenced to a position on Earth's surface. Geopandas data objects are, you might have guessed, called "GeoSeries" and "GeoDataFrame".
#These libraries are mature, but constantly improving, so it's always good to keep track of the version:
import pandas as pd
import geopandas as gpd
print('Pandas version: ', pd.__version__)
print('Geopandas version: ', gpd.__version__)
We'll use the Smithsonian Global Volcanism database. This could be a local csv, excel file, sql database etc... or remote data or results from a server (https://volcano.si.edu/database/webservices.cfm)
# Load csv results from server into a Pandas DataFrame
server = 'https://webservices.volcano.si.edu/geoserver/GVP-VOTW/ows?'
query = 'service=WFS&version=2.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes&outputFormat=csv'
df = pd.read_csv(server+query)
print(type(df))
df.head()
# Use the dataframe indexing to extract subsets
df.iloc[2:5]
# Query a column for a value of interest
df.query('Volcano_Name == "Shasta"')
# Pandas is all about efficient data access and visualization
# Here are just a few examples
df.Last_Eruption_Year.describe()
df.Region.unique()
df.groupby('Region').Last_Eruption_Year.describe()
# Save the results of your analysis
results = df.groupby('Region').Last_Eruption_Year.describe()
results.to_csv('last_eruption_year_stats.csv')
df.Elevation.plot.hist()
df.groupby('Region').Volcano_Name.count().sort_values().plot.barh()
Since the Volcano database has geolocation information we should consider visualizing information on a map!
# Now load query results as json directly in geopandas
query = 'service=WFS&version=2.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes&outputFormat=json'
gf = gpd.read_file(server+query)
print(type(gf))
gf.head()
# NOTE this looks the same as the dataframe from before,
# but it is actual a 'geodataframe' with a specified coordinate reference system (crs)
print(type(gf))
print(gf.crs)
# The same indexing and operations work with geodataframes
gf.iloc[2]
# But now we have a variety of spatial operations at our disposal
# Subsetting is very easy in Geopandas. Often we only want points in a certain bounding box
ymin, ymax, xmin, xmax = [45, 49, -120, -124]
subset = gf.cx[xmin:xmax, ymin:ymax]
subset
# Geopandas by default plots latitude and longitude of each entry (row) in a table
subset.plot()
# Maybe we want to get a polygon that encloses all those points
# Geopandas uses shapely under the surface
import shapely
point_collection = shapely.geometry.MultiPoint(subset.geometry.tolist())
polygon = point_collection.convex_hull
polygon
# We can convert that polygon to a new CRS easily with geopandas
# For example, convert to UTM to get area in units of square meters
# https://spatialreference.org/ref/epsg/wgs-84-utm-zone-10n/
# EPSG:32610
gfShape = gpd.GeoDataFrame(geometry=[polygon], crs = {'init': 'epsg:4326'})
gfShape
print(f'Polygon area km^2')
area = gfShape.to_crs(epsg=32610).area * 1e-6
area
# Save shape as geospatial vector format for GIS software
myshape = gfShape.to_crs(epsg=32610)
myshape.to_file('myshape.gpkg', driver='GPKG')
# Finally, let's say you have a different polygon and want to extract all the volcanoes in it
# This is referred to a 'spatial join' http://geopandas.org/mergingdata.html
# gpd has some built-in datasets from the natural earth project https://www.naturalearthdata.com
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world
# Get volcanoes of Colombia
colombia = world.query('name == "Colombia"')
colombia
colombian_volcanoes = gpd.sjoin(gf, colombia, how="inner", op='within')
colombian_volcanoes
import geoviews as gv
import hvplot.pandas
print('Geoviews version: ', gv.__version__)
print('hvplot version: ', hvplot.__version__)
# Geoviews offers many basemaps
tiles = gv.tile_sources.StamenTerrain()
tiles
# hvplot makes it easy to plot dataframes or geodataframes
volcano_names = gf.loc[:,['Volcano_Name','geometry']]
points = volcano_names.hvplot(geo=True, hover_cols=['Volcano_Name'], frame_width=600)
points
# Combining data in geoviews is done like so:
tiles * points