
Tips to extract data from a geojson dict to define a `choroplethmapbox` chart¶

The choroplethmapbox chart type is available starting with Plotly version 4.1.0. It is defined by a geojson file and, optionally, a pandas dataframe. The dict jdata read from the geojson file must have the following structure:

jdata = {"type": "FeatureCollection",
         "features": []
        }

where jdata['features'] is a list of features, i.e. a list of dicts that contain at least the keys ['type', 'geometry']. Feature definitions vary from one geojson file to another, because geojson is "an open standard format", i.e. "there is no single definition and interpretations vary with usage" (Wikipedia).

Most geojson files provide a subdict 'properties' for each feature. A go.Choroplethmapbox chart requires a geojson file in which each feature's 'geometry' is of type 'Polygon' or 'MultiPolygon'.
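
The structure described above can be checked before building a trace. The following is only a minimal sketch: the helper name `check_feature_collection` and the `sample` dict are made up for illustration and are not part of Plotly.

```python
def check_feature_collection(jdata):
    """Return a list of problems found in a geojson dict (empty if none)."""
    problems = []
    if jdata.get("type") != "FeatureCollection":
        problems.append("top-level 'type' is not 'FeatureCollection'")
    for k, feature in enumerate(jdata.get("features", [])):
        # A feature may carry "geometry": None, so fall back to an empty dict
        geometry = feature.get("geometry") or {}
        if geometry.get("type") not in ("Polygon", "MultiPolygon"):
            problems.append(f"feature {k}: geometry type is {geometry.get('type')!r}")
    return problems


# Made-up sample: one drawable polygon and one point that cannot be colored
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "Polygon", "coordinates": []}},
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [0, 0]}},
    ],
}
print(check_feature_collection(sample))  # ["feature 1: geometry type is 'Point'"]
```

An empty list means that every feature has a geometry of a type that can be filled with a color.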

A go.Choroplethmapbox trace has the following basic attributes: geojson (the dict jdata); locations, a list of ids, each identifying a feature['geometry'] to be colored; and z, the numerical values, given as a list or dataframe column, that determine the colors. The z-values are usually provided by another source, not by the geojson file. z can contain the population, unemployment percentage, etc. of a geographic unit.

locations can be the entire list of feature['geometry'] identifiers read from the geojson file, or only a sublist.

An important key, which links each feature['geometry'] (geographical unit) to the data file associated with these units, is one that uniquely identifies each unit to be colored in the choropleth.

That's why the FIRST STEP after reading a geojson file as a dict, jdata, is to inspect its structure:

print(jdata.keys())

There can be three cases:

1. Each feature dict in the list `jdata['features']` has a key `'id'`, which can be seen by displaying:

```
jdata['features'][0].keys()
```

In this case the following keys are displayed:

```
dict_keys(['type', 'id', 'geometry'])
```

2. There is no key called `id` in the feature dicts, neither at the top level nor inside an inner dict of each feature (such as `feature['properties']['id']` or `feature['anykey']['id']`). In this case, if no key with another name uniquely identifies each `feature['geometry']`, define an id for each feature yourself as follows:

```
for k in range(len(jdata['features'])):
    jdata['features'][k]['id'] = k
```

3. When  displaying:
```
jdata['features'][0].keys()
```

more keys than above are listed:

```
dict_keys(['type', 'geometry', 'properties', 'anykey'])
```

Inside `feature['properties']`, or possibly `feature['anykey']`, there is a key, called either `'id'` or something else, let us say `'someidentifier'`, that uniquely identifies the geographic region defined by `feature['geometry']`.
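
Deciding which of the three cases applies can be partly automated. The sketch below assumes a hypothetical helper, `candidate_id_keys`, that lists the key paths whose values are present in every feature and never repeat. The paths are written in dotted form, like the `featureidkey` strings used later. The sample data is made up.

```python
def candidate_id_keys(jdata):
    """List dotted key paths whose values are present and unique in every feature."""
    features = jdata["features"]
    seen = {}  # dotted path -> list of values, one per feature that has it
    for feature in features:
        for key, value in feature.items():
            if key in ("type", "geometry"):
                continue
            if isinstance(value, dict):
                # Look one level down, e.g. into feature['properties']
                for subkey, subvalue in value.items():
                    seen.setdefault(f"{key}.{subkey}", []).append(subvalue)
            else:
                seen.setdefault(key, []).append(value)
    return [path for path, values in seen.items()
            if len(values) == len(features)
            and all(isinstance(v, (str, int)) for v in values)
            and len(set(values)) == len(values)]


# Made-up sample: 'GEO_ID' is unique, 'kind' is not
sample = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": None,
     "properties": {"GEO_ID": 23, "kind": "province"}},
    {"type": "Feature", "geometry": None,
     "properties": {"GEO_ID": 24, "kind": "province"}},
]}
print(candidate_id_keys(sample))  # ['properties.GEO_ID']
```

A result containing `'id'` corresponds to case 1, an empty list to case 2, and any other path to case 3. Uniqueness alone does not guarantee that a key is meaningful, so the candidates still need a look by eye.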

SECOND STEP: Based on the jdata definition, define a dataframe, df, with a column of ids (the name is your choice, but 'ids' is the most suggestive). In cases 1 and 2 above, this column holds all or part of the feature ids; in case 3, it holds all or a subset of the ids recorded as feature['properties']['id'], feature['properties']['someidentifier'] or feature['anykey']['someidentifier']. Note that list(df['ids']) can be a permutation of the jdata ids, or a subset of a permutation.

The second column, let us say df['vals'], is numerical and can contain the population of each geographical region represented by feature['geometry'], the unemployment percentage, etc.
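
Before plotting, it can help to confirm that every id in the dataframe column really occurs in the geojson dict. A minimal sketch, assuming two hypothetical helpers (`feature_ids` and `unmatched`) and made-up sample ids; plain lists stand in for the dataframe column:

```python
def feature_ids(jdata, featureidkey="id"):
    """Collect the identifier of each feature, following a dotted key path."""
    ids = []
    for feature in jdata["features"]:
        value = feature
        for part in featureidkey.split("."):
            value = value[part]
        ids.append(value)
    return ids


def unmatched(jdata, locations, featureidkey="id"):
    """Return the locations that do not correspond to any feature id."""
    known = set(feature_ids(jdata, featureidkey))
    return [loc for loc in locations if loc not in known]


# Made-up sample with ids stored under feature['properties']['id'] (case 3)
sample = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": None, "properties": {"id": "ZH"}},
    {"type": "Feature", "geometry": None, "properties": {"id": "BE"}},
    {"type": "Feature", "geometry": None, "properties": {"id": "LU"}},
]}
print(unmatched(sample, ["BE", "ZH", "GE"], featureidkey="properties.id"))  # ['GE']
```

With a dataframe, the call would be `unmatched(jdata, list(df['ids']))`. In this check an integer 23 and a string '23' count as different ids, so a type mismatch between the file and the dataframe shows up as a list of unmatched locations.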

With these data we can define a trace of type choroplethmapbox as follows:

Case 1 and 2:

trace= go.Choroplethmapbox(geojson=jdata,
                           locations=df['ids'],
                           z=df['vals'],
                           colorscale='Viridis',
                           colorbar_thickness=20,
                           hoverinfo='all',
                        )

Case 3:

trace= go.Choroplethmapbox(geojson=jdata,
                           locations=df['ids'],
                           z=df['vals'],
                           featureidkey='properties.id',  # or 'anykey.someidentifier'
                           colorscale='deep_r',
                           colorbar_thickness=20,
                           hoverinfo='all',
                        )

The geojson files for a choroplethmapbox can be found on the web by searching for the geojson file of the states, regions, provinces or counties of some country. Often we can find topojson files for such administrative divisions of a country.

A topojson file can be converted online to a geojson file at https://mygeodata.cloud/converter/topojson-to-geojson.

If no geojson or topojson file can be found for a country/region, then the solution is to read a shapefile and convert it to a geojson file. Details in the last section.

Examples:

In [1]:

import plotly
plotly.__version__

Out[1]:

'4.9.0'

Choropleth mapbox for a few China provinces¶

In [2]:

import numpy as np
import json
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode,  iplot
init_notebook_mode(connected=True)

Read a geojson file from an url, and check its structure:

In [3]:

china_url = 'https://raw.githubusercontent.com/chemzqm/geomap/master/china-province.geojson'

In [4]:

import urllib.request

def read_geojson(url):
    # Download the file and parse its content as JSON
    with urllib.request.urlopen(url) as response:
        jdata = json.loads(response.read().decode())
    return jdata

In [5]:

jdata = read_geojson(china_url)

Inspect the geojson file content:

In [6]:

jdata['type']

Out[6]:

'FeatureCollection'

In [7]:

jdata['features'][0].keys()
                       

Out[7]:

dict_keys(['type', 'id', 'properties', 'geometry'])

In [8]:

jdata['features'][0]['properties']

Out[8]:

{'GEO_ID': 23, 'NAME': '黑龙江'}

In [9]:

#jdata['features'][9]['geometry']['coordinates']

For Choroplethmapbox attributes see: https://plot.ly/python/reference/#choroplethmapbox.

Let us select a list of ids as locations:

In [10]:

locations = [15+k for k in range(13)]
text = [feat['properties']['NAME']  for feat in jdata['features'] if feat['id'] in locations] #province names
text      

Out[10]:

['福建', '广西', '广东', '海南', '吉林', '辽宁', '天津', '青海', '甘肃', '陕西', '内蒙古', '重庆', '河北']
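
The list comprehension above yields the names in the order in which the features appear in the file. That order coincides with `locations` only if the file lists those ids in the same order. A minimal sketch that looks each name up by id instead, using a small made-up feature list (`demo_features`) in place of the China file:

```python
# Made-up features, deliberately stored in a different order than the ids requested
demo_features = [
    {"id": 17, "properties": {"NAME": "C"}},
    {"id": 15, "properties": {"NAME": "A"}},
    {"id": 16, "properties": {"NAME": "B"}},
]
demo_locations = [15, 16, 17]

# Map each id to its name, then read the names in the order of the locations
name_by_id = {feat["id"]: feat["properties"]["NAME"] for feat in demo_features}
demo_text = [name_by_id[loc] for loc in demo_locations]
print(demo_text)  # ['A', 'B', 'C']
```

With this lookup, the k-th name always belongs to the k-th location, whatever the order of the features in the file.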

Define here some synthetic data for z:

In [11]:

z = [ 4.2,  8.1,  6.85, 11.3,  3.56, 10.3,  8.25,  12.57,  5.28,  14.9,  8.67, 10.3,  6.1]

In [12]:

mapboxt = open(".mapbox_token").read().rstrip()  # Mapbox access token; needed only for Mapbox-hosted styles

For hovering we can set hoverinfo='all' (to display the location, z-value, and text on hover) or any combination of 'location', 'z', and 'text'. Attention: although the trace attribute is called locations, the hoverinfo flag is the singular location.
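
Individual flags are combined into one string with `+`, for example `'location+z'`. The helper below, `join_hoverinfo`, is a hypothetical convenience and not a Plotly function. Its list of allowed flags is our assumption for this trace type and should be checked against the reference.

```python
def join_hoverinfo(*flags):
    """Build a hoverinfo string such as 'location+z' from individual flags."""
    # Assumed flag names for a choroplethmapbox trace; note the singular 'location'
    allowed = ("location", "z", "text", "name")
    unknown = [flag for flag in flags if flag not in allowed]
    if unknown:
        raise ValueError(f"unknown hoverinfo flag(s): {unknown}")
    return "+".join(flags)


print(join_hoverinfo("location", "z"))  # location+z
```

Passing `'locations'` by mistake raises a `ValueError` here, which catches the singular/plural mix-up early.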

In [13]:

fig= go.Figure(go.Choroplethmapbox(z=z,
                            locations=locations,
                            colorscale='reds',
                            colorbar=dict(thickness=20, ticklen=3),
                            geojson=jdata,
                            text=text,
                            hoverinfo='all',
                            marker_line_width=1, marker_opacity=0.75))
                            
                            
fig.update_layout(title_text= 'Choroplethmapbox',
                  title_x=0.5, width = 700,# height=700,
                  mapbox = dict(center= dict(lat=36.913818,  lon=106.363625),
                                 accesstoken= mapboxt,
                                 style='basic',
                                 zoom=2.35,
                               ));

#fig.show()
                

Seeing only numbers and text on hover, as above, is not sufficiently informative. To display what each one represents, we define a hovertemplate. Its description in the go.Choroplethmapbox docs can be displayed by uncommenting the next cell:

In [14]:

#help(go.Choroplethmapbox.hovertemplate)

In [15]:

fig.data[0].hovertemplate =  '<b>Province</b>: <b>%{text}</b>'+\
                              '<br> <b>Val </b>: %{z}<br>'
fig.update_layout(title_text= "Choroplethmapbox with hovertemplate");
iplot(fig)

Notice a fantastic feature of this chart type: although the trace definition nowhere gives the geographical positions of the province (polygon) centers, the hoverbox is automatically placed at the visual center of each polygon/multipolygon.

Choropleth mapbox for Swiss cantons¶

In [ ]:

swiss_url = 'https://raw.githubusercontent.com/empet/Datasets/master/swiss-cantons.geojson'
jdata = read_geojson(swiss_url)

In [ ]:

jdata['features'][0].keys()

In [ ]:

jdata['features'][0]['properties']

In [ ]:

import pandas as pd

data_url = "https://raw.githubusercontent.com/empet/Datasets/master/Swiss-synthetic-data.csv"
df = pd.read_csv(data_url)
df.head()

In [ ]:

mycustomdata = np.stack((df['canton-name'], df['2018']), axis=-1)
title = 'Swiss Canton Choroplethmapbox'

fig = go.Figure(go.Choroplethmapbox(geojson=jdata, 
                                    locations=df['canton-id'], 
                                    z=df['2018'],
                                    featureidkey='properties.id',
                                    coloraxis="coloraxis",
                                    customdata=mycustomdata,
                                    hovertemplate= 'Canton: %{customdata[0]}'+\
                                                   '<br>2018: %{customdata[1]}%<extra></extra>',
                                    marker_line_width=1))


fig.update_layout(title_text = title,
                  title_x=0.5,
                  coloraxis_colorscale='algae_r',
                  mapbox=dict(style='carto-positron',
                              zoom=6.5, 
                              center = {"lat": 46.8181877 , "lon":8.2275124 },
                              )); 
                            
#fig.show()
                

Plotly express version of the same choroplethmapbox:

In [ ]:

import plotly.express as px

fig = px.choropleth_mapbox(df, geojson=jdata,
                           featureidkey='properties.id',
                           locations='canton-id',
                           color='2018',
                           color_continuous_scale='algae_r',
                           zoom=6.5,
                           center={'lat': 46.8181877, 'lon': 8.2275124},
                           mapbox_style='carto-positron')

fig.update_layout(title_text='', #title,
                  title_x=0.5,
                  coloraxis_reversescale=True,
                  #coloraxis_colorscale=algae  #'Viridis',
                  );
fig.show()

Choroplethmapbox from a geojson dict derived from a shapefile¶

To get the shapefile for the counties/regions of a country we access the Global Administrative Areas Database (GADM) https://gadm.org/, select Data, and then click the link country and choose from a dropdown menu the country of interest https://gadm.org/download_country_v3.html.

Each zip file downloaded from GADM contains multiple shapefiles, indexed by level of detail: 0, 1, 2, 3, and sometimes 4. Level 0 contains the shapefile of the country (UK, for example). Level 1 corresponds to provinces or regions (the UK has four: England, Scotland, Wales, and Northern Ireland). Level 2 shapefiles represent counties, and levels 3 and 4 represent smaller administrative subdivisions of each county.

Each level comes with at least four files that share the same level index and have the extensions shp, shx, dbf, and prj. For more information on these files see https://en.wikipedia.org/wiki/Shapefile.
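
Since these files belong together, it can be worth checking that none of them went missing when the zip file was unpacked. A minimal sketch using only the standard library; the helper name `missing_sidecars` and the temporary demo files are made up for illustration:

```python
import tempfile
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def missing_sidecars(shp_path):
    """Return the extensions for which no file exists next to the given .shp path."""
    shp_path = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not shp_path.with_suffix(ext).exists()]


# Demo in a temporary folder that contains only two of the four files
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".shx"):
        (Path(tmp) / f"demo_1{ext}").touch()
    demo_missing = missing_sidecars(Path(tmp) / "demo_1.shp")
print(demo_missing)  # ['.dbf', '.prj']
```

For a real download, the call would take the path of the level file, e.g. `missing_sidecars("gadm36_NOR_shp/gadm36_NOR_1.shp")`, and an empty list means all four files are in place.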

A shapefile is read as a geopandas GeoDataFrame with geopandas.read_file('filename.shp'); see https://github.com/geopandas/geopandas/blob/fbe743f3131cc5942fef8362ef2aed606dc45e23/doc/source/io.rst

Then it is converted to a geojson file to be used for a choroplethmapbox definition.

In [ ]:

import geopandas as gpd
gpd.__version__

We downloaded a zip file containing Norway's administrative regions. Read the level 1 shapefile:

In [ ]:

level = 1
gdf = gpd.read_file(f"gadm36_NOR_shp/gadm36_NOR_{level}.shp", encoding='utf-8')
#gdf.head()

To be sure that the data passed to go.Choroplethmapbox is right and will be displayed, check the CRS of gdf. The geometric shapes in the GeoDataFrame, gdf, are represented by coordinates in an arbitrary space. A CRS (Coordinate Reference System) tells Python how those coordinates relate to places on the Earth.

In [ ]:

gdf.crs

Hence our gdf contains coordinates in the WGS84 (EPSG:4326) standard. This is the best case when we intend to convert the geodataframe to a geojson file for mapbox. Mapbox maps are rendered in the Web Mercator projection (EPSG:3857), but according to https://docs.mapbox.com/api/#coordinate-format, geographic coordinates passed to a Mapbox API (in our case to define a go.Scattermapbox or go.Choroplethmapbox) should be given in the order longitude, latitude, as decimal degrees in the WGS84 coordinate system. If gdf.crs is WGS84, then the following conversion produces a geojson file suitable for a Choroplethmapbox:

In [ ]:

gdf.to_file('norway-geo.json', driver = 'GeoJSON')
with open('norway-geo.json') as geofile:
    jdataNo = json.load(geofile)    

If gdf.crs does not display the WGS84 coordinate system, then a CRS conversion must be performed before converting gdf to geojson. Note that to_crs returns a new GeoDataFrame, so the result must be assigned: gdf = gdf.to_crs(epsg=4326)
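
A rough sanity check on a geojson dict is to verify that every coordinate pair lies within the longitude/latitude ranges. The sketch below is a necessary test only, not a proof that the CRS is WGS84; the helper names and the two sample features are made up for illustration.

```python
def iter_positions(coords):
    """Yield [x, y] positions from arbitrarily nested geojson coordinate lists."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from iter_positions(item)


def looks_like_lonlat(jdata):
    """Return False if any position falls outside the lon/lat value ranges."""
    for feature in jdata["features"]:
        for x, y, *_ in iter_positions(feature["geometry"]["coordinates"]):
            if not (-180 <= x <= 180 and -90 <= y <= 90):
                return False
    return True


# Made-up polygons: one in decimal degrees, one in projected (metre-like) units
degrees = {"features": [{"geometry": {"type": "Polygon", "coordinates": [
    [[10.0, 60.0], [11.0, 60.0], [11.0, 61.0], [10.0, 60.0]]]}}]}
projected = {"features": [{"geometry": {"type": "Polygon", "coordinates": [
    [[1113194.9, 8399737.9], [1224514.4, 8399737.9], [1113194.9, 8399737.9]]]}}]}

print(looks_like_lonlat(degrees))    # True
print(looks_like_lonlat(projected))  # False
```

A `False` result suggests that the geodataframe was written without first being converted to EPSG:4326.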

Now let us check Norway's geojson file:

In [ ]:

jdataNo['features'][0].keys()

In [ ]:

jdataNo['features'][0]['properties']

Since it is difficult to decide which key uniquely identifies each region, we define a default id as follows:

In [ ]:

for k in range(len(jdataNo['features'])):
    jdataNo['features'][k]['id'] = k

Based on this 'id' definition we set up a pandas dataframe that contains the data to be associated with each Norway region:

In [ ]:

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/empet/Datasets/master/Norway-vals.csv')
df.head()

In [ ]:

fig = go.Figure(go.Choroplethmapbox(z=df['vals'],
                            locations = df['geo-id'], 
                            colorscale = 'ice',
                            colorbar = dict(thickness=20, ticklen=3),
                            geojson = jdataNo,
                            text = df['geo-name'],
                            hovertemplate = '<b>State</b>: <b>%{text}</b>'+
                                            '<br> <b>Val </b>: %{z}<br>',
                            marker_line_width=0.1, marker_opacity=0.7))

fig.update_layout(title_text ='Norway mapbox choropleth', title_x =0.5, width=750, height=700,
                 mapbox = dict(center= dict(lat=64.5, lon=18.75),            
                               accesstoken= mapboxt,
                               zoom=3
                               ))

iplot(fig)
