This notebook illustrates the blogpost (in French) https://www.smalsresearch.be/la-jointure-spatiale-la-cle-de-lanalytique-geographique .

In this blogpost, we explain the concept of spatial join, and use it to highlight incoherences between several open dataset about Belgian geography

In [1]:
import os

import geopandas as gpd
import pandas as pd
import numpy as np
import shapely
import matplotlib.pyplot as plt
import geopy
import urllib
from shapely.geometry import shape
import contextily as ctx
from tqdm.auto import tqdm

with_plotly = True # Preferably to True, but in case plotly cannot be installed, this allows the notebook to work
if with_plotly:
    import plotly.express as px
    import plotly.graph_objects as go

Installations on a fresh Conda environment:

- conda install geopandas
- conda install matplotlib
- conda install -c conda-forge geopy
- conda install -c conda-forge contextily
- conda install plotly (if with_plotly==True)
- conda install xlrd
- conda install tqdm
- conda install -c conda-forge ipywidgets
In [2]:
crs =     'epsg:3857'
osm_crs=  'epsg:4326'

Functions

In [3]:
def download_if_nexist(url, filename):
    """
    If the (local) file <filename> does not exists, download it from <url> 

    Parameters
    ----------
    url: str
       url to fetch
    filename: str
       local file to save

    Returns
    -------

    None
    """
    if not os.path.isfile(filename):
        with urllib.request.urlopen(url) as response:
            with open(filename, "wb") as f:
                f.write(response.read())
In [4]:
def get_osm_polyg(name):
    """
    Request Nominatim (OpenStreetMap) for "name", and returns the polygon (or point) given in the (first) result
    
    Parameters
    ----------
    name: str
        
    """
    url = f"http://nominatim.openstreetmap.org/search.php?q={name}&format=json&limit=1&polygon_geojson=1"
    with urllib.request.urlopen(url) as response:
        json_res = json.loads(response.read())
        if len(json_res)==0:
            return None
        return shape(json_res[0]["geojson"])

Load data

Zip code list

Get the official list of Belgian zipcodes, provided by BPost. Some of them correspond to real (sub)-localities, other are "special codes", corresponding to some institutions (Parliaments, ...) or companies (RTBF, 3 Suisses, ...)

This source does not contain any geographical informations

If several entities share the same zipcode, this zipcode appears in several records

In [52]:
download_if_nexist("http://www.bpost2.be/zipcodes/files/zipcodes_alpha_fr_new.xls",
                  "data/zipcodes_alpha_fr_new.xls")

zipcodes = pd.read_excel("data/zipcodes_alpha_fr_new.xls")
zipcodes = zipcodes.rename({"Code postal":"zipcode", "Commune principale": "main_commune", "Localité":"locality"}, axis=1)
zipcodes["zipcode"] = zipcodes["zipcode"].astype(str)
zipcodes["is_special"] = zipcodes["Sous-commune"].isnull()
print(zipcodes.zipcode.nunique(), "unique zipcodes")
zipcodes
1190 unique zipcodes
Out[52]:
zipcode locality Sous-commune main_commune Province is_special
0 2970 'S Gravenwezel Oui SCHILDE ANVERS False
1 3700 'S Herenelderen Oui TONGEREN LIMBOURG False
2 7510 3 Suisses NaN 3 Suisses NaN True
3 9420 Aaigem Oui ERPE-MERE FLANDRE-ORIENTALE False
4 8511 Aalbeke Oui KORTRIJK FLANDRE-OCCIDENTALE False
... ... ... ... ... ... ...
2760 3690 Zutendaal Non ZUTENDAAL LIMBOURG False
2761 8550 Zwevegem Non ZWEVEGEM FLANDRE-OCCIDENTALE False
2762 8750 Zwevezele Oui WINGENE FLANDRE-OCCIDENTALE False
2763 9052 Zwijnaarde Oui GENT FLANDRE-ORIENTALE False
2764 2070 Zwijndrecht Non ZWIJNDRECHT ANVERS False

2765 rows × 6 columns

In [6]:
zipcodes_normal = zipcodes[~zipcodes.is_special].drop("is_special", axis=1)

Some stats

In [7]:
print("Number of zipcodes:", zipcodes.zipcode.nunique())
print("Number of (not special) zipcodes :", zipcodes_normal.zipcode.nunique())
print("Number of (main) communes:", (zipcodes_normal.main_commune+"_"+zipcodes_normal.Province).nunique())
Number of zipcodes: 1190
Number of (not special) zipcodes : 1146
Number of (main) communes: 581
In [8]:
commune_stats =zipcodes_normal.groupby("main_commune").agg({"zipcode":"nunique", 
                                                            "locality":"count", 
                                                            "Province": max})
commune_stats
Out[8]:
zipcode locality Province
main_commune
AALST 4 9 FLANDRE-ORIENTALE
AALTER 3 6 FLANDRE-ORIENTALE
AARSCHOT 3 4 BRABANT FLAMAND
AARTSELAAR 1 1 ANVERS
AFFLIGEM 1 4 BRABANT FLAMAND
... ... ... ...
ZULTE 1 3 FLANDRE-ORIENTALE
ZUTENDAAL 1 1 LIMBOURG
ZWALM 2 12 FLANDRE-ORIENTALE
ZWEVEGEM 5 5 FLANDRE-OCCIDENTALE
ZWIJNDRECHT 1 2 ANVERS

581 rows × 3 columns

In [9]:
division_counts = pd.DataFrame(columns = ["#zipcodes", "#localities", "desc", "query"],
             data=[
 ["=1", "=1",   "Only one locality within the commune",         "(zipcode==1) & (locality == 1)"],
 ["=1", ">1",   "Zipcode share by all the localities",          "(zipcode==1) & (locality>1)"],
 [">1", "=zip", "A distinct zipcode for each locality",         "(zipcode >1) & (zipcode==locality)"],
 [">1", ">zip", "Some localities share zipcodes",               "(zipcode >1) & (zipcode<locality)"],
# [">1", "<zip", "(more zip than localities, does not happen)",  "(zipcode >1) & (zipcode>locality)"] 
])
In [10]:
division_counts["BEL"] = division_counts["query"].apply(lambda q: commune_stats.query(q).locality.count())
division_counts.drop("query", axis=1)
Out[10]:
#zipcodes #localities desc BEL
0 =1 =1 Only one locality within the commune 98
1 =1 >1 Zipcode share by all the localities 261
2 >1 =zip A distinct zipcode for each locality 53
3 >1 >zip Some localities share zipcodes 169
In [11]:
# Stats per province
prov_counts = division_counts["query"].apply(lambda q: commune_stats.query(q).groupby("Province").locality.count()).fillna(0)
division_counts_prov = pd.concat([division_counts.drop("query", axis=1), prov_counts], axis=1)
division_counts_prov
Out[11]:
#zipcodes #localities desc BEL ANVERS BRABANT FLAMAND BRABANT WALLON BRUXELLES FLANDRE-OCCIDENTALE FLANDRE-ORIENTALE HAINAUT LIEGE LIMBOURG LUXEMBOURG NAMUR
0 =1 =1 Only one locality within the commune 98 33.0 11.0 3.0 18.0 11.0 7.0 2.0 4.0 8.0 1.0 0.0
1 =1 >1 Zipcode share by all the localities 261 24.0 21.0 17.0 0.0 37.0 30.0 27.0 42.0 18.0 22.0 23.0
2 >1 =zip A distinct zipcode for each locality 53 6.0 11.0 3.0 1.0 4.0 5.0 5.0 11.0 5.0 2.0 0.0
3 >1 >zip Some localities share zipcodes 169 6.0 22.0 4.0 0.0 12.0 18.0 35.0 27.0 11.0 19.0 15.0
In [12]:
division_counts_prov.set_index("desc").drop(["BEL", "#zipcodes", "#localities"], axis=1).T.plot.bar(figsize=(15,6))
Out[12]:
<AxesSubplot:>
In [13]:
print("Communes with a locality having the same name")
display(zipcodes_normal[(zipcodes_normal.locality.str.upper() == zipcodes_normal.main_commune)])
print("Number of (unique) communes:", zipcodes_normal[(zipcodes_normal.locality.str.upper() == zipcodes_normal.main_commune)].main_commune.nunique())
Communes with a locality having the same name
zipcode locality Sous-commune main_commune Province
6 9300 Aalst Non AALST FLANDRE-ORIENTALE
7 9880 Aalter Non AALTER FLANDRE-ORIENTALE
8 3200 Aarschot Non AARSCHOT BRABANT FLAMAND
11 2630 Aartselaar Non AARTSELAAR ANVERS
22 1790 Affligem Non AFFLIGEM BRABANT FLAMAND
... ... ... ... ... ...
2757 8377 Zuienkerke Non ZUIENKERKE FLANDRE-OCCIDENTALE
2758 9870 Zulte Non ZULTE FLANDRE-ORIENTALE
2760 3690 Zutendaal Non ZUTENDAAL LIMBOURG
2761 8550 Zwevegem Non ZWEVEGEM FLANDRE-OCCIDENTALE
2764 2070 Zwijndrecht Non ZWIJNDRECHT ANVERS

528 rows × 5 columns

Number of (unique) communes: 521
In [14]:
zipcodes_normal[zipcodes_normal["Sous-commune"] == "Non"]
Out[14]:
zipcode locality Sous-commune main_commune Province
6 9300 Aalst Non AALST FLANDRE-ORIENTALE
7 9880 Aalter Non AALTER FLANDRE-ORIENTALE
8 3200 Aarschot Non AARSCHOT BRABANT FLAMAND
11 2630 Aartselaar Non AARTSELAAR ANVERS
22 1790 Affligem Non AFFLIGEM BRABANT FLAMAND
... ... ... ... ... ...
2757 8377 Zuienkerke Non ZUIENKERKE FLANDRE-OCCIDENTALE
2758 9870 Zulte Non ZULTE FLANDRE-ORIENTALE
2760 3690 Zutendaal Non ZUTENDAAL LIMBOURG
2761 8550 Zwevegem Non ZWEVEGEM FLANDRE-OCCIDENTALE
2764 2070 Zwijndrecht Non ZWIJNDRECHT ANVERS

528 rows × 5 columns

In [15]:
# Several localities with the same name within the same commune
zipcodes[zipcodes[["locality", "main_commune"]].duplicated(keep=False)].sort_values("locality")
Out[15]:
zipcode locality Sous-commune main_commune Province is_special
60 2000 Antwerpen Non ANTWERPEN ANVERS False
61 2018 Antwerpen Non ANTWERPEN ANVERS False
62 2020 Antwerpen Non ANTWERPEN ANVERS False
63 2030 Antwerpen Non ANTWERPEN ANVERS False
64 2040 Antwerpen Non ANTWERPEN ANVERS False
65 2050 Antwerpen Non ANTWERPEN ANVERS False
66 2060 Antwerpen Non ANTWERPEN ANVERS False
1181 9120 Kallo Oui BEVEREN-WAAS FLANDRE-ORIENTALE False
1182 9130 Kallo Oui BEVEREN-WAAS FLANDRE-ORIENTALE False
1359 4000 Liège Non LIÈGE LIEGE False
1360 4020 Liège Non LIÈGE LIEGE False

Zip code centers

Get a list giving for each zipcode, its locality name, and geographical center.

Note : it does not contain any "special zicode"

In [16]:
# https://data.gov.be/fr/dataset/328ba4f140ba0e870dfc9c70635fe7c1840980b1
download_if_nexist("https://www.odwb.be/api/v2/catalog/datasets/code-postaux-belge/exports/geojson",
                  "data/zipcode_centers.geojson")

zipcodes_centers = gpd.read_file("data/zipcode_centers.geojson")
zipcodes_centers = zipcodes_centers.rename({"column_1": "zipcode", "column_2": "locality"}, axis=1)[["locality","zipcode", "geometry"]]

zipcodes_centers = zipcodes_centers.to_crs(crs)
print(zipcodes_centers.zipcode.nunique(), "unique zipcode")
zipcodes_centers
1145 unique zipcode
Out[16]:
locality zipcode geometry
0 Saint-Gilles 1060 POINT (483757.549 6590703.228)
1 Molenbeek-Saint-Jean 1080 POINT (481209.435 6595571.150)
2 Ganshoren 1083 POINT (480623.048 6598549.025)
3 Neder-Over-Heembeek 1120 POINT (488746.985 6603234.900)
4 Wavre 1300 POINT (512931.716 6571283.981)
... ... ... ...
2752 Dikkelvenne 9890 POINT (410707.640 6606742.923)
2753 Lovendegem 9920 POINT (401597.631 6638236.014)
2754 Zomergem 9930 POINT (396769.791 6642472.970)
2755 Waarschoot 9950 POINT (401437.142 6648141.593)
2756 Assenede 9960 POINT (418205.916 6662086.731)

2757 rows × 3 columns

In [17]:
# Missing zipcode
zipcodes_normal[~zipcodes_normal.zipcode.isin(zipcodes_centers.zipcode)]
Out[17]:
zipcode locality Sous-commune main_commune Province
1895 5352 Perwez-Haillot Oui OHEY NAMUR

Zip code boundaries

Get the boundaries for all zipcodes. Provided by BPost.

In the original files, multi-polygons as given as multiple (single)-polygons records. We merge them in ony one record.

Note that in the "center" file, if several entities share the same zip, the center of each entity is given, while here, we only have one (multi-)polygon per zipcode.

In [18]:
zipcode_boundaries_filename = "data/zipcode_boundaries_shapefile_3812.zip"
In [19]:
# https://www.geo.be/catalog/details/9738c7c0-5255-11ea-8895-34e12d0f0423?l=fr
download_if_nexist("https://bgu.bpost.be/assets/9738c7c0-5255-11ea-8895-34e12d0f0423_x-shapefile_3812.zip",
                  zipcode_boundaries_filename)
In [20]:
zipcodes_boundaries = gpd.read_file(f"zip://{zipcode_boundaries_filename}/3812")
zipcodes_boundaries["is_special"] = zipcodes_boundaries.CP_speciau ==1
zipcodes_boundaries = zipcodes_boundaries.rename({"nouveau_PO":"zipcode"}, axis=1)[["zipcode", "is_special", "geometry"]]

zipcodes_boundaries = zipcodes_boundaries.dissolve(["zipcode", "is_special"]).reset_index()

zipcodes_boundaries = zipcodes_boundaries.to_crs(crs)
zipcodes_boundaries
Out[20]:
zipcode is_special geometry
0 1000 False MULTIPOLYGON Z (((487951.058 6600061.994 0.000...
1 1005 True POLYGON Z ((484391.808 6593846.983 0.000, 4844...
2 1006 True POLYGON Z ((484413.037 6593943.815 0.000, 4843...
3 1007 True POLYGON Z ((486232.320 6594255.851 0.000, 4862...
4 1008 True POLYGON Z ((486000.332 6594258.665 0.000, 4858...
... ... ... ...
1182 9982 False POLYGON Z ((398126.149 6670079.581 0.000, 3981...
1183 9988 False POLYGON Z ((402165.046 6674165.098 0.000, 4021...
1184 9990 False POLYGON Z ((383123.261 6664423.574 0.000, 3831...
1185 9991 False POLYGON Z ((389466.378 6660430.021 0.000, 3895...
1186 9992 False POLYGON Z ((376698.000 6669876.805 0.000, 3767...

1187 rows × 3 columns

In [21]:
# Missing zipcode
zipcodes_normal[~zipcodes_normal.zipcode.isin(zipcodes_boundaries.zipcode)]
Out[21]:
zipcode locality Sous-commune main_commune Province
1175 6642 Juseret Oui VAUX-SUR-SÛRE LUXEMBOURG
In [22]:
# Extra zipcode
zipcodes_boundaries[~zipcodes_boundaries.zipcode.isin(zipcodes.zipcode)]
Out[22]:
zipcode is_special geometry
1071 9 False POLYGON Z ((551743.087 6617609.407 0.000, 5517...

NIS code boundaries

Get boundaries for all statistical sectors, from StatBel. Statistical sectors can be grouped by NIS code, providing NIS code boundaries.

In [23]:
# https://statbel.fgov.be/fr/open-data/secteurs-statistiques-2020

download_if_nexist("https://statbel.fgov.be/sites/default/files/files/opendata/Statistische%20sectoren/sh_statbel_statistical_sectors_31370_20200101.shp.zip",
                   "data/stat_sectors_2020.zip")
statistical_sectors = gpd.read_file("zip://data/stat_sectors_2020.zip/sh_statbel_statistical_sectors_20200101.shp")
statistical_sectors["CNIS5_2020"] = statistical_sectors["CNIS5_2020"].astype(str)
In [24]:
# Group (with "dissolve") sectors per NIS code
nis_boundaries = statistical_sectors[["CNIS5_2020", "T_MUN_FR", "T_MUN_NL", "geometry"]].dissolve(by="CNIS5_2020").reset_index()
nis_boundaries = nis_boundaries.rename({"CNIS5_2020": "niscode"}, axis=1)
nis_boundaries = nis_boundaries.to_crs(crs)

Spatial join : polygon vs point

As a first example of spatial join, we will join polygons (from zipcodes_boundaries) and points (from zipcodes_centers).

In the resulting table sjoin_zip_center, we'll get all combination of zipcodes_boundaries and zipcodes_centers, where the point of zipcodes_centers falls into the polygon of zipcodes_boundaries.

sjoin_mismatches will contain all combinations where the zipcode provided by zipcodes_centers does not match the zipcode provided by zipcodes_boundaries

In [25]:
sjoin_zip_center = gpd.sjoin(zipcodes_boundaries, zipcodes_centers)
sjoin_zip_center
Out[25]:
zipcode_left is_special geometry index_right locality zipcode_right
0 1000 False MULTIPOLYGON Z (((487951.058 6600061.994 0.000... 1682 Bruxelles 1000
8 1020 False POLYGON Z ((484519.836 6603848.988 0.000, 4845... 2218 Laeken 1020
9 1030 False POLYGON Z ((488633.964 6600056.547 0.000, 4887... 2219 Schaerbeek 1030
13 1040 False MULTIPOLYGON Z (((486480.015 6594068.947 0.000... 1139 Etterbeek 1040
21 1050 False MULTIPOLYGON Z (((486898.793 6593112.249 0.000... 591 Ixelles 1050
... ... ... ... ... ... ...
1183 9988 False POLYGON Z ((402165.046 6674165.098 0.000, 4021... 1138 Watervliet 9988
1183 9988 False POLYGON Z ((402165.046 6674165.098 0.000, 4021... 1680 Waterland-Oudeman 9988
1184 9990 False POLYGON Z ((383123.261 6664423.574 0.000, 3831... 1681 Maldegem 9990
1185 9991 False POLYGON Z ((389466.378 6660430.021 0.000, 3895... 2216 Adegem 9991
1186 9992 False POLYGON Z ((376698.000 6669876.805 0.000, 3767... 2217 Middelburg 9992

2757 rows × 6 columns

In [26]:
sjoin_mismatches = sjoin_zip_center[sjoin_zip_center.zipcode_left != sjoin_zip_center.zipcode_right].reset_index(drop=True)
sjoin_mismatches
Out[26]:
zipcode_left is_special geometry index_right locality zipcode_right
0 1130 False POLYGON Z ((492165.130 6604187.117 0.000, 4921... 2343 Haren 3700
1 1541 False POLYGON Z ((442698.747 6572867.831 0.000, 4427... 499 Driekapellen 8600
2 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 45 Antwerpen 2050
3 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 1184 Antwerpen 2060
4 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 1181 Antwerpen 2030
5 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 1180 Antwerpen 2018
6 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 1740 Antwerpen 2020
7 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 2277 Antwerpen 2040
8 2180 False POLYGON Z ((496588.712 6668893.730 0.000, 4964... 2324 Laar 3400
9 2180 False POLYGON Z ((496588.712 6668893.730 0.000, 4964... 1187 Schriek 2223
10 2490 False POLYGON Z ((579405.019 6646044.757 0.000, 5793... 1177 Berg 1910
11 2490 False POLYGON Z ((579405.019 6646044.757 0.000, 5793... 2342 Berg 3700
12 2560 False POLYGON Z ((522581.783 6652932.359 0.000, 5226... 1275 Voort 3840
13 3202 False POLYGON Z ((545421.333 6620399.304 0.000, 5454... 1802 Donk 3540
14 3300 False POLYGON Z ((546638.037 6597317.597 0.000, 5466... 697 Sint-Margriete-Houtem 3470
15 3700 False POLYGON Z ((613768.565 6589710.582 0.000, 6137... 135 Kolmont 3840
16 3770 False POLYGON Z ((625035.587 6592809.132 0.000, 6250... 2736 Elst 9660
17 4000 False MULTIPOLYGON Z (((622294.900 6548667.801 0.000... 742 Liège 4020
18 4120 False POLYGON Z ((604356.408 6544578.503 0.000, 6043... 772 Ehein 4480
19 4122 False POLYGON Z ((617503.511 6544062.555 0.000, 6175... 153 Neupré 4120
20 4141 False POLYGON Z ((640688.017 6541527.140 0.000, 6406... 1354 Louveigné 4920
21 4400 False POLYGON Z ((609617.951 6554549.539 0.000, 6096... 1489 Noirfontaine 6831
22 4852 False POLYGON Z ((664231.609 6575203.101 0.000, 6642... 230 Plombières 4850
23 4910 False POLYGON Z ((654464.107 6535116.816 0.000, 6544... 1947 Mont 5530
24 4910 False POLYGON Z ((654464.107 6535116.816 0.000, 6544... 1463 Mont 6661
25 4910 False POLYGON Z ((654464.107 6535116.816 0.000, 6544... 1348 Polleur 4800
26 4960 False POLYGON Z ((680570.480 6512355.732 0.000, 6805... 931 Bellevaux 6834
27 4987 False POLYGON Z ((651334.479 6514275.104 0.000, 6512... 307 Neuville 5600
28 4987 False POLYGON Z ((651334.479 6514275.104 0.000, 6512... 2445 La Bruyère 5080
29 4987 False POLYGON Z ((651334.479 6514275.104 0.000, 6512... 2421 Andrimont 4821
30 5541 False POLYGON Z ((538168.187 6483865.512 0.000, 5382... 287 Hastière 5540
31 5576 False POLYGON Z ((558829.518 6455080.381 0.000, 5589... 854 Honnay 5570
32 6640 False POLYGON Z ((627657.956 6447744.344 0.000, 6276... 905 Juseret 6642
33 6666 False POLYGON Z ((636191.395 6480967.171 0.000, 6362... 1510 Mormont 6997
34 6681 False POLYGON Z ((616954.726 6458024.935 0.000, 6169... 2021 Sainte-Ode 6680
35 6724 False POLYGON Z ((620859.901 6407429.477 0.000, 6213... 1473 Habay 6720
36 6820 False MULTIPOLYGON Z (((593091.087 6396179.853 0.000... 226 Lambermont 4800
37 7141 False POLYGON Z ((476514.652 6524706.111 0.000, 4764... 967 Morlanwelz 7140
38 7624 False POLYGON Z ((374657.316 6537126.140 0.000, 3747... 1552 Brunehaut 7620
39 7811 False POLYGON Z ((426644.077 6556386.940 0.000, 4269... 2084 Maffle 7810
40 8301 False POLYGON Z ((364600.110 6682813.553 0.000, 3646... 1582 Knokke-Heist 8300
41 8620 False POLYGON Z ((314007.990 6639410.105 0.000, 3139... 2347 Sluizen 3700
42 8755 False POLYGON Z ((370845.213 6638802.585 0.000, 3708... 2722 Bambrugge 9420
43 8952 False POLYGON Z ((317308.967 6581658.756 0.000, 3172... 1619 Heuvelland 8950
44 9120 False POLYGON Z ((475777.507 6667578.396 0.000, 4757... 1083 Kallo 9130
45 9130 False POLYGON Z ((478761.897 6667836.361 0.000, 4779... 2707 Kallo 9120
46 9180 False POLYGON Z ((442800.468 6661195.878 0.000, 4428... 1648 Moerbeke 9500
47 9180 False POLYGON Z ((442800.468 6661195.878 0.000, 4428... 1628 Sinaai-Waas 9112
48 9220 False POLYGON Z ((464845.765 6639208.960 0.000, 4647... 630 Hamme 1785
49 9620 False POLYGON Z ((426237.365 6605895.998 0.000, 4262... 1114 Sint-Maria-Oudenhove 9660
50 9620 False POLYGON Z ((426237.365 6605895.998 0.000, 4262... 2180 Oombergen 9520
51 9790 False POLYGON Z ((395001.891 6600120.574 0.000, 3950... 1120 Ooike 9700

We can now explore all those mismatch. As a way to make decisions easier, we provide several data:

  • The mismatch record (with two different zipcodes zip1, zip2)
  • Information we have from BPost for the two zipcodes
  • What OpenStreetMap knows at the point given by zipcodes_centers
  • What OpenStreetMap knows for the given locality (in zipcodes_centers)
  • A map with :
    • the boundaries corresponding to zip1 and zip2 in zipcodes_boundaries et
    • all points corresponding to zip1 and zip2 in zipcodes_centers
In [27]:
def plot_mismatch_zip(sjoin_mismatches, i):
    print(f"Mismatch record {i}:")
    display(sjoin_mismatches.iloc[[i]])

    zip1 = sjoin_mismatches.iloc[i].zipcode_left
    zip2 = sjoin_mismatches.iloc[i].zipcode_right

    print("From BPost:")
    display( zipcodes[ zipcodes.zipcode.isin([zip1, zip2])])

    try: 
        print("Know by OSM at the given point:")
        pt = zipcodes_centers.to_crs(osm_crs).loc[sjoin_mismatches.iloc[i].index_right].geometry
        geolocator = geopy.geocoders.Nominatim(user_agent="smalsresearch")
        print(geolocator.reverse(f"{pt.y}, {pt.x}").address)

        print("Know by OSM for the given locality:")
        loc = zipcodes_centers.loc[sjoin_mismatches.iloc[i].index_right].locality
        print(geolocator.geocode(f"{loc}, Belgium").address)
    except Exception as e:
        print("Cannot get OSM data:", e)

    query = f"zipcode in ('{zip1}', '{zip2}')"


    df_bnd = zipcodes_boundaries.assign(label=zipcodes_boundaries.zipcode).query(query).to_crs(osm_crs)
    df_pnt = zipcodes_centers.assign(label=zipcodes_centers.zipcode+" "+zipcodes_centers.locality).query(query).to_crs(osm_crs)

    df_pnt["mismatch"] = False
    df_pnt.loc[sjoin_mismatches.iloc[i].index_right, "mismatch"] = True

    if with_plotly:
        fig = px.choropleth_mapbox(df_bnd, 
                                   geojson=df_bnd.geometry, 
                                   locations=df_bnd.index, 
                                   hover_name="label",
                                   color=df_bnd['label'],
                                   center={"lat": df_bnd.bounds.mean()[["miny", "maxy"]].mean(), 
                                           "lon": df_bnd.bounds.mean()[["minx", "maxx"]].mean()},
                                   mapbox_style="open-street-map",
                                   #zoom=10,
                                   opacity=0.5,

                           )

        for i, bloc in df_pnt.groupby("mismatch"): 
            fig.add_trace(go.Scattermapbox(lat=bloc.geometry.y,
                                           lon=bloc.geometry.x,
                                           mode='markers+text', # "markers+text",
                                           text=bloc.label, 
                                           name="Zipcode centers" + (" (mismatch)" if i else ""),
                                           textposition="bottom center"
                                          )
                                    )
        fig.show()
    else: 
        df=pd.concat([df_bnd, df_pnt]).to_crs(crs)
        df["label"] = df["label"] + np.where(df.mismatch==True, "*", "")
        ax = df.plot("label", figsize=(10,15), legend=True)
        try:
            ctx.add_basemap(ax)
        except Exception as  e:
            print("Cannot add basemap: ", e)
        plt.show()
        display(df[["zipcode", "geometry","label"]])
In [28]:
# for i in range(sjoin_mismatches.shape[0]):
for i in [8, 19, 25, 30, 43]:
    plot_mismatch_zip(sjoin_mismatches, i)
    print("--------------")
Mismatch record 8:
zipcode_left is_special geometry index_right locality zipcode_right
8 2180 False POLYGON Z ((496588.712 6668893.730 0.000, 4964... 2324 Laar 3400
From BPost:
zipcode locality Sous-commune main_commune Province is_special
603 2180 Ekeren Oui ANTWERPEN ANVERS False
609 3400 Eliksem Oui LANDEN BRABANT FLAMAND False
683 3400 Ezemaal Oui LANDEN BRABANT FLAMAND False
1266 3400 Laar Oui LANDEN BRABANT FLAMAND False
1285 3400 Landen Non LANDEN BRABANT FLAMAND False
1692 3400 Neerwinden Oui LANDEN BRABANT FLAMAND False
1867 3400 Overwinden Oui LANDEN BRABANT FLAMAND False
2079 3400 Rumsdorp Oui LANDEN BRABANT FLAMAND False
2581 3400 Wange Oui LANDEN BRABANT FLAMAND False
Know by OSM at the given point:
11, Laar, Donk, Ekeren, Antwerpen, Vlaanderen, 2180, België / Belgique / Belgien
Know by OSM for the given locality:
Laar, Wommelgem, Antwerpen, Vlaanderen, 2160, België / Belgique / Belgien