This notebook illustrates the blogpost (in French) https://www.smalsresearch.be/la-jointure-spatiale-la-cle-de-lanalytique-geographique .
In this blogpost, we explain the concept of spatial join, and use it to highlight incoherences between several open dataset about Belgian geography
import os
import geopandas as gpd
import pandas as pd
import numpy as np
import shapely
import matplotlib.pyplot as plt
import geopy
import urllib
from shapely.geometry import shape
import contextily as ctx
from tqdm.auto import tqdm
with_plotly = True # Preferably to True, but in case plotly cannot be installed, this allows the notebook to work
if with_plotly:
import plotly.express as px
import plotly.graph_objects as go
Installations on a fresh Conda environment:
- conda install geopandas
- conda install matplotlib
- conda install -c conda-forge geopy
- conda install -c conda-forge contextily
- conda install plotly (if with_plotly==True)
- conda install xlrd
- conda install tqdm
- conda install -c conda-forge ipywidgets
crs = 'epsg:3857'
osm_crs= 'epsg:4326'
def download_if_nexist(url, filename):
"""
If the (local) file <filename> does not exists, download it from <url>
Parameters
----------
url: str
url to fetch
filename: str
local file to save
Returns
-------
None
"""
if not os.path.isfile(filename):
with urllib.request.urlopen(url) as response:
with open(filename, "wb") as f:
f.write(response.read())
def get_osm_polyg(name):
"""
Request Nominatim (OpenStreetMap) for "name", and returns the polygon (or point) given in the (first) result
Parameters
----------
name: str
"""
url = f"http://nominatim.openstreetmap.org/search.php?q={name}&format=json&limit=1&polygon_geojson=1"
with urllib.request.urlopen(url) as response:
json_res = json.loads(response.read())
if len(json_res)==0:
return None
return shape(json_res[0]["geojson"])
Get the official list of Belgian zipcodes, provided by BPost. Some of them correspond to real (sub)-localities, other are "special codes", corresponding to some institutions (Parliaments, ...) or companies (RTBF, 3 Suisses, ...)
This source does not contain any geographical informations
If several entities share the same zipcode, this zipcode appears in several records
download_if_nexist("http://www.bpost2.be/zipcodes/files/zipcodes_alpha_fr_new.xls",
"data/zipcodes_alpha_fr_new.xls")
zipcodes = pd.read_excel("data/zipcodes_alpha_fr_new.xls")
zipcodes = zipcodes.rename({"Code postal":"zipcode", "Commune principale": "main_commune", "Localité":"locality"}, axis=1)
zipcodes["zipcode"] = zipcodes["zipcode"].astype(str)
zipcodes["is_special"] = zipcodes["Sous-commune"].isnull()
print(zipcodes.zipcode.nunique(), "unique zipcodes")
zipcodes
1190 unique zipcodes
zipcode | locality | Sous-commune | main_commune | Province | is_special | |
---|---|---|---|---|---|---|
0 | 2970 | 'S Gravenwezel | Oui | SCHILDE | ANVERS | False |
1 | 3700 | 'S Herenelderen | Oui | TONGEREN | LIMBOURG | False |
2 | 7510 | 3 Suisses | NaN | 3 Suisses | NaN | True |
3 | 9420 | Aaigem | Oui | ERPE-MERE | FLANDRE-ORIENTALE | False |
4 | 8511 | Aalbeke | Oui | KORTRIJK | FLANDRE-OCCIDENTALE | False |
... | ... | ... | ... | ... | ... | ... |
2760 | 3690 | Zutendaal | Non | ZUTENDAAL | LIMBOURG | False |
2761 | 8550 | Zwevegem | Non | ZWEVEGEM | FLANDRE-OCCIDENTALE | False |
2762 | 8750 | Zwevezele | Oui | WINGENE | FLANDRE-OCCIDENTALE | False |
2763 | 9052 | Zwijnaarde | Oui | GENT | FLANDRE-ORIENTALE | False |
2764 | 2070 | Zwijndrecht | Non | ZWIJNDRECHT | ANVERS | False |
2765 rows × 6 columns
zipcodes_normal = zipcodes[~zipcodes.is_special].drop("is_special", axis=1)
print("Number of zipcodes:", zipcodes.zipcode.nunique())
print("Number of (not special) zipcodes :", zipcodes_normal.zipcode.nunique())
print("Number of (main) communes:", (zipcodes_normal.main_commune+"_"+zipcodes_normal.Province).nunique())
Number of zipcodes: 1190 Number of (not special) zipcodes : 1146 Number of (main) communes: 581
commune_stats =zipcodes_normal.groupby("main_commune").agg({"zipcode":"nunique",
"locality":"count",
"Province": max})
commune_stats
zipcode | locality | Province | |
---|---|---|---|
main_commune | |||
AALST | 4 | 9 | FLANDRE-ORIENTALE |
AALTER | 3 | 6 | FLANDRE-ORIENTALE |
AARSCHOT | 3 | 4 | BRABANT FLAMAND |
AARTSELAAR | 1 | 1 | ANVERS |
AFFLIGEM | 1 | 4 | BRABANT FLAMAND |
... | ... | ... | ... |
ZULTE | 1 | 3 | FLANDRE-ORIENTALE |
ZUTENDAAL | 1 | 1 | LIMBOURG |
ZWALM | 2 | 12 | FLANDRE-ORIENTALE |
ZWEVEGEM | 5 | 5 | FLANDRE-OCCIDENTALE |
ZWIJNDRECHT | 1 | 2 | ANVERS |
581 rows × 3 columns
division_counts = pd.DataFrame(columns = ["#zipcodes", "#localities", "desc", "query"],
data=[
["=1", "=1", "Only one locality within the commune", "(zipcode==1) & (locality == 1)"],
["=1", ">1", "Zipcode share by all the localities", "(zipcode==1) & (locality>1)"],
[">1", "=zip", "A distinct zipcode for each locality", "(zipcode >1) & (zipcode==locality)"],
[">1", ">zip", "Some localities share zipcodes", "(zipcode >1) & (zipcode<locality)"],
# [">1", "<zip", "(more zip than localities, does not happen)", "(zipcode >1) & (zipcode>locality)"]
])
division_counts["BEL"] = division_counts["query"].apply(lambda q: commune_stats.query(q).locality.count())
division_counts.drop("query", axis=1)
#zipcodes | #localities | desc | BEL | |
---|---|---|---|---|
0 | =1 | =1 | Only one locality within the commune | 98 |
1 | =1 | >1 | Zipcode share by all the localities | 261 |
2 | >1 | =zip | A distinct zipcode for each locality | 53 |
3 | >1 | >zip | Some localities share zipcodes | 169 |
# Stats per province
prov_counts = division_counts["query"].apply(lambda q: commune_stats.query(q).groupby("Province").locality.count()).fillna(0)
division_counts_prov = pd.concat([division_counts.drop("query", axis=1), prov_counts], axis=1)
division_counts_prov
#zipcodes | #localities | desc | BEL | ANVERS | BRABANT FLAMAND | BRABANT WALLON | BRUXELLES | FLANDRE-OCCIDENTALE | FLANDRE-ORIENTALE | HAINAUT | LIEGE | LIMBOURG | LUXEMBOURG | NAMUR | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | =1 | =1 | Only one locality within the commune | 98 | 33.0 | 11.0 | 3.0 | 18.0 | 11.0 | 7.0 | 2.0 | 4.0 | 8.0 | 1.0 | 0.0 |
1 | =1 | >1 | Zipcode share by all the localities | 261 | 24.0 | 21.0 | 17.0 | 0.0 | 37.0 | 30.0 | 27.0 | 42.0 | 18.0 | 22.0 | 23.0 |
2 | >1 | =zip | A distinct zipcode for each locality | 53 | 6.0 | 11.0 | 3.0 | 1.0 | 4.0 | 5.0 | 5.0 | 11.0 | 5.0 | 2.0 | 0.0 |
3 | >1 | >zip | Some localities share zipcodes | 169 | 6.0 | 22.0 | 4.0 | 0.0 | 12.0 | 18.0 | 35.0 | 27.0 | 11.0 | 19.0 | 15.0 |
division_counts_prov.set_index("desc").drop(["BEL", "#zipcodes", "#localities"], axis=1).T.plot.bar(figsize=(15,6))
<AxesSubplot:>
print("Communes with a locality having the same name")
display(zipcodes_normal[(zipcodes_normal.locality.str.upper() == zipcodes_normal.main_commune)])
print("Number of (unique) communes:", zipcodes_normal[(zipcodes_normal.locality.str.upper() == zipcodes_normal.main_commune)].main_commune.nunique())
Communes with a locality having the same name
zipcode | locality | Sous-commune | main_commune | Province | |
---|---|---|---|---|---|
6 | 9300 | Aalst | Non | AALST | FLANDRE-ORIENTALE |
7 | 9880 | Aalter | Non | AALTER | FLANDRE-ORIENTALE |
8 | 3200 | Aarschot | Non | AARSCHOT | BRABANT FLAMAND |
11 | 2630 | Aartselaar | Non | AARTSELAAR | ANVERS |
22 | 1790 | Affligem | Non | AFFLIGEM | BRABANT FLAMAND |
... | ... | ... | ... | ... | ... |
2757 | 8377 | Zuienkerke | Non | ZUIENKERKE | FLANDRE-OCCIDENTALE |
2758 | 9870 | Zulte | Non | ZULTE | FLANDRE-ORIENTALE |
2760 | 3690 | Zutendaal | Non | ZUTENDAAL | LIMBOURG |
2761 | 8550 | Zwevegem | Non | ZWEVEGEM | FLANDRE-OCCIDENTALE |
2764 | 2070 | Zwijndrecht | Non | ZWIJNDRECHT | ANVERS |
528 rows × 5 columns
Number of (unique) communes: 521
zipcodes_normal[zipcodes_normal["Sous-commune"] == "Non"]
zipcode | locality | Sous-commune | main_commune | Province | |
---|---|---|---|---|---|
6 | 9300 | Aalst | Non | AALST | FLANDRE-ORIENTALE |
7 | 9880 | Aalter | Non | AALTER | FLANDRE-ORIENTALE |
8 | 3200 | Aarschot | Non | AARSCHOT | BRABANT FLAMAND |
11 | 2630 | Aartselaar | Non | AARTSELAAR | ANVERS |
22 | 1790 | Affligem | Non | AFFLIGEM | BRABANT FLAMAND |
... | ... | ... | ... | ... | ... |
2757 | 8377 | Zuienkerke | Non | ZUIENKERKE | FLANDRE-OCCIDENTALE |
2758 | 9870 | Zulte | Non | ZULTE | FLANDRE-ORIENTALE |
2760 | 3690 | Zutendaal | Non | ZUTENDAAL | LIMBOURG |
2761 | 8550 | Zwevegem | Non | ZWEVEGEM | FLANDRE-OCCIDENTALE |
2764 | 2070 | Zwijndrecht | Non | ZWIJNDRECHT | ANVERS |
528 rows × 5 columns
# Several localities with the same name within the same commune
zipcodes[zipcodes[["locality", "main_commune"]].duplicated(keep=False)].sort_values("locality")
zipcode | locality | Sous-commune | main_commune | Province | is_special | |
---|---|---|---|---|---|---|
60 | 2000 | Antwerpen | Non | ANTWERPEN | ANVERS | False |
61 | 2018 | Antwerpen | Non | ANTWERPEN | ANVERS | False |
62 | 2020 | Antwerpen | Non | ANTWERPEN | ANVERS | False |
63 | 2030 | Antwerpen | Non | ANTWERPEN | ANVERS | False |
64 | 2040 | Antwerpen | Non | ANTWERPEN | ANVERS | False |
65 | 2050 | Antwerpen | Non | ANTWERPEN | ANVERS | False |
66 | 2060 | Antwerpen | Non | ANTWERPEN | ANVERS | False |
1181 | 9120 | Kallo | Oui | BEVEREN-WAAS | FLANDRE-ORIENTALE | False |
1182 | 9130 | Kallo | Oui | BEVEREN-WAAS | FLANDRE-ORIENTALE | False |
1359 | 4000 | Liège | Non | LIÈGE | LIEGE | False |
1360 | 4020 | Liège | Non | LIÈGE | LIEGE | False |
Get a list giving for each zipcode, its locality name, and geographical center.
Note : it does not contain any "special zicode"
# https://data.gov.be/fr/dataset/328ba4f140ba0e870dfc9c70635fe7c1840980b1
download_if_nexist("https://www.odwb.be/api/v2/catalog/datasets/code-postaux-belge/exports/geojson",
"data/zipcode_centers.geojson")
zipcodes_centers = gpd.read_file("data/zipcode_centers.geojson")
zipcodes_centers = zipcodes_centers.rename({"column_1": "zipcode", "column_2": "locality"}, axis=1)[["locality","zipcode", "geometry"]]
zipcodes_centers = zipcodes_centers.to_crs(crs)
print(zipcodes_centers.zipcode.nunique(), "unique zipcode")
zipcodes_centers
1145 unique zipcode
locality | zipcode | geometry | |
---|---|---|---|
0 | Saint-Gilles | 1060 | POINT (483757.549 6590703.228) |
1 | Molenbeek-Saint-Jean | 1080 | POINT (481209.435 6595571.150) |
2 | Ganshoren | 1083 | POINT (480623.048 6598549.025) |
3 | Neder-Over-Heembeek | 1120 | POINT (488746.985 6603234.900) |
4 | Wavre | 1300 | POINT (512931.716 6571283.981) |
... | ... | ... | ... |
2752 | Dikkelvenne | 9890 | POINT (410707.640 6606742.923) |
2753 | Lovendegem | 9920 | POINT (401597.631 6638236.014) |
2754 | Zomergem | 9930 | POINT (396769.791 6642472.970) |
2755 | Waarschoot | 9950 | POINT (401437.142 6648141.593) |
2756 | Assenede | 9960 | POINT (418205.916 6662086.731) |
2757 rows × 3 columns
# Missing zipcode
zipcodes_normal[~zipcodes_normal.zipcode.isin(zipcodes_centers.zipcode)]
zipcode | locality | Sous-commune | main_commune | Province | |
---|---|---|---|---|---|
1895 | 5352 | Perwez-Haillot | Oui | OHEY | NAMUR |
Get the boundaries for all zipcodes. Provided by BPost.
In the original files, multi-polygons as given as multiple (single)-polygons records. We merge them in ony one record.
Note that in the "center" file, if several entities share the same zip, the center of each entity is given, while here, we only have one (multi-)polygon per zipcode.
zipcode_boundaries_filename = "data/zipcode_boundaries_shapefile_3812.zip"
# https://www.geo.be/catalog/details/9738c7c0-5255-11ea-8895-34e12d0f0423?l=fr
download_if_nexist("https://bgu.bpost.be/assets/9738c7c0-5255-11ea-8895-34e12d0f0423_x-shapefile_3812.zip",
zipcode_boundaries_filename)
zipcodes_boundaries = gpd.read_file(f"zip://{zipcode_boundaries_filename}/3812")
zipcodes_boundaries["is_special"] = zipcodes_boundaries.CP_speciau ==1
zipcodes_boundaries = zipcodes_boundaries.rename({"nouveau_PO":"zipcode"}, axis=1)[["zipcode", "is_special", "geometry"]]
zipcodes_boundaries = zipcodes_boundaries.dissolve(["zipcode", "is_special"]).reset_index()
zipcodes_boundaries = zipcodes_boundaries.to_crs(crs)
zipcodes_boundaries
zipcode | is_special | geometry | |
---|---|---|---|
0 | 1000 | False | MULTIPOLYGON Z (((487951.058 6600061.994 0.000... |
1 | 1005 | True | POLYGON Z ((484391.808 6593846.983 0.000, 4844... |
2 | 1006 | True | POLYGON Z ((484413.037 6593943.815 0.000, 4843... |
3 | 1007 | True | POLYGON Z ((486232.320 6594255.851 0.000, 4862... |
4 | 1008 | True | POLYGON Z ((486000.332 6594258.665 0.000, 4858... |
... | ... | ... | ... |
1182 | 9982 | False | POLYGON Z ((398126.149 6670079.581 0.000, 3981... |
1183 | 9988 | False | POLYGON Z ((402165.046 6674165.098 0.000, 4021... |
1184 | 9990 | False | POLYGON Z ((383123.261 6664423.574 0.000, 3831... |
1185 | 9991 | False | POLYGON Z ((389466.378 6660430.021 0.000, 3895... |
1186 | 9992 | False | POLYGON Z ((376698.000 6669876.805 0.000, 3767... |
1187 rows × 3 columns
# Missing zipcode
zipcodes_normal[~zipcodes_normal.zipcode.isin(zipcodes_boundaries.zipcode)]
zipcode | locality | Sous-commune | main_commune | Province | |
---|---|---|---|---|---|
1175 | 6642 | Juseret | Oui | VAUX-SUR-SÛRE | LUXEMBOURG |
# Extra zipcode
zipcodes_boundaries[~zipcodes_boundaries.zipcode.isin(zipcodes.zipcode)]
zipcode | is_special | geometry | |
---|---|---|---|
1071 | 9 | False | POLYGON Z ((551743.087 6617609.407 0.000, 5517... |
Get boundaries for all statistical sectors, from StatBel. Statistical sectors can be grouped by NIS code, providing NIS code boundaries.
# https://statbel.fgov.be/fr/open-data/secteurs-statistiques-2020
download_if_nexist("https://statbel.fgov.be/sites/default/files/files/opendata/Statistische%20sectoren/sh_statbel_statistical_sectors_31370_20200101.shp.zip",
"data/stat_sectors_2020.zip")
statistical_sectors = gpd.read_file("zip://data/stat_sectors_2020.zip/sh_statbel_statistical_sectors_20200101.shp")
statistical_sectors["CNIS5_2020"] = statistical_sectors["CNIS5_2020"].astype(str)
# Group (with "dissolve") sectors per NIS code
nis_boundaries = statistical_sectors[["CNIS5_2020", "T_MUN_FR", "T_MUN_NL", "geometry"]].dissolve(by="CNIS5_2020").reset_index()
nis_boundaries = nis_boundaries.rename({"CNIS5_2020": "niscode"}, axis=1)
nis_boundaries = nis_boundaries.to_crs(crs)
As a first example of spatial join, we will join polygons (from zipcodes_boundaries) and points (from zipcodes_centers).
In the resulting table sjoin_zip_center, we'll get all combination of zipcodes_boundaries and zipcodes_centers, where the point of zipcodes_centers falls into the polygon of zipcodes_boundaries.
sjoin_mismatches will contain all combinations where the zipcode provided by zipcodes_centers does not match the zipcode provided by zipcodes_boundaries
sjoin_zip_center = gpd.sjoin(zipcodes_boundaries, zipcodes_centers)
sjoin_zip_center
zipcode_left | is_special | geometry | index_right | locality | zipcode_right | |
---|---|---|---|---|---|---|
0 | 1000 | False | MULTIPOLYGON Z (((487951.058 6600061.994 0.000... | 1682 | Bruxelles | 1000 |
8 | 1020 | False | POLYGON Z ((484519.836 6603848.988 0.000, 4845... | 2218 | Laeken | 1020 |
9 | 1030 | False | POLYGON Z ((488633.964 6600056.547 0.000, 4887... | 2219 | Schaerbeek | 1030 |
13 | 1040 | False | MULTIPOLYGON Z (((486480.015 6594068.947 0.000... | 1139 | Etterbeek | 1040 |
21 | 1050 | False | MULTIPOLYGON Z (((486898.793 6593112.249 0.000... | 591 | Ixelles | 1050 |
... | ... | ... | ... | ... | ... | ... |
1183 | 9988 | False | POLYGON Z ((402165.046 6674165.098 0.000, 4021... | 1138 | Watervliet | 9988 |
1183 | 9988 | False | POLYGON Z ((402165.046 6674165.098 0.000, 4021... | 1680 | Waterland-Oudeman | 9988 |
1184 | 9990 | False | POLYGON Z ((383123.261 6664423.574 0.000, 3831... | 1681 | Maldegem | 9990 |
1185 | 9991 | False | POLYGON Z ((389466.378 6660430.021 0.000, 3895... | 2216 | Adegem | 9991 |
1186 | 9992 | False | POLYGON Z ((376698.000 6669876.805 0.000, 3767... | 2217 | Middelburg | 9992 |
2757 rows × 6 columns
sjoin_mismatches = sjoin_zip_center[sjoin_zip_center.zipcode_left != sjoin_zip_center.zipcode_right].reset_index(drop=True)
sjoin_mismatches
zipcode_left | is_special | geometry | index_right | locality | zipcode_right | |
---|---|---|---|---|---|---|
0 | 1130 | False | POLYGON Z ((492165.130 6604187.117 0.000, 4921... | 2343 | Haren | 3700 |
1 | 1541 | False | POLYGON Z ((442698.747 6572867.831 0.000, 4427... | 499 | Driekapellen | 8600 |
2 | 2000 | False | POLYGON Z ((491447.469 6663216.307 0.000, 4914... | 45 | Antwerpen | 2050 |
3 | 2000 | False | POLYGON Z ((491447.469 6663216.307 0.000, 4914... | 1184 | Antwerpen | 2060 |
4 | 2000 | False | POLYGON Z ((491447.469 6663216.307 0.000, 4914... | 1181 | Antwerpen | 2030 |
5 | 2000 | False | POLYGON Z ((491447.469 6663216.307 0.000, 4914... | 1180 | Antwerpen | 2018 |
6 | 2000 | False | POLYGON Z ((491447.469 6663216.307 0.000, 4914... | 1740 | Antwerpen | 2020 |
7 | 2000 | False | POLYGON Z ((491447.469 6663216.307 0.000, 4914... | 2277 | Antwerpen | 2040 |
8 | 2180 | False | POLYGON Z ((496588.712 6668893.730 0.000, 4964... | 2324 | Laar | 3400 |
9 | 2180 | False | POLYGON Z ((496588.712 6668893.730 0.000, 4964... | 1187 | Schriek | 2223 |
10 | 2490 | False | POLYGON Z ((579405.019 6646044.757 0.000, 5793... | 1177 | Berg | 1910 |
11 | 2490 | False | POLYGON Z ((579405.019 6646044.757 0.000, 5793... | 2342 | Berg | 3700 |
12 | 2560 | False | POLYGON Z ((522581.783 6652932.359 0.000, 5226... | 1275 | Voort | 3840 |
13 | 3202 | False | POLYGON Z ((545421.333 6620399.304 0.000, 5454... | 1802 | Donk | 3540 |
14 | 3300 | False | POLYGON Z ((546638.037 6597317.597 0.000, 5466... | 697 | Sint-Margriete-Houtem | 3470 |
15 | 3700 | False | POLYGON Z ((613768.565 6589710.582 0.000, 6137... | 135 | Kolmont | 3840 |
16 | 3770 | False | POLYGON Z ((625035.587 6592809.132 0.000, 6250... | 2736 | Elst | 9660 |
17 | 4000 | False | MULTIPOLYGON Z (((622294.900 6548667.801 0.000... | 742 | Liège | 4020 |
18 | 4120 | False | POLYGON Z ((604356.408 6544578.503 0.000, 6043... | 772 | Ehein | 4480 |
19 | 4122 | False | POLYGON Z ((617503.511 6544062.555 0.000, 6175... | 153 | Neupré | 4120 |
20 | 4141 | False | POLYGON Z ((640688.017 6541527.140 0.000, 6406... | 1354 | Louveigné | 4920 |
21 | 4400 | False | POLYGON Z ((609617.951 6554549.539 0.000, 6096... | 1489 | Noirfontaine | 6831 |
22 | 4852 | False | POLYGON Z ((664231.609 6575203.101 0.000, 6642... | 230 | Plombières | 4850 |
23 | 4910 | False | POLYGON Z ((654464.107 6535116.816 0.000, 6544... | 1947 | Mont | 5530 |
24 | 4910 | False | POLYGON Z ((654464.107 6535116.816 0.000, 6544... | 1463 | Mont | 6661 |
25 | 4910 | False | POLYGON Z ((654464.107 6535116.816 0.000, 6544... | 1348 | Polleur | 4800 |
26 | 4960 | False | POLYGON Z ((680570.480 6512355.732 0.000, 6805... | 931 | Bellevaux | 6834 |
27 | 4987 | False | POLYGON Z ((651334.479 6514275.104 0.000, 6512... | 307 | Neuville | 5600 |
28 | 4987 | False | POLYGON Z ((651334.479 6514275.104 0.000, 6512... | 2445 | La Bruyère | 5080 |
29 | 4987 | False | POLYGON Z ((651334.479 6514275.104 0.000, 6512... | 2421 | Andrimont | 4821 |
30 | 5541 | False | POLYGON Z ((538168.187 6483865.512 0.000, 5382... | 287 | Hastière | 5540 |
31 | 5576 | False | POLYGON Z ((558829.518 6455080.381 0.000, 5589... | 854 | Honnay | 5570 |
32 | 6640 | False | POLYGON Z ((627657.956 6447744.344 0.000, 6276... | 905 | Juseret | 6642 |
33 | 6666 | False | POLYGON Z ((636191.395 6480967.171 0.000, 6362... | 1510 | Mormont | 6997 |
34 | 6681 | False | POLYGON Z ((616954.726 6458024.935 0.000, 6169... | 2021 | Sainte-Ode | 6680 |
35 | 6724 | False | POLYGON Z ((620859.901 6407429.477 0.000, 6213... | 1473 | Habay | 6720 |
36 | 6820 | False | MULTIPOLYGON Z (((593091.087 6396179.853 0.000... | 226 | Lambermont | 4800 |
37 | 7141 | False | POLYGON Z ((476514.652 6524706.111 0.000, 4764... | 967 | Morlanwelz | 7140 |
38 | 7624 | False | POLYGON Z ((374657.316 6537126.140 0.000, 3747... | 1552 | Brunehaut | 7620 |
39 | 7811 | False | POLYGON Z ((426644.077 6556386.940 0.000, 4269... | 2084 | Maffle | 7810 |
40 | 8301 | False | POLYGON Z ((364600.110 6682813.553 0.000, 3646... | 1582 | Knokke-Heist | 8300 |
41 | 8620 | False | POLYGON Z ((314007.990 6639410.105 0.000, 3139... | 2347 | Sluizen | 3700 |
42 | 8755 | False | POLYGON Z ((370845.213 6638802.585 0.000, 3708... | 2722 | Bambrugge | 9420 |
43 | 8952 | False | POLYGON Z ((317308.967 6581658.756 0.000, 3172... | 1619 | Heuvelland | 8950 |
44 | 9120 | False | POLYGON Z ((475777.507 6667578.396 0.000, 4757... | 1083 | Kallo | 9130 |
45 | 9130 | False | POLYGON Z ((478761.897 6667836.361 0.000, 4779... | 2707 | Kallo | 9120 |
46 | 9180 | False | POLYGON Z ((442800.468 6661195.878 0.000, 4428... | 1648 | Moerbeke | 9500 |
47 | 9180 | False | POLYGON Z ((442800.468 6661195.878 0.000, 4428... | 1628 | Sinaai-Waas | 9112 |
48 | 9220 | False | POLYGON Z ((464845.765 6639208.960 0.000, 4647... | 630 | Hamme | 1785 |
49 | 9620 | False | POLYGON Z ((426237.365 6605895.998 0.000, 4262... | 1114 | Sint-Maria-Oudenhove | 9660 |
50 | 9620 | False | POLYGON Z ((426237.365 6605895.998 0.000, 4262... | 2180 | Oombergen | 9520 |
51 | 9790 | False | POLYGON Z ((395001.891 6600120.574 0.000, 3950... | 1120 | Ooike | 9700 |
We can now explore all those mismatch. As a way to make decisions easier, we provide several data:
def plot_mismatch_zip(sjoin_mismatches, i):
print(f"Mismatch record {i}:")
display(sjoin_mismatches.iloc[[i]])
zip1 = sjoin_mismatches.iloc[i].zipcode_left
zip2 = sjoin_mismatches.iloc[i].zipcode_right
print("From BPost:")
display( zipcodes[ zipcodes.zipcode.isin([zip1, zip2])])
try:
print("Know by OSM at the given point:")
pt = zipcodes_centers.to_crs(osm_crs).loc[sjoin_mismatches.iloc[i].index_right].geometry
geolocator = geopy.geocoders.Nominatim(user_agent="smalsresearch")
print(geolocator.reverse(f"{pt.y}, {pt.x}").address)
print("Know by OSM for the given locality:")
loc = zipcodes_centers.loc[sjoin_mismatches.iloc[i].index_right].locality
print(geolocator.geocode(f"{loc}, Belgium").address)
except Exception as e:
print("Cannot get OSM data:", e)
query = f"zipcode in ('{zip1}', '{zip2}')"
df_bnd = zipcodes_boundaries.assign(label=zipcodes_boundaries.zipcode).query(query).to_crs(osm_crs)
df_pnt = zipcodes_centers.assign(label=zipcodes_centers.zipcode+" "+zipcodes_centers.locality).query(query).to_crs(osm_crs)
df_pnt["mismatch"] = False
df_pnt.loc[sjoin_mismatches.iloc[i].index_right, "mismatch"] = True
if with_plotly:
fig = px.choropleth_mapbox(df_bnd,
geojson=df_bnd.geometry,
locations=df_bnd.index,
hover_name="label",
color=df_bnd['label'],
center={"lat": df_bnd.bounds.mean()[["miny", "maxy"]].mean(),
"lon": df_bnd.bounds.mean()[["minx", "maxx"]].mean()},
mapbox_style="open-street-map",
#zoom=10,
opacity=0.5,
)
for i, bloc in df_pnt.groupby("mismatch"):
fig.add_trace(go.Scattermapbox(lat=bloc.geometry.y,
lon=bloc.geometry.x,
mode='markers+text', # "markers+text",
text=bloc.label,
name="Zipcode centers" + (" (mismatch)" if i else ""),
textposition="bottom center"
)
)
fig.show()
else:
df=pd.concat([df_bnd, df_pnt]).to_crs(crs)
df["label"] = df["label"] + np.where(df.mismatch==True, "*", "")
ax = df.plot("label", figsize=(10,15), legend=True)
try:
ctx.add_basemap(ax)
except Exception as e:
print("Cannot add basemap: ", e)
plt.show()
display(df[["zipcode", "geometry","label"]])
# for i in range(sjoin_mismatches.shape[0]):
for i in [8, 19, 25, 30, 43]:
plot_mismatch_zip(sjoin_mismatches, i)
print("--------------")
Mismatch record 8:
zipcode_left | is_special | geometry | index_right | locality | zipcode_right | |
---|---|---|---|---|---|---|
8 | 2180 | False | POLYGON Z ((496588.712 6668893.730 0.000, 4964... | 2324 | Laar | 3400 |
From BPost:
zipcode | locality | Sous-commune | main_commune | Province | is_special | |
---|---|---|---|---|---|---|
603 | 2180 | Ekeren | Oui | ANTWERPEN | ANVERS | False |
609 | 3400 | Eliksem | Oui | LANDEN | BRABANT FLAMAND | False |
683 | 3400 | Ezemaal | Oui | LANDEN | BRABANT FLAMAND | False |
1266 | 3400 | Laar | Oui | LANDEN | BRABANT FLAMAND | False |
1285 | 3400 | Landen | Non | LANDEN | BRABANT FLAMAND | False |
1692 | 3400 | Neerwinden | Oui | LANDEN | BRABANT FLAMAND | False |
1867 | 3400 | Overwinden | Oui | LANDEN | BRABANT FLAMAND | False |
2079 | 3400 | Rumsdorp | Oui | LANDEN | BRABANT FLAMAND | False |
2581 | 3400 | Wange | Oui | LANDEN | BRABANT FLAMAND | False |
Know by OSM at the given point: 11, Laar, Donk, Ekeren, Antwerpen, Vlaanderen, 2180, België / Belgique / Belgien Know by OSM for the given locality: Laar, Wommelgem, Antwerpen, Vlaanderen, 2160, België / Belgique / Belgien