Authors: Xin Feng
The Automatic Zoning Procedure (AZP) aggregates data from a large number of zones into a pre-designated smaller number of regions. AZP can work with different types of objective functions, which are very sensitive to how the zones are aggregated.
AZP was originally formulated in Openshaw (1977) and then extended in Openshaw and Rao (1995).
import warnings
warnings.filterwarnings('ignore')
import geopandas as gpd
import libpysal
import numpy as np
import sys
sys.path.append("../")
from spopt.region import AZP
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [12, 8]
To illustrate AZP, we utilize data on regional incomes for Mexican states over the period 1940-2000, originally used in Rey and Sastré-Gutiérrez (2010).
We can first explore the data by plotting the per capita gross regional domestic product (in constant 2000 USD) for each year in the sample, using a quintile classification:
pth = libpysal.examples.get_path('mexicojoin.shp')
mexico = gpd.read_file(pth)
for year in range(1940, 2010, 10):
    ax = mexico.plot(column=f'PCGDP{year}', scheme='Quantiles', cmap='GnBu', edgecolor='b', legend=True)
    _ = ax.axis('off')
    plt.title(str(year))
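To make the quintile classification concrete, here is a minimal sketch of how five equal-count classes can be derived from a list of values using only the standard library. The function name `quintile_classes` and the income figures are made up for illustration, and the interpolation rule used by the plotting backend may differ from the one chosen here.

```python
import statistics
from bisect import bisect_left


def quintile_classes(values):
    """Return the four quintile cut points and a class index (0-4) per value."""
    # Four cut points split the data into five groups of roughly equal size.
    breaks = statistics.quantiles(values, n=5, method="inclusive")
    # bisect_left places a value equal to a cut point in the lower class.
    classes = [bisect_left(breaks, v) for v in values]
    return breaks, classes


# Hypothetical per capita incomes for ten zones (not taken from the Mexico data).
incomes = [1200, 3400, 560, 7800, 2300, 4100, 9900, 1500, 6200, 3000]
breaks, classes = quintile_classes(incomes)
print(breaks)
print(classes)
```

Each class index corresponds to one color bin in a choropleth map drawn with a quantile scheme.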
First, we specify a number of parameters that will serve as input to the AZP model.
The variables in the dataframe that will be used to measure regional dissimilarity:
attrs_name = [f'PCGDP{year}' for year in range(1950,2010, 10)]
attrs_name
['PCGDP1950', 'PCGDP1960', 'PCGDP1970', 'PCGDP1980', 'PCGDP1990', 'PCGDP2000']
A spatial weights object expresses the spatial connectivity of the zones:
w = libpysal.weights.Queen.from_dataframe(mexico)
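As a rough illustration of what queen contiguity means, the sketch below builds a neighbor dictionary for a small regular grid: two cells are neighbors if they share an edge or a corner. The helper `queen_neighbors` is hypothetical and is not how libpysal derives neighbors from polygon geometries. It only mimics the idea on a toy lattice.

```python
def queen_neighbors(n_rows, n_cols):
    """Map each grid cell id to the ids of cells sharing an edge or a corner."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            # Scan the surrounding 3x3 window, clipped at the grid border.
            neighbors[cell] = [
                rr * n_cols + cc
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
    return neighbors


w_toy = queen_neighbors(3, 3)
print(w_toy[0])  # corner cell
print(w_toy[4])  # center cell
```

On the 3x3 grid the center cell touches all eight other cells, while a corner cell touches only three.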
The number of regions that we would like to aggregate these zones into:
n_clusters = 5
There are four optional parameters. In this example we use only the default settings, but you can define them as needed.
allow_move_strategy: To change the rule that decides whether a move is allowed, pass an AllowMoveStrategy instance.
class: AllowMoveStrategy or None, default: None
random_state: Random seed.
None, int, str, bytes, or bytearray, default: None
initial_labels: One-dimensional array of labels at the beginning of the algorithm.
class: numpy.ndarray or None, default: None
If None, then a random initial clustering will be generated.
objective_func: The objective function to use.
class: spopt.region.objective_function.ObjectiveFunction, default: ObjectiveFunctionPairwise()
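To show what a pairwise objective measures, here is a minimal sketch that sums the distances between every pair of zones assigned to the same region. It assumes a Manhattan distance and a plain sum over regions. The library's actual defaults may differ, and `pairwise_objective` and the attribute values are invented for illustration.

```python
from itertools import combinations


def pairwise_objective(attributes, labels):
    """Sum of Manhattan distances between all zone pairs sharing a region."""
    total = 0.0
    for a, b in combinations(range(len(labels)), 2):
        if labels[a] == labels[b]:
            total += sum(abs(x - y) for x, y in zip(attributes[a], attributes[b]))
    return total


# Four hypothetical zones with two attributes each.
attributes = [(1.0, 2.0), (1.5, 2.5), (8.0, 9.0), (8.5, 9.0)]

good = pairwise_objective(attributes, [0, 0, 1, 1])  # similar zones grouped
bad = pairwise_objective(attributes, [0, 1, 0, 1])   # dissimilar zones grouped
print(good, bad)
```

A lower value indicates more homogeneous regions, so a partition that groups similar zones scores lower than one that mixes them.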
The model can then be solved:
model = AZP(mexico, w, attrs_name, n_clusters)
model.solve()
n_regions_per_comp {0: 5} comp_label 0 n_regions_in_comp 5 Regions in comp: {0, 1, 2, 3, 4}
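To give a feel for what solving involves, the following is a toy sketch of a zone-moving local search in the spirit of AZP. It is a deterministic simplification written for illustration, not spopt's implementation: the function names, the absolute-difference objective, and the six-zone chain are all assumptions. A zone moves to a neighboring region only if the objective drops and the region it leaves stays non-empty and connected.

```python
from itertools import combinations


def objective(values, labels):
    """Sum of absolute differences between zone pairs in the same region."""
    return sum(
        abs(values[a] - values[b])
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    )


def is_connected(zones, neighbors):
    """True if the non-empty set of zones forms one connected component."""
    zones = set(zones)
    if not zones:
        return False
    start = next(iter(zones))
    seen, stack = {start}, [start]
    while stack:
        for n in neighbors[stack.pop()]:
            if n in zones and n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == zones


def local_search(values, neighbors, labels):
    """Move zones between adjacent regions until no move improves the objective."""
    labels = dict(labels)
    improved = True
    while improved:
        improved = False
        for zone in sorted(labels):
            donor = labels[zone]
            rest = [z for z in labels if labels[z] == donor and z != zone]
            if not is_connected(rest, neighbors):
                continue  # the move would empty or split the donor region
            current = objective(values, labels)
            for target in sorted({labels[n] for n in neighbors[zone]} - {donor}):
                trial = {**labels, zone: target}
                if objective(values, trial) < current:
                    labels, improved = trial, True
                    break
    return labels


# Hypothetical chain of six zones: 0-1-2-3-4-5.
values = [1, 1, 1, 10, 10, 10]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
start = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
result = local_search(values, neighbors, start)
print(result, objective(values, result))
```

In this toy case the search moves zone 2 out of the high-value region, which leaves two internally homogeneous, contiguous regions.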
mexico['azp_new'] = model.labels_
mexico['number'] = 1
mexico[['azp_new','number']].groupby(by='azp_new').count()
| azp_new | number |
|---|---|
| 0.0 | 5 |
| 1.0 | 8 |
| 2.0 | 9 |
| 3.0 | 5 |
| 4.0 | 5 |
mexico.plot(column='azp_new', categorical=True, edgecolor='w')
<AxesSubplot:>
The model solution results in five regions: three with five states each, one with eight, and one with nine. Because no random_state was set, these sizes can differ from run to run.
# Solve a separate single-attribute AZP model for each year and map the result
for year in attrs_name:
    model = AZP(mexico, w, year, 5)
    model.solve()
    lab = year+'labels_'
    mexico[lab] = model.labels_
    ax = mexico.plot(column=lab, categorical=True, edgecolor='w')
    plt.title(year)
    _ = ax.axis('off')
n_regions_per_comp {0: 5} comp_label 0 n_regions_in_comp 5 Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5} comp_label 0 n_regions_in_comp 5 Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5} comp_label 0 n_regions_in_comp 5 Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5} comp_label 0 n_regions_in_comp 5 Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5} comp_label 0 n_regions_in_comp 5 Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5} comp_label 0 n_regions_in_comp 5 Regions in comp: {0, 1, 2, 3, 4}