Automatic Zoning Procedure (AZP) algorithm

Authors: Xin Feng

AZP aggregates data from a large number of zones into a pre-designated smaller number of regions. It can work with different types of objective functions, whose values are very sensitive to how the zones are aggregated.

AZP was originally formulated in Openshaw (1977) and then extended in Openshaw and Rao (1995).
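To make the idea concrete, the following is a minimal, deterministic local-search sketch in the spirit of AZP, not the spopt implementation. It assumes a one-variable pairwise objective and a toy chain of six zones; the names `is_connected`, `objective`, and `azp_sketch` are hypothetical helpers introduced only for this illustration. A zone is moved to a neighboring region only when the donor region stays contiguous and the objective decreases.

```python
from collections import deque

def is_connected(zones, neighbors):
    """Breadth-first check that a set of zones forms one contiguous block."""
    zones = set(zones)
    if not zones:
        return False  # an empty region is not allowed
    start = next(iter(zones))
    seen, queue = {start}, deque([start])
    while queue:
        z = queue.popleft()
        for nb in neighbors[z]:
            if nb in zones and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == zones

def objective(labels, values):
    """Sum of absolute differences between all zone pairs in the same region."""
    total = 0.0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] == labels[j]:
                total += abs(values[i] - values[j])
    return total

def azp_sketch(labels, values, neighbors):
    """Repeat improving single-zone moves until no move lowers the objective."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for zone in range(len(labels)):
            donor = labels[zone]
            rest = [z for z, lab in enumerate(labels) if lab == donor and z != zone]
            if not is_connected(rest, neighbors):
                continue  # removing this zone would split or empty the donor
            current = objective(labels, values)
            candidates = sorted({labels[nb] for nb in neighbors[zone]} - {donor})
            for recipient in candidates:
                trial = labels[:zone] + [recipient] + labels[zone + 1:]
                if objective(trial, values) < current:
                    labels, improved = trial, True
                    break
    return labels

# Toy data (assumed): six zones on a chain 0-1-2-3-4-5 with one attribute each
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
values = [1, 2, 3, 10, 11, 12]
result = azp_sketch([0, 0, 0, 0, 1, 1], values, neighbors)
print(result)
```

Starting from an unbalanced partition, the sketch moves zone 3 into the second region, which groups the low and high values separately. The actual algorithm differs in details such as the random order in which regions and zones are visited.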

In [1]:

import warnings
warnings.filterwarnings('ignore')
import geopandas as gpd
import libpysal
import numpy as np

import sys
sys.path.append("../")
from spopt.region import AZP

In [2]:

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [12, 8]

Mexican State Regional Income Clustering

To illustrate AZP, we use data on regional incomes for Mexican states over the period 1940-2000, originally used in Rey and Sastré-Gutiérrez (2010).

We can first explore the data by plotting the per capita gross regional domestic product (in constant 2000 USD) for each year in the sample, using a quintile classification:

In [3]:

pth = libpysal.examples.get_path('mexicojoin.shp')
mexico = gpd.read_file(pth)

In [4]:

for year in range(1940, 2010, 10):
    ax = mexico.plot(column=f'PCGDP{year}', scheme='Quantiles', cmap='GnBu', edgecolor='b', legend=True)
    _ = ax.axis('off')
    plt.title(str(year))
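A quintile classification splits the observations into five classes of roughly equal size. The sketch below shows the idea with the standard library only. The values are invented for illustration, and `quintile_class` is a hypothetical helper, not the classifier that the plotting call uses internally.

```python
import statistics

# Hypothetical per capita values, invented purely for illustration
values = [1200, 1500, 1800, 2100, 2600, 3000, 3400, 4100, 5200, 7500]

# Four cut points split the data into five classes (quintiles)
breaks = statistics.quantiles(values, n=5, method="inclusive")

def quintile_class(v, breaks):
    """Return the 0-based class index: the number of cut points below v."""
    return sum(v > b for b in breaks)

classes = [quintile_class(v, breaks) for v in values]
print(breaks)
print(classes)
```

Each class ends up with two of the ten observations, which is what makes a quintile map useful for comparing the relative position of states across years.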

Regionalization

First, we specify a number of parameters that will serve as input to the AZP model.

The variables in the dataframe that will be used to measure regional dissimilarity:

In [5]:

attrs_name = [f'PCGDP{year}' for year in range(1950, 2010, 10)]
attrs_name

Out[5]:

['PCGDP1950', 'PCGDP1960', 'PCGDP1970', 'PCGDP1980', 'PCGDP1990', 'PCGDP2000']

A spatial weights object expresses the spatial connectivity of the zones:

In [6]:

w = libpysal.weights.Queen.from_dataframe(mexico)
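Queen contiguity treats two zones as neighbors when they share at least one boundary point, whether an edge or a single vertex. As a rough illustration on a regular lattice rather than on the Mexican polygons, the hypothetical helper `queen_neighbors` below builds the same kind of id-to-neighbors mapping that a weights object exposes through `w.neighbors`.

```python
def queen_neighbors(n_rows, n_cols):
    """Map each cell id of an n_rows x n_cols lattice to its Queen neighbors."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            # Look at the eight surrounding cells (edges and corners)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the cell itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        adjacent.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = adjacent
    return neighbors

grid = queen_neighbors(3, 3)
print(grid[4])  # center cell: touches all eight other cells
print(grid[0])  # corner cell: touches three cells
```

The regionalization uses this connectivity to ensure that every region it builds is spatially contiguous.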

The number of regions that we would like to aggregate these zones into:

In [7]:

n_clusters = 5

There are four optional parameters. This example uses the default settings for all of them; you can set them as needed.

allow_move_strategy: To change which moves are allowed, pass an AllowMoveStrategy instance as this argument.

class: AllowMoveStrategy or None, default: None

random_state: Random seed.

None, int, str, bytes, or bytearray, default: None

initial_labels: One-dimensional array of labels at the beginning of the algorithm.

class: numpy.ndarray or None, default: None
If None, then a random initial clustering will be generated.

objective_func: The objective function to use.

class: spopt.region.objective_function.ObjectiveFunction, default: ObjectiveFunctionPairwise()
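As a rough sketch of what a pairwise objective measures, the function below sums the distances between every pair of zones assigned to the same region; lower totals mean more homogeneous regions. The Manhattan metric, the function name `pairwise_objective`, and the toy attribute values are assumptions made for this illustration and are not taken from the spopt source.

```python
from itertools import combinations

def pairwise_objective(labels, attrs):
    """Sum of Manhattan distances over all zone pairs sharing a region label."""
    regions = {}
    for zone, label in enumerate(labels):
        regions.setdefault(label, []).append(attrs[zone])
    total = 0.0
    for members in regions.values():
        for a, b in combinations(members, 2):
            total += sum(abs(x - y) for x, y in zip(a, b))
    return total

# Four hypothetical zones with a single attribute each
attrs = [(1.0,), (2.0,), (10.0,), (11.0,)]

good = pairwise_objective([0, 0, 1, 1], attrs)  # similar zones grouped together
bad = pairwise_objective([0, 1, 0, 1], attrs)   # dissimilar zones grouped together
print(good, bad)
```

Grouping similar zones yields the smaller value, which is the direction the search tries to move in.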

The model can then be solved:

In [8]:

model = AZP(mexico, w, attrs_name, n_clusters)
model.solve()

n_regions_per_comp {0: 5}
comp_label 0
n_regions_in_comp 5
Regions in comp: {0, 1, 2, 3, 4}

In [9]:

mexico['azp_new'] = model.labels_

In [10]:

mexico['number'] = 1
mexico[['azp_new','number']].groupby(by='azp_new').count()

Out[10]:

	number
azp_new
0.0	5
1.0	8
2.0	9
3.0	5
4.0	5

In [11]:

mexico.plot(column='azp_new', categorical=True, edgecolor='w')

Out[11]:

<AxesSubplot:>

The model solution results in five regions: three with five states each, one with eight, and one with nine states.
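The region sizes can be tallied directly from the group counts printed in Out[10]. The short check below copies those counts into a plain dictionary (the variable names are only illustrative) and summarizes how many regions have each size.

```python
from collections import Counter

# Region label -> number of states, copied from the Out[10] table
region_sizes = {0: 5, 1: 8, 2: 9, 3: 5, 4: 5}

# How many regions have each size
size_tally = Counter(region_sizes.values())
total_states = sum(region_sizes.values())

print(dict(size_tally))
print(total_states)
```

Because no random seed is set in this example, a rerun may produce a different partition and therefore different counts.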

Year-by-Year Regionalization (n_clusters = 5 regions)

In [12]:

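for year in attrs_name:
    # Solve a separate five-region AZP model on each year's income attribute
    model = AZP(mexico, w, year, 5)
    model.solve()
    # Store the labels in a per-year column, e.g. 'PCGDP1950labels_'
    lab = year + 'labels_'
    mexico[lab] = model.labels_
    ax = mexico.plot(column=lab, categorical=True, edgecolor='w')
    plt.title(year)
    _ = ax.axis('off')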

n_regions_per_comp {0: 5}
comp_label 0
n_regions_in_comp 5
Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5}
comp_label 0
n_regions_in_comp 5
Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5}
comp_label 0
n_regions_in_comp 5
Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5}
comp_label 0
n_regions_in_comp 5
Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5}
comp_label 0
n_regions_in_comp 5
Regions in comp: {0, 1, 2, 3, 4}
n_regions_per_comp {0: 5}
comp_label 0
n_regions_in_comp 5
Regions in comp: {0, 1, 2, 3, 4}