A common workflow with longitudinal spatial data is to apply the same classification scheme to an attribute over different time periods. More specifically, one would like to keep the class breaks the same over each period and examine how the mass of the distribution changes over these classes in the different periods.
The Pooled classifier supports this workflow.
import numpy as np
import mapclassify as mc
We construct a synthetic dataset composed of 20 cross-sectional units at three time points. Here the mean of the series is increasing over time.
n = 20
data = np.array([np.arange(n)+i*n for i in range(1,4)]).T
data.shape
(20, 3)
data
array([[20, 40, 60],
       [21, 41, 61],
       [22, 42, 62],
       [23, 43, 63],
       [24, 44, 64],
       [25, 45, 65],
       [26, 46, 66],
       [27, 47, 67],
       [28, 48, 68],
       [29, 49, 69],
       [30, 50, 70],
       [31, 51, 71],
       [32, 52, 72],
       [33, 53, 73],
       [34, 54, 74],
       [35, 55, 75],
       [36, 56, 76],
       [37, 57, 77],
       [38, 58, 78],
       [39, 59, 79]])
res = mc.Pooled(data)
res
Pooled Classifier

Pooled Quantiles

   Interval      Count
----------------------
[20.00, 31.80] |    12
(31.80, 43.60] |     8
(43.60, 55.40] |     0
(55.40, 67.20] |     0
(67.20, 79.00] |     0

Pooled Quantiles

   Interval      Count
----------------------
( -inf, 31.80] |     0
(31.80, 43.60] |     4
(43.60, 55.40] |    12
(55.40, 67.20] |     4
(67.20, 79.00] |     0

Pooled Quantiles

   Interval      Count
----------------------
( -inf, 31.80] |     0
(31.80, 43.60] |     0
(43.60, 55.40] |     0
(55.40, 67.20] |     8
(67.20, 79.00] |    12
Note that the class definitions are identical across periods except for the lower bound of the first class. Because the first period contains the minimum of the pooled series, that value defines the closed lower bound of the first class in that period. In the second and third periods, the local minima are greater than the upper bound of the first class, so no observations from those periods fall in it. Following the policy in mapclassify, the lower bound for the second and third periods is set to -inf to indicate that their minimum values are not contained in the first class.
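The pooling idea can be sketched with NumPy alone: compute quintile breaks on the flattened array, then count each column against the same upper bounds. This is only an illustration that assumes linear-interpolation quantiles and right-closed classes; it is not mapclassify's implementation, and the helper name `counts_per_class` is hypothetical.

```python
import numpy as np

n = 20
data = np.array([np.arange(n) + i * n for i in range(1, 4)]).T

# Upper bounds of five classes, taken from the pooled (flattened) series.
pooled_bins = np.quantile(data.ravel(), [0.2, 0.4, 0.6, 0.8, 1.0])


def counts_per_class(values, bins):
    """Count values per right-closed class (lower, upper] (hypothetical helper)."""
    # side="left" places a value equal to an upper bound inside that class.
    labels = np.searchsorted(bins, values, side="left")
    return np.bincount(labels, minlength=len(bins))


column_counts = [counts_per_class(data[:, j], pooled_bins) for j in range(data.shape[1])]
for j, counts in enumerate(column_counts):
    print(f"period {j}: {counts}")
```

Under these assumptions the counts agree with the three tables above. The first class is empty in the later periods, which is the situation the -inf lower bound flags.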
res = mc.Pooled(data, k=4)
res.col_classifiers[0].counts
array([15, 5, 0, 0])
res.col_classifiers[-1].counts
array([ 0, 0, 5, 15])
res.global_classifier.counts
array([15, 15, 15, 15])
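Because every column is classified against the same breaks, the per-column counts add up to the counts of the pooled series. A NumPy-only sketch of that check follows, assuming quartile breaks computed with `np.quantile` and right-closed classes; it does not call mapclassify.

```python
import numpy as np

n = 20
data = np.array([np.arange(n) + i * n for i in range(1, 4)]).T

# Quartile upper bounds from the pooled series (k=4).
bins = np.quantile(data.ravel(), [0.25, 0.5, 0.75, 1.0])

# Class label for every cell; labels keeps the (20, 3) shape of data.
labels = np.searchsorted(bins, data, side="left")

# One row of counts per column, then the counts for the pooled series.
per_column = np.array([np.bincount(labels[:, j], minlength=4) for j in range(3)])
pooled = np.bincount(labels.ravel(), minlength=4)

print(per_column)
print(pooled)
```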
res
Pooled Classifier

Pooled Quantiles

   Interval      Count
----------------------
[20.00, 34.75] |    15
(34.75, 49.50] |     5
(49.50, 64.25] |     0
(64.25, 79.00] |     0

Pooled Quantiles

   Interval      Count
----------------------
( -inf, 34.75] |     0
(34.75, 49.50] |    10
(49.50, 64.25] |    10
(64.25, 79.00] |     0

Pooled Quantiles

   Interval      Count
----------------------
( -inf, 34.75] |     0
(34.75, 49.50] |     0
(49.50, 64.25] |     5
(64.25, 79.00] |    15
Extract the pooled classification objects for each column
c0, c1, c2 = res.col_classifiers
c0
Pooled Quantiles

   Interval      Count
----------------------
[20.00, 34.75] |    15
(34.75, 49.50] |     5
(49.50, 64.25] |     0
(64.25, 79.00] |     0
Compare to the unrestricted classifier for the first column
mc.Quantiles(c0.y, k=4)
Quantiles

   Interval      Count
----------------------
[20.00, 24.75] |     5
(24.75, 29.50] |     5
(29.50, 34.25] |     5
(34.25, 39.00] |     5
and make the same comparison for the last column
c2
Pooled Quantiles

   Interval      Count
----------------------
( -inf, 34.75] |     0
(34.75, 49.50] |     0
(49.50, 64.25] |     5
(64.25, 79.00] |    15
mc.Quantiles(c2.y, k=4)
Quantiles

   Interval      Count
----------------------
[60.00, 64.75] |     5
(64.75, 69.50] |     5
(69.50, 74.25] |     5
(74.25, 79.00] |     5
res = mc.Pooled(data, classifier='BoxPlot', hinge=1.5)
res
Pooled Classifier

Pooled BoxPlot

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |    15
( 34.75,  49.50] |     5
( 49.50,  64.25] |     0
( 64.25, 108.50] |     0

Pooled BoxPlot

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |     0
( 34.75,  49.50] |    10
( 49.50,  64.25] |    10
( 64.25, 108.50] |     0

Pooled BoxPlot

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |     0
( 34.75,  49.50] |     0
( 49.50,  64.25] |     5
( 64.25, 108.50] |    15
res.col_classifiers[0].bins
array([ -9.5 , 34.75, 49.5 , 64.25, 108.5 ])
c0, c1, c2 = res.col_classifiers
c0.yb
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
c00 = mc.BoxPlot(c0.y, hinge=3)
c00.yb
array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4])
c00
BoxPlot

   Interval      Count
----------------------
( -inf, -3.75] |     0
(-3.75, 24.75] |     5
(24.75, 29.50] |     5
(29.50, 34.25] |     5
(34.25, 62.75] |     5
c0
Pooled BoxPlot

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |    15
( 34.75,  49.50] |     5
( 49.50,  64.25] |     0
( 64.25, 108.50] |     0
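The two tables differ for two reasons: the pooled classifier takes its quartiles from all 60 values and was built with hinge=1.5, while the column-only classifier uses the 20 values of the first period with hinge=3. The arithmetic behind both sets of breaks can be reproduced with a short sketch, assuming the usual box plot fences (Q1 - hinge * IQR and Q3 + hinge * IQR). The helper `boxplot_breaks` is hypothetical and is not a mapclassify function.

```python
import numpy as np

n = 20
data = np.array([np.arange(n) + i * n for i in range(1, 4)]).T


def boxplot_breaks(values, hinge):
    """Return lower fence, Q1, median, Q3, upper fence (hypothetical helper)."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    return np.array([q1 - hinge * iqr, q1, median, q3, q3 + hinge * iqr])


# Breaks from the pooled series versus breaks from the first period alone.
pooled_breaks = boxplot_breaks(data.ravel(), hinge=1.5)
first_period_breaks = boxplot_breaks(data[:, 0], hinge=3)

print(pooled_breaks)
print(first_period_breaks)
```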
res = mc.Pooled(data, classifier='FisherJenks', k=5)
res
Pooled Classifier

Pooled FisherJenks

   Interval      Count
----------------------
[20.00, 31.00] |    12
(31.00, 43.00] |     8
(43.00, 55.00] |     0
(55.00, 67.00] |     0
(67.00, 79.00] |     0

Pooled FisherJenks

   Interval      Count
----------------------
( -inf, 31.00] |     0
(31.00, 43.00] |     4
(43.00, 55.00] |    12
(55.00, 67.00] |     4
(67.00, 79.00] |     0

Pooled FisherJenks

   Interval      Count
----------------------
( -inf, 31.00] |     0
(31.00, 43.00] |     0
(43.00, 55.00] |     0
(55.00, 67.00] |     8
(67.00, 79.00] |    12
c0, c1, c2 = res.col_classifiers
mc.FisherJenks(c0.y, k=5)
FisherJenks

   Interval      Count
----------------------
[20.00, 23.00] |     4
(23.00, 27.00] |     4
(27.00, 31.00] |     4
(31.00, 35.00] |     4
(35.00, 39.00] |     4
data[1, 0] = 10
data[1, 1] = 10
data[1, 2] = 10
data[9, 2] = 10
data
array([[20, 40, 60],
       [10, 10, 10],
       [22, 42, 62],
       [23, 43, 63],
       [24, 44, 64],
       [25, 45, 65],
       [26, 46, 66],
       [27, 47, 67],
       [28, 48, 68],
       [29, 49, 10],
       [30, 50, 70],
       [31, 51, 71],
       [32, 52, 72],
       [33, 53, 73],
       [34, 54, 74],
       [35, 55, 75],
       [36, 56, 76],
       [37, 57, 77],
       [38, 58, 78],
       [39, 59, 79]])
res = mc.Pooled(data, classifier='MaximumBreaks', k=5)
res
Pooled Classifier

Pooled MaximumBreaks

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     1
(21.00, 41.00] |    18
(41.00, 61.00] |     0
(61.00, 79.00] |     0

Pooled MaximumBreaks

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     0
(21.00, 41.00] |     1
(41.00, 61.00] |    18
(61.00, 79.00] |     0

Pooled MaximumBreaks

   Interval      Count
----------------------
[10.00, 15.00] |     2
(15.00, 21.00] |     0
(21.00, 41.00] |     0
(41.00, 61.00] |     1
(61.00, 79.00] |    17
c0, c1, c2 = res.col_classifiers
c0
Pooled MaximumBreaks

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     1
(21.00, 41.00] |    18
(41.00, 61.00] |     0
(61.00, 79.00] |     0
mc.MaximumBreaks(c0.y, k=5)
Insufficient number of unique diffs. Breaks are random.
MaximumBreaks

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     1
(21.00, 22.50] |     1
(22.50, 28.50] |     6
(28.50, 39.00] |    11
res = mc.Pooled(data, classifier='UserDefined', bins=mc.Quantiles(data[:,-1]).bins)
res
Pooled Classifier

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    20
(62.80, 66.60] |     0
(66.60, 71.40] |     0
(71.40, 75.20] |     0
(75.20, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    20
(62.80, 66.60] |     0
(66.60, 71.40] |     0
(71.40, 75.20] |     0
(75.20, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |     4
(62.80, 66.60] |     4
(66.60, 71.40] |     4
(71.40, 75.20] |     4
(75.20, 79.00] |     4
mc.Quantiles(data[:,-1])
Quantiles

   Interval      Count
----------------------
[10.00, 62.80] |     4
(62.80, 66.60] |     4
(66.60, 71.40] |     4
(71.40, 75.20] |     4
(75.20, 79.00] |     4
data[:,-1]
array([60, 10, 62, 63, 64, 65, 66, 67, 68, 10, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79])
pinned = mc.Pooled(data, classifier='UserDefined', bins=mc.Quantiles(data[:,-1]).bins)
pinned
Pooled Classifier

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    20
(62.80, 66.60] |     0
(66.60, 71.40] |     0
(71.40, 75.20] |     0
(75.20, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    20
(62.80, 66.60] |     0
(66.60, 71.40] |     0
(71.40, 75.20] |     0
(75.20, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |     4
(62.80, 66.60] |     4
(66.60, 71.40] |     4
(71.40, 75.20] |     4
(75.20, 79.00] |     4
pinned.global_classifier
UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    44
(62.80, 66.60] |     4
(66.60, 71.40] |     4
(71.40, 75.20] |     4
(75.20, 79.00] |     4
pinned = mc.Pooled(data, classifier='UserDefined', bins=mc.Quantiles(data[:,0]).bins)
pinned
Pooled Classifier

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 23.80] |     4
(23.80, 27.60] |     4
(27.60, 31.40] |     4
(31.40, 35.20] |     4
(35.20, 39.00] |     4
(39.00, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 23.80] |     1
(23.80, 27.60] |     0
(27.60, 31.40] |     0
(31.40, 35.20] |     0
(35.20, 39.00] |     0
(39.00, 79.00] |    19

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 23.80] |     2
(23.80, 27.60] |     0
(27.60, 31.40] |     0
(31.40, 35.20] |     0
(35.20, 39.00] |     0
(39.00, 79.00] |    18
Note that the quintiles for the first period, by definition, contain all the values from that period, but they do not bound the larger values in subsequent periods. Following the mapclassify policy, an additional class is added so that all values in the pooled series are contained.
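The extra class can be illustrated with NumPy, assuming the rule is simply "append the pooled maximum when the supplied bins do not reach it". That rule is inferred from the output above rather than taken from the mapclassify source, so treat this as a sketch.

```python
import numpy as np

n = 20
data = np.array([np.arange(n) + i * n for i in range(1, 4)]).T
# Same low values as introduced earlier in the walkthrough.
data[1, :] = 10
data[9, 2] = 10

# Quintile upper bounds of the first period only.
bins = np.quantile(data[:, 0], [0.2, 0.4, 0.6, 0.8, 1.0])

# Assumed rule: extend the bins so the pooled maximum is covered.
if data.max() > bins[-1]:
    bins = np.append(bins, data.max())

# Right-closed classes: a value equal to an upper bound stays in that class.
labels = np.searchsorted(bins, data, side="left")
counts = [np.bincount(labels[:, j], minlength=len(bins)) for j in range(3)]
for j, c in enumerate(counts):
    print(f"period {j}: {c}")
```

With the first-period quintiles topping out at 39, almost every value from the second and third periods lands in the appended (39, 79] class.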