Author: Serge Rey sjsrey@gmail.com and Wei Kang weikang9009@gmail.com
This notebook introduces the basic PointPattern class in PySAL and covers the following:
We introduce basic terminology here and point the interested reader to more detailed references on the underlying theory of the statistical analysis of point patterns.
To start we consider a series of point locations, $(s_1, s_2, \ldots, s_n)$ in a study region $\Re$. We limit our focus here to a two-dimensional space so that $s_j = (x_j, y_j)$ is the spatial coordinate pair for point location $j$.
We will be interested in two different types of points.
Event Points are locations where something of interest has occurred. The term event is very general here and could be used to represent a wide variety of phenomena. Some examples include:
among many others.
It is important to recognize that in the statistical analysis of point patterns the interest extends beyond the observed point pattern at hand. The observed patterns are viewed as realizations from some underlying spatial stochastic process.
The second type of point we consider are those locations where the phenomena of interest has not been observed. These go by various names such as "empty space" or "regular" points, and at first glance might seem less interesting to a spatial analayst. However, these types of points play a central role in a class of point pattern methods that we explore below.
The analysis of event points focuses on a number of different characteristics of the collective spatial pattern that is observed. Often the pattern is jugded against the hypothesis of complete spatial randomness (CSR). That is, one assumes that the point events arise independently of one another and with constant probability across $\Re$, loosely speaking.
Of course, many of the empirical point patterns we encounter do not appear to be generated from such a simple stochastic process. The depatures from CSR can be due to two types of effects.
For a point process, the first-order properties pertain to the intensity of the process across space. Whether and how the intensity of the point pattern varies within our study region are questions that assume center stage. Such variation in the itensity of the pattern of, say, addresses of individuals with a certain type of non-infectious disease may reflect the underlying population density. In other words, although the point pattern of disease cases may display variation in intensity in our study region, and thus violate the constant probability of an event condition, that spatial drift in the pattern intensity could be driven by an underlying covariate.
The second channel by which departures from CSR can arise is through interaction and dependence between events in space. The canonical example being contagious diseases whereby the presence of an infected individual increases the probability of subsequent additional cases nearby.
When a pattern departs from expectation under CSR, this is suggestive that the underlying process may have some spatial structure that merits further investigation. Thus methods for detection of deviations from CSR and testing for alternative processes have given rise to a large literature in point pattern statistics.
The points module in PySAL implements basic methods of point pattern analysis organized into the following groups:
In the remainder of this notebook we shall focus on point processing.
import libpysal as ps
import numpy as np
from pointpats import PointPattern
We can build a point pattern by using Python lists of coordinate pairs $(s_0, s_1,\ldots, s_m)$ as follows:
points = [[66.22, 32.54], [22.52, 22.39], [31.01, 81.21],
[9.47, 31.02], [30.78, 60.10], [75.21, 58.93],
[79.26, 7.68], [8.23, 39.93], [98.73, 77.17],
[89.78, 42.53], [65.19, 92.08], [54.46, 8.48]]
p1 = PointPattern(points)
p1.mbb
array([ 8.23, 7.68, 98.73, 92.08])
Thus $s_0 = (66.22, 32.54), \ s_{11}=(54.46, 8.48)$.
p1.summary()
Point Pattern 12 points Bounding rectangle [(8.23,7.68), (98.73,92.08)] Area of window: 7638.200000000002 Intensity estimate for window: 0.0015710507711240865 x y 0 66.22 32.54 1 22.52 22.39 2 31.01 81.21 3 9.47 31.02 4 30.78 60.10
type(p1.points)
pandas.core.frame.DataFrame
np.asarray(p1.points)
array([[66.22, 32.54], [22.52, 22.39], [31.01, 81.21], [ 9.47, 31.02], [30.78, 60.1 ], [75.21, 58.93], [79.26, 7.68], [ 8.23, 39.93], [98.73, 77.17], [89.78, 42.53], [65.19, 92.08], [54.46, 8.48]])
p1.mbb
array([ 8.23, 7.68, 98.73, 92.08])
points = np.asarray(points)
points
array([[66.22, 32.54], [22.52, 22.39], [31.01, 81.21], [ 9.47, 31.02], [30.78, 60.1 ], [75.21, 58.93], [79.26, 7.68], [ 8.23, 39.93], [98.73, 77.17], [89.78, 42.53], [65.19, 92.08], [54.46, 8.48]])
p1_np = PointPattern(points)
p1_np.summary()
Point Pattern 12 points Bounding rectangle [(8.23,7.68), (98.73,92.08)] Area of window: 7638.200000000002 Intensity estimate for window: 0.0015710507711240865 x y 0 66.22 32.54 1 22.52 22.39 2 31.01 81.21 3 9.47 31.02 4 30.78 60.10
This example uses 200 randomly distributed points within the counties of Virginia. Coordinates are for UTM zone 17 N.
f = ps.examples.get_path('vautm17n_points.shp')
fo = ps.io.open(f)
pp_va = PointPattern(np.asarray([pnt for pnt in fo]))
fo.close()
pp_va.summary()
Point Pattern 200 points Bounding rectangle [(273959.664381352,4049220.903414295), (972595.9895779632,4359604.85977962)] Area of window: 216845506675.0557 Intensity estimate for window: 9.223156295311261e-10 x y 0 865322.486181 4.150317e+06 1 774479.213103 4.258993e+06 2 308048.692232 4.054700e+06 3 670711.529980 4.258864e+06 4 666254.475614 4.256514e+06
pp_va.summary()
Point Pattern 200 points Bounding rectangle [(273959.664381352,4049220.903414295), (972595.9895779632,4359604.85977962)] Area of window: 216845506675.0557 Intensity estimate for window: 9.223156295311261e-10 x y 0 865322.486181 4.150317e+06 1 774479.213103 4.258993e+06 2 308048.692232 4.054700e+06 3 670711.529980 4.258864e+06 4 666254.475614 4.256514e+06
pp_va.points
x | y | |
---|---|---|
0 | 865322.486181 | 4.150317e+06 |
1 | 774479.213103 | 4.258993e+06 |
2 | 308048.692232 | 4.054700e+06 |
3 | 670711.529980 | 4.258864e+06 |
4 | 666254.475614 | 4.256514e+06 |
5 | 664464.571678 | 4.061242e+06 |
6 | 784718.209785 | 4.076109e+06 |
7 | 972595.989578 | 4.183781e+06 |
8 | 657798.357403 | 4.253278e+06 |
9 | 682259.020242 | 4.282441e+06 |
10 | 727004.821077 | 4.068344e+06 |
11 | 705895.874812 | 4.266602e+06 |
12 | 828584.046576 | 4.065666e+06 |
13 | 713905.086059 | 4.316151e+06 |
14 | 881552.803340 | 4.091455e+06 |
15 | 469051.359337 | 4.117702e+06 |
16 | 316765.769715 | 4.074300e+06 |
17 | 697788.542435 | 4.060254e+06 |
18 | 735806.711384 | 4.169688e+06 |
19 | 857188.061626 | 4.069335e+06 |
20 | 840068.036835 | 4.157035e+06 |
21 | 554658.507423 | 4.056777e+06 |
22 | 273959.664381 | 4.057244e+06 |
23 | 751755.354691 | 4.212530e+06 |
24 | 862508.493456 | 4.068196e+06 |
25 | 608082.366460 | 4.137227e+06 |
26 | 783720.206483 | 4.131364e+06 |
27 | 648766.829060 | 4.193105e+06 |
28 | 560753.141222 | 4.059971e+06 |
29 | 659157.093262 | 4.157386e+06 |
... | ... | ... |
170 | 693186.966524 | 4.139941e+06 |
171 | 845699.719699 | 4.231892e+06 |
172 | 796797.110375 | 4.313534e+06 |
173 | 691583.213674 | 4.074581e+06 |
174 | 752905.895923 | 4.166523e+06 |
175 | 963207.941343 | 4.165624e+06 |
176 | 611691.334171 | 4.049221e+06 |
177 | 777399.041143 | 4.170244e+06 |
178 | 781453.204801 | 4.124116e+06 |
179 | 675900.040876 | 4.059235e+06 |
180 | 530691.417350 | 4.087626e+06 |
181 | 555641.288115 | 4.122360e+06 |
182 | 532600.970765 | 4.051876e+06 |
183 | 800528.454702 | 4.335969e+06 |
184 | 516747.058864 | 4.104977e+06 |
185 | 647291.961412 | 4.223991e+06 |
186 | 673236.038854 | 4.292326e+06 |
187 | 534897.641241 | 4.129232e+06 |
188 | 789507.980935 | 4.240825e+06 |
189 | 701276.258284 | 4.199411e+06 |
190 | 669424.422196 | 4.276723e+06 |
191 | 602477.348747 | 4.146360e+06 |
192 | 872333.052082 | 4.156737e+06 |
193 | 773967.535489 | 4.145192e+06 |
194 | 803387.940279 | 4.173642e+06 |
195 | 876485.065262 | 4.148120e+06 |
196 | 621600.111400 | 4.177462e+06 |
197 | 450246.610116 | 4.106031e+06 |
198 | 740919.375814 | 4.359605e+06 |
199 | 797522.610898 | 4.208606e+06 |
200 rows × 2 columns
pp_va.head()
x | y | |
---|---|---|
0 | 865322.486181 | 4.150317e+06 |
1 | 774479.213103 | 4.258993e+06 |
2 | 308048.692232 | 4.054700e+06 |
3 | 670711.529980 | 4.258864e+06 |
4 | 666254.475614 | 4.256514e+06 |
pp_va.tail()
x | y | |
---|---|---|
195 | 876485.065262 | 4.148120e+06 |
196 | 621600.111400 | 4.177462e+06 |
197 | 450246.610116 | 4.106031e+06 |
198 | 740919.375814 | 4.359605e+06 |
199 | 797522.610898 | 4.208606e+06 |
The intensity of a point process at point $s_i$ can be defined as:
$$\lambda(s_j) = \lim \limits_{|\mathbf{A}s_j| \to 0} \left \{ \frac{E(Y(\mathbf{A}s_j)}{|\mathbf{A}s_j|} \right \} $$where $\mathbf{A}s_j$ is a small region surrounding location $s_j$ with area $|\mathbf{A}s_j|$, and $E(Y(\mathbf{A}s_j)$ is the expected number of event points in $\mathbf{A}s_j$.
The intensity is the mean number of event points per unit of area at point $s_j$.
Recall that one of the implications of CSR is that the intensity of the point process is constant in our study area $\Re$. In other words $\lambda(s_j) = \lambda(s_{j+1}) = \ldots = \lambda(s_n) = \lambda \ \forall s_j \in \Re$. Thus, if the area of $\Re$ = $|\Re|$ the expected number of event points in the study region is: $E(Y(\Re)) = \lambda |\Re|.$
In PySAL, the intensity is estimated by using a geometric object to encode the study region. We refer to this as the window, $W$. The reason for distinguishing between $\Re$ and $W$ is that the latter permits alternative definitions of the bounding object.
Intensity estimates are based on the following: $$\hat{\lambda} = \frac{n}{|W|}$$
where $n$ is the number of points in the window $W$, and $|W|$ is the area of $W$.
Intensity based on minimum bounding box: $$\hat{\lambda}_{mbb} = \frac{n}{|W_{mbb}|}$$
where $W_{mbb}$ is the minimum bounding box for the point pattern.
pp_va.lambda_mbb
9.223156295311263e-10
Intensity based on convex hull: $$\hat{\lambda}_{hull} = \frac{n}{|W_{hull}|}$$
where $W_{hull}$ is the convex hull for the point pattern.
pp_va.lambda_hull
1.5973789098179388e-09
There is more to learn about point patterns in PySAL.
The centrographic notebook illustrates a number of spatial descriptive statistics and visualization of point patterns.
Clearly the window chosen will impact the intensity estimate. For more on windows see the window notebook.
To test if your point pattern departs from complete spatial randomness see the distance statistics notebook and quadrat statistics notebook.
To simulate different types of point processes in various windows see process notebook.
If you have point pattern data with additional attributes associated with each point see how to handle this in the marks notebook.