import matplotlib.pyplot as plt # for display purposes
import ruptures as rpt # our package
Let us generate a 3-dimensional piecewise constant signal with Gaussian noise.
n_samples, n_dims, sigma = 1000, 3, 2
n_bkps = 4 # number of breakpoints
signal, bkps = rpt.pw_constant(n_samples, n_dims, n_bkps, noise_std=sigma)
The true change points of this synthetic signal are available in the bkps
variable.
print(bkps)
Note that the first four element are change point indexes while the last is simply the number of samples.
(This is a technical convention so that functions in ruptures
always know the length of the signal at hand.)
It is also possible to plot our $\mathbb{R}^3$-valued signal along with the true change points with the rpt.display
function.
In the following image, the color changes whenever the mean of the signal shifts.
fig, ax_array = rpt.display(signal, bkps)
We can now perform change point detection, meaning that we find the indexes where the signal mean changes. To that end, we minimize the sum of squared errors when approximating the signal by a piecewise constant signal. Formally, for a signal $y_0,y_1,\dots,y_{T-1}$ ($T$ samples), we solve the following optimization problem, over all possible change positions $t_1<t_2<\dots<t_K$ (where the number $K$ of changes is defined by the user):
$$ \hat{t}_1, \hat{t}_2,\dots,\hat{t}_K = \arg\min_{t_1,\dots,t_K} V(t_1,t_2,\dots,t_K) $$with
$$ V(t_1,t_2,\dots,t_K) := \sum_{k=0}^K\sum_{t=t_k}^{t_{k+1}-1} \|y_t-\bar{y}_{t_k..t_{k+1}}\|^2 $$where $\bar{y}_{t_k..t_{k+1}}$ is the empirical mean of the sub-signal $y_{t_k}, y_{t_k+1},\dots,y_{t_{k+1}-1}$. (By convention $t_0=0$ and $t_{K+1}=T$.)
This optimization is solved with dynamic programming, using the Dynp
class. (More information in the section What is change point detection? and the User guide.)
# detection
algo = rpt.Dynp(model="l2").fit(signal)
result = algo.predict(n_bkps=4)
print(result)
Again the first elements are change point indexes and the last is the number of samples.
To visualy compare the true segmentation (bkps
) and the estimated one (result
), we can resort to rpt.display
a second time.
In the following image, the alternating colors indicate the true breakpoints and the dashed vertical lines, the estimated breakpoints.
# display
rpt.display(signal, bkps, result)
plt.show()
In this simple example, both are quite similar and almost undistinguishable.