Descriptive Statistics¶

Numpy offers many statistical functions, but if you want to obtain several statistical variables from the same array, it' s necessary to process the data several times to calculate the various parameters. This example shows how to use the DescriptiveStatistics class to obtain several statistical variables with a single calculation. Also, the calculation algorithm is incremental and is more numerically stable.

Reference

Pébay, P., Terriberry, T.B., Kolla, H. et al. Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights. Comput Stat 31, 1305–1325, 2016, https://doi.org/10.1007/s00180-015-0637-z

In [ ]:

import dask.array
import numpy
import pyinterp

Create a random array

In [ ]:

values = numpy.random.random_sample((2, 4, 6, 8))

Create a DescriptiveStatistics object.

In [ ]:

ds = pyinterp.DescriptiveStatistics(values)

The constructor will calculate the statistical variables on the provided data. The calculated variables are stored in the instance and can be accessed using different methods:

mean
var
std
skewness
kurtosis
min
max
sum
sum_of_weights
count

In [ ]:

ds.count()

In [ ]:

ds.mean()

It's possible to get a structured numpy array containing the different statistical variables calculated.

In [ ]:

ds.array()

Like numpy, it's possible to compute statistics along axis.

In [ ]:

ds = pyinterp.DescriptiveStatistics(values, axis=(1, 2))
ds.mean()

The class can also process a dask array. In this case, the call to the constructor triggers the calculation.

In [ ]:

ds = pyinterp.DescriptiveStatistics(
    dask.array.from_array(values, chunks=(2, 2, 2, 2)),
    axis=(1, 2))
ds.mean()

Finally, it's possible to calculate weighted statistics.

In [ ]:

weights = numpy.random.random_sample((2, 4, 6, 8))
ds = pyinterp.DescriptiveStatistics(values, weights=weights, axis=(1, 2))
ds.mean()