Notebook

In [1]:

import numpy as np

from pycircstat2 import Circular, load_data

Descriptive Statistics¶

Table of Contents¶

circ_mean
circ_moment
circ_dispersion
circ_skewness
circ_kurtosis
circ_median
circ_std
circ_mean_ci and circ_median_ci

Circular mean¶

circ_mean_and_r(alpha, w) returns both the circular mean (μ) and the mean resultant length (r) together. μ and r can also be computed separately using circ_mean(alpha, w) and circ_r(alpha, w).

NOTE: r is a measure of concentration, and is also called the length of the mean vector. It has no units and vary from 0 to 1.

In [2]:

from pycircstat2.descriptive import circ_r, circ_mean, circ_mean_and_r

b11 = load_data("B11", source="fisher")["θ"].values[:]
c11 = Circular(data=b11)

u, r = circ_mean_and_r(alpha=c11.alpha)
print(f"μ={np.rad2deg(u).round(2)}, r={r.round(2)}")

μ=3.1, r=0.83

Circular moment¶

circ_moment(alpha, w) returns the pth trigonometric moment and the corresponding mean resultant vector (and some intermediates values if return_intermediates=True)

In [3]:

from pycircstat2.descriptive import circ_moment, convert_moment

# first moment == mean

mp1 = circ_moment(alpha=c11.alpha, p=1, centered=False)
u1, r1 = convert_moment(mp1)    
print(f"μ1={np.rad2deg(u1).round(2)}, r1={r1.round(2)}")

# second moment

mp2 = circ_moment(alpha=c11.alpha, p=2, centered=False)
u2, r2 = convert_moment(mp2)
print(f"μ2={np.rad2deg(u2).round(2)}, r2={r2.round(2)}")

μ1=3.1, r1=0.83
μ2=0.64, r2=0.67

Circular dispersion¶

circ_dispersion(alpha) computes the sample circular dispersion (eq 2.28, Fisher, 1993):

$$\text{circular dispersion} = \frac{1 - r_{2}}{2 r_{1}^{2}}$$

NOTE: In Fisher (1993), data were centered (by substrating the mean) before the computation of all summary statistics related to the second circular moments (such as dispersion, skewness and kurtosis), which might results in incorrect estimates (see issue #58 from pycircstat). Here, we follows Pewsey, et al. 2014, by setting the default as NOT centering the data.

In [4]:

from pycircstat2.descriptive import circ_dispersion

circ_dispersion(alpha=c11.alpha), (1 - r2)/ (2 * r1 **2)

Out[4]:

(np.float64(0.23986529263377035), np.float64(0.23986529263377035))

Circular skewness¶

circ_skewness(alpha) measures how symmetric the data distribution is:

$$\text{circular skewness} = \frac{r_{2}\cos(\mu_{2} - 2\mu_{1})}{(1 - r_{1})^{3/2}}$$

If the resulting value is near 0, then the data distribution is symmetric; if it's relatively large and negative, the data distribution is skewed counterclockwise away from the mean direction; if it's positive, then it's skewed clockwise.

In [5]:

from pycircstat2.descriptive import circ_skewness

circ_skewness(alpha=c11.alpha), (r2 * np.sin(u2 - 2 * u1)) / (1 - r1) ** 1.5

Out[5]:

(np.float64(-0.9235490965405332), np.float64(-0.9235490965405332))

Circular kurtosis¶

circ_kurtosis(alpha) measures the peakedness of the data distribution:

$$\text{circular kurtosis} = \frac{r_{2}\cos(\mu_{2} - 2\mu_{1}) - r_{1}^{4}}{(1 - r_{1})^{2}}$$

If it's close to 0, the data is near evenly distributed around the circle.

In [6]:

from pycircstat2.descriptive import circ_kurtosis

circ_kurtosis(alpha=c11.alpha), (r2 * np.cos(u2 - 2 * u1) - r**4) / (1 - r1) ** 2

Out[6]:

(np.float64(6.642665308562964), np.float64(6.642665308562964))

Circular median¶

circ_median(alpha, method)

The median for circular data is not well-defined compared to other summary statistics.

The simplest way to define the circular sample median is to find the axis that can divide the data into two equal halves. In pycircstat2, we call this method count because it's implemented as rotating an axis, and count the number of data point in both sides, then returns the axis with the smallest count difference. The axes can be just the angles in the data set, or, it can be predefined, for example, all axes seperated by 1 degree. We chose the first approach as it's more convenient.

Alternatively, one can define the median as the axis that has the minimum mean deviation (deviation; in some literature, it's called Mardia's median):

$$\text{mean deviation} = \pi - \frac{1}{n}\sum_{1}^{n}|\pi - |\theta_{i} - \theta||$$

Again, there are two approches: 1. we can choose the axis from existing data points, or 2. from a predefined grids.

For grouped data, we use the method of Mardia (1972).

Both count and deviation would potentially identify multiple axes that satisfy the condition. In those case, we followed Otieno & Anderson-Cook (2003) by simply taking the circular mean of ALL potential medians. But the user can also choose to return all potential medians instead.

In [7]:

from pycircstat2.descriptive import circ_median

# example with multiple ties 
d = np.deg2rad(np.array([0, 45, 45, 135,135, 180]))

# return average: (45 + 135) / 2 = 90
print(np.rad2deg(circ_median(alpha=d, return_average=True)))
# return all potential medians
print(np.rad2deg(circ_median(alpha=d, return_average=False)))

90.0
[ 45.  45. 135. 135.]

Angular and circular standard deviation¶

angular_std(alpha) returns the mean angular deviation (s; ranges from 0 to 81.03 degree):

$$s = \sqrt{2 ( 1 - r)}$$

and circ_std(alpha) returns the circular standard deviation (s0; ranges from 0 to inf):

$$s_{0} = \sqrt{ -2 \ln r}$$

The corresponding angular and circular variance are the sqaure of s and s0.

For grouped data, r will be corrected by the bin_size = np.diff(alpha)[0].

In [8]:

from pycircstat2.descriptive import angular_std, circ_std

d1 = load_data('D1', source='zar')['θ'].values[:]
c1 = Circular(data=d1)
s = angular_std(alpha=c1.alpha)
s0 = circ_std(alpha=c1.alpha)
print(f"s={np.rad2deg(s).round()}, s0={np.rad2deg(s0).round()}")

d2 = load_data('D2', source='zar')
c2 = Circular(data=d2['θ'].values[:], w=d2['w'].values[:])
s = angular_std(alpha=c2.alpha, w=c2.w)
s0 = circ_std(alpha=c2.alpha, w=c2.w)
print(f"s={np.rad2deg(s).round()}, s0={np.rad2deg(s0).round()}")

s=34.0, s0=36.0
s=54.0, s0=62.0

In [9]:

from pycircstat2.descriptive import compute_C_and_S, circ_var, circ_r
C, S = compute_C_and_S(alpha=c2.alpha, w=c2.w)
np.sqrt(C**2 + S**2), circ_r(alpha=c2.alpha, w=c2.w)

Out[9]:

(np.float64(0.550635071425303), np.float64(0.550635071425303))

In [10]:

circ_var(alpha=c2.alpha, w=c2.w)

Out[10]:

np.float64(0.44302427766556485)

Confidence interval for circular mean and circular median¶

circ_mean_ci(alpha, ci=0.95, method='approximate') returns the confidence levels of the circular mean. There are three methods:
- approximate (Section 26.7, Zar, 2010)
- bootstrap (Section 4.4.4a, Fisher, 1993)
- dispersion (Section 4.4.4b, Fisher, 1993)
circ_median_ci(alpha, ci=0.95) returns the confidence levels of the circular median. When n<16, the CI is retrieved according to table A6 of Fisher (1993)l whenn >= 16, the CI is approximated according to Section 4.4.2 b (Fisher, 1993).

In [11]:

from pycircstat2.descriptive import circ_mean_ci

data_zar_ex4_ch26 = load_data("D1", source="zar")
circ_zar_ex4_ch26 = Circular(data=data_zar_ex4_ch26["θ"].values[:])

## Approximate: Example 26.6, Zar, 2010
lb, ub = circ_mean_ci(
    mean=circ_zar_ex4_ch26.mean,
    r=circ_zar_ex4_ch26.r,
    n=circ_zar_ex4_ch26.n,
    method="approximate",
)
print(np.rad2deg([lb, ub]).round())

[ 68. 130.]

In [ ]:

from pycircstat2.descriptive import circ_median_ci

d_ex3 = load_data('B6','fisher')
d_ex3_10 = np.sort(d_ex3[d_ex3.set==2]['θ'].values[:10])
d_ex3_20 = np.sort(d_ex3[d_ex3.set==2]['θ'].values[:20])
d_ex3_30 = np.sort(d_ex3[d_ex3.set==2]['θ'].values[:])

table = {'median': [], 'mean': [], '95% median CI': [], '95% bootstrap mean CI': [], '95% large-sample mean CI': []}
for i, d in enumerate([d_ex3_10, d_ex3_20, d_ex3_30]):
    
    e = np.deg2rad(d)
    mean = circ_mean(e)
    table['mean'].append(np.rad2deg(mean).round(1))
    median = circ_median(e, method='deviation', average_method="unique")
    table['median'].append(np.rad2deg(median).round(1))

    # CI for median
    median_lb, median_ub, ci = circ_median_ci(alpha=e, method='deviation')
    table['95% median CI'].append(np.rad2deg([median_lb, median_ub]).round(1))

    # CI for mean using bootstrap
    mean_lb, mean_ub = circ_mean_ci(
        alpha=e,
        B=200,
        method="bootstrap",
    )
    table['95% bootstrap mean CI'].append(np.rad2deg([mean_lb, mean_ub]).round(1))

    if i == 2:
        # CI for mean using dispersion
        mean_lb_large_sample, mean_ub_large_sample = circ_mean_ci(
            alpha=e,
            method="dispersion",
        )
        table['95% large-sample mean CI'].append(np.rad2deg([mean_lb_large_sample, mean_ub_large_sample]).round(1))
    else:
        table['95% large-sample mean CI'].append('-')

print('Table 4.1 of Fisher (1993), p.78')
print(f"{'Sample size':25} {'10':25} {'20':25} {'30':25}")
print("-"*100)
for k, v in table.items():
    v10, v20, v30 = v
    print(f"{k:25} {str(v10):25} {str(v20):25} {str(v30):20}")

Table 4.1 of Fisher (1993), p.78
Sample size               10                        20                        30                       
----------------------------------------------------------------------------------------------------
median                    279.0                     247.5                     245.0               
mean                      280.8                     248.7                     247.6               
95% median CI             [245. 315.]               [229. 277.]               [229. 267.]         
95% bootstrap mean CI     [266.2 305.3]             [216.  266.5]             [230.2 262.1]       
95% large-sample mean CI  -                         -                         [232.7 262.5]

In [13]:

%load_ext watermark
%watermark --time --date --timezone --updated --python --iversions --watermark

Last updated: 2025-03-11 17:57:05CET

Python implementation: CPython
Python version       : 3.12.9
IPython version      : 8.31.0

pycircstat2: 0.1.12
numpy      : 2.2.3

Watermark: 2.5.0