import numpy as np
from pycircstat2 import Circular, load_data
Chapter 26 of Zar (2010) contains many examples and step-by-step guide of how to compute most of circular descriptive statistics. We replicated all those examples and figures in notebook B2-Zar-2010
with pycircstats2
.
circ_mean_and_r(alpha, w)
returns both the circular mean (μ
) and the mean resultant length (r
) together. μ
and r
can also be computed separately using circ_mean(alpha, w)
and circ_r(alpha, w)
.
r
is a measure of concentration, and is also called the length of the mean vector. It has no units and vary from 0 to 1.from pycircstat2.descriptive import circ_r, circ_mean, circ_mean_and_r
b11 = load_data("B11", source="fisher")["θ"].values
c11 = Circular(data=b11)
u, r = circ_mean_and_r(alpha=c11.alpha)
print(f"μ={np.rad2deg(u).round(2)}, r={r.round(2)}")
μ=3.1, r=0.83
circ_moment(alpha, w)
returns the p
th trigonometric moment and the corresponding mean resultant vector (and some intermediates values if return_intermediates=True
)
from pycircstat2.descriptive import circ_moment, convert_moment
# first moment == mean
mp1 = circ_moment(alpha=c11.alpha, p=1, centered=False)
u1, r1 = convert_moment(mp1)
print(f"μ1={np.rad2deg(u1).round(2)}, r1={r1.round(2)}")
# second moment
mp2 = circ_moment(alpha=c11.alpha, p=2, centered=False)
u2, r2 = convert_moment(mp2)
print(f"μ2={np.rad2deg(u2).round(2)}, r2={r2.round(2)}")
μ1=3.1, r1=0.83 μ2=0.64, r2=0.67
circ_dispersion(alpha)
computes the sample circular dispersion (eq 2.28, Fisher, 1993):
pycircstat
). Here, we follows Pewsey, et al. 2014, by setting the default as NOT centering the data.from pycircstat2.descriptive import circ_dispersion
circ_dispersion(alpha=c11.alpha), (1 - r2)/ (2 * r1 **2)
(np.float64(0.23986529263377035), np.float64(0.23986529263377035))
circ_skewness(alpha)
measures how symmetric the data distribution is:
If the resulting value is near 0, then the data distribution is symmetric; if it's relatively large and negative, the data distribution is skewed counterclockwise away from the mean direction; if it's positive, then it's skewed clockwise.
from pycircstat2.descriptive import circ_skewness
circ_skewness(alpha=c11.alpha), (r2 * np.sin(u2 - 2 * u1)) / (1 - r1) ** 1.5
(np.float64(-0.9235490965405332), np.float64(-0.9235490965405332))
circ_kurtosis(alpha)
measures the peakedness of the data distribution:
If it's close to 0, the data is near evenly distributed around the circle.
from pycircstat2.descriptive import circ_kurtosis
circ_kurtosis(alpha=c11.alpha), (r2 * np.cos(u2 - 2 * u1) - r**4) / (1 - r1) ** 2
(np.float64(6.642665308562964), np.float64(6.642665308562964))
circ_median(alpha, method)
The median for circular data is not well-defined compared to other summary statistics.
The simplest way to define the circular sample median is to find the axis that can divide the data into two equal halves. In pycircstat2
, we call this method count
because it's implemented as rotating an axis, and count the number of data point in both sides, then returns the axis with the smallest count difference. The axes can be just the angles in the data set, or, it can be predefined, for example, all axes seperated by 1 degree. We chose the first approach as it's more convenient.
Alternatively, one can define the median as the axis that has the minimum mean deviation (deviation
; in some literature, it's called Mardia's median):
Again, there are two approches: 1. we can choose the axis from existing data points, or 2. from a predefined grids.
For grouped data, we use the method of Mardia (1972).
Both count
and deviation
would potentially identify multiple axes that satisfy the condition. In those case, we followed Otieno & Anderson-Cook (2003) by simply taking the circular mean of ALL potential medians. But the user can also choose to return all potential medians instead.
from pycircstat2.descriptive import circ_median
# example with multiple ties
d = np.deg2rad(np.array([0, 45, 45, 135,135, 180]))
# return average: (45 + 135) / 2 = 90
print(np.rad2deg(circ_median(alpha=d, return_average=True)))
# return all potential medians
print(np.rad2deg(circ_median(alpha=d, return_average=False)))
90.0 [ 45. 45. 135. 135.]
angular_std(alpha)
returns the mean angular deviation (s
; ranges from 0
to 81.03
degree):
and circ_std(alpha)
returns the circular standard deviation (s0
; ranges from 0
to inf
):
The corresponding angular and circular variance are the sqaure of s
and s0
.
For grouped data, r
will be corrected by the bin_size = np.diff(alpha)[0]
.
from pycircstat2.descriptive import angular_std, circ_std
d1 = load_data('D1', source='zar')['θ'].values
c1 = Circular(data=d1)
s = angular_std(alpha=c1.alpha)
s0 = circ_std(alpha=c1.alpha)
print(f"s={np.rad2deg(s).round()}, s0={np.rad2deg(s0).round()}")
d2 = load_data('D2', source='zar')
c2 = Circular(data=d2['θ'].values, w=d2['w'].values)
s = angular_std(alpha=c2.alpha, w=c2.w)
s0 = circ_std(alpha=c2.alpha, w=c2.w)
print(f"s={np.rad2deg(s).round()}, s0={np.rad2deg(s0).round()}")
s=34.0, s0=36.0 s=54.0, s0=62.0
from pycircstat2.descriptive import compute_C_and_S, circ_var, circ_r
C, S = compute_C_and_S(alpha=c2.alpha, w=c2.w)
np.sqrt(C**2 + S**2), circ_r(alpha=c2.alpha, w=c2.w)
(np.float64(0.550635071425303), np.float64(0.550635071425303))
circ_var(alpha=c2.alpha, w=c2.w)
np.float64(0.44302427766556485)
circ_mean_ci(alpha, ci=0.95, method='approximate')
returns the confidence levels of the circular mean. There are three methods:
approximate
(Section 26.7, Zar, 2010)bootstrap
(Section 4.4.4a, Fisher, 1993)dispersion
(Section 4.4.4b, Fisher, 1993)circ_median_ci(alpha, ci=0.95)
returns the confidence levels of the circular median. When n<16
, the CI is retrieved according to table A6 of Fisher (1993)l whenn >= 16
, the CI is approximated according to Section 4.4.2 b (Fisher, 1993).
from pycircstat2.descriptive import circ_mean_ci
data_zar_ex4_ch26 = load_data("D1", source="zar")
circ_zar_ex4_ch26 = Circular(data=data_zar_ex4_ch26["θ"].values)
## Approximate: Example 26.6, Zar, 2010
lb, ub = circ_mean_ci(
mean=circ_zar_ex4_ch26.mean,
r=circ_zar_ex4_ch26.r,
n=circ_zar_ex4_ch26.n,
method="approximate",
)
print(np.rad2deg([lb, ub]).round())
[ 68. 130.]
from pycircstat2.descriptive import circ_median_ci
d_ex3 = load_data('B6','fisher')
d_ex3_10 = np.sort(d_ex3[d_ex3.set==2]['θ'].values[:10])
d_ex3_20 = np.sort(d_ex3[d_ex3.set==2]['θ'].values[:20])
d_ex3_30 = np.sort(d_ex3[d_ex3.set==2]['θ'].values)
table = {'median': [], 'mean': [], '95% median CI': [], '95% bootstrap mean CI': [], '95% large-sample mean CI': []}
for i, d in enumerate([d_ex3_10, d_ex3_20, d_ex3_30]):
e = np.deg2rad(d)
mean = circ_mean(e)
table['mean'].append(np.rad2deg(mean).round(1))
median = circ_median(e, method='deviation', average_method="unique")
table['median'].append(np.rad2deg(median).round(1))
# CI for median
median_lb, median_ub, ci = circ_median_ci(alpha=e, method='deviation')
table['95% median CI'].append(np.rad2deg([median_lb, median_ub]).round(1))
# CI for mean using bootstrap
mean_lb, mean_ub = circ_mean_ci(
alpha=e,
B=200,
method="bootstrap",
)
table['95% bootstrap mean CI'].append(np.rad2deg([mean_lb, mean_ub]).round(1))
if i == 2:
# CI for mean using dispersion
mean_lb_large_sample, mean_ub_large_sample = circ_mean_ci(
alpha=e,
method="dispersion",
)
table['95% large-sample mean CI'].append(np.rad2deg([mean_lb_large_sample, mean_ub_large_sample]).round(1))
else:
table['95% large-sample mean CI'].append('-')
print('Table 4.1 of Fisher (1993), p.78')
print(f"{'Sample size':25} {'10':25} {'20':25} {'30':25}")
print("-"*100)
for k, v in table.items():
v10, v20, v30 = v
print(f"{k:25} {str(v10):25} {str(v20):25} {str(v30):20}")
Table 4.1 of Fisher (1993), p.78 Sample size 10 20 30 ---------------------------------------------------------------------------------------------------- median 279.0 247.5 245.0 mean 280.8 248.7 247.6 95% median CI [245. 315.] [229. 277.] [229. 267.] 95% bootstrap mean CI [265.1 298.5] [228.1 269.4] [229.9 262. ] 95% large-sample mean CI - - [232.7 262.5]
%load_ext watermark
%watermark --time --date --timezone --updated --python --iversions --watermark -p pycircstat2
Last updated: 2024-11-28 14:31:06CET Python implementation: CPython Python version : 3.10.13 IPython version : 8.29.0 pycircstat2: 0.1.2 numpy : 2.1.3 pycircstat2: 0.1.2 Watermark: 2.5.0