Lead author: Nina Miolane.

In [3]:

```
import os
import sys
import warnings
sys.path.append(os.path.dirname(os.getcwd()))
warnings.filterwarnings("ignore")
```

In [4]:

```
import geomstats.backend as gs
gs.random.seed(2020)
```

INFO: Using numpy backend

Finally, we import the visualization module.

In [5]:

```
import matplotlib
import matplotlib.colors as colors
import matplotlib.image as mpimg
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import geomstats.visualization as visualization
visualization.tutorial_matplotlib()
```

The **science of Statistics** is defined as the collection of data, their analysis and interpretation. Statistical theory is usually defined for data belonging to vector spaces, which are *linear spaces*. For example, we know how to compute the mean of a data set of numbers, like the mean of students' weights in a classroom, or of multidimensional arrays, like the average 3D velocity vector of blood cells in a vessel.

Here is an example of the computation of the mean of two arrays of dimension 2.

In [6]:

```
from geomstats.geometry.euclidean import Euclidean
dim = 2
n_samples = 2
euclidean = Euclidean(dim=dim)
points_in_linear_space = euclidean.random_point(n_samples=n_samples)
print("Points in linear space:\n", points_in_linear_space)
linear_mean = gs.sum(points_in_linear_space, axis=0) / n_samples
print("Mean of points:\n", linear_mean)
```

We plot the points and their mean on the 2D Euclidean space, which is a linear space: a plane.

In [7]:

```
%matplotlib inline
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111)
ax.scatter(points_in_linear_space[:, 0], points_in_linear_space[:, 1], label="Points")
ax.plot(points_in_linear_space[:, 0], points_in_linear_space[:, 1], linestyle="dashed")
ax.scatter(
    gs.to_numpy(linear_mean[0]),
    gs.to_numpy(linear_mean[1]),
    label="Mean",
    s=80,
    alpha=0.5,
)
ax.set_title("Mean of points in a linear space")
ax.legend();
```
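The arithmetic mean used above is also the point that minimizes the sum of squared Euclidean distances to the data. The sketch below checks this numerically with plain NumPy; the sample, the helper `sum_squared_distances`, and the perturbed candidates are made up for illustration and are not part of `geomstats`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample of 5 points in the plane.
sample = rng.normal(size=(5, 2))
sample_mean = sample.mean(axis=0)


def sum_squared_distances(candidate, data):
    """Sum of squared Euclidean distances from candidate to each data point."""
    return np.sum(np.linalg.norm(data - candidate, axis=1) ** 2)


cost_at_mean = sum_squared_distances(sample_mean, sample)

# Compare against 100 random candidates scattered around the mean.
candidates = sample_mean + rng.normal(scale=0.5, size=(100, 2))
costs = np.array([sum_squared_distances(c, sample) for c in candidates])

print("Cost at the mean:", cost_at_mean)
print("Lowest cost among candidates:", costs.min())
```

No candidate does better than the mean. This variational view of the mean only needs a notion of distance, not a vector-space structure.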

What happens to the usual statistical theory when the data do not naturally belong to a linear space? For example, what if we want to perform statistics on the coordinates of world cities, which lie on the earth, that is, on a sphere?

The non-linear spaces we consider are called manifolds. A manifold $M$ of dimension $m$ is a space that is allowed to be curved but that looks like an $m$-dimensional vector space in the neighborhood of every point.
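To make "looks like a vector space in the neighborhood of every point" concrete, here is a minimal NumPy sketch on the unit sphere. It compares the straight-line (chord) distance between two points with the distance measured along the surface (arc length). The chosen angles are arbitrary illustrative values.

```python
import numpy as np

# Angles (in radians) between two points on the unit sphere.
angles = np.array([1.0, 0.1, 0.01])

# Distance along the surface: on a unit sphere the arc length equals the angle.
arc_lengths = angles

# Straight-line distance through the ambient 3D space.
chord_lengths = 2 * np.sin(angles / 2)

ratios = chord_lengths / arc_lengths
for angle, ratio in zip(angles, ratios):
    print(f"angle = {angle:5.2f} rad, chord / arc = {ratio:.7f}")
```

As the two points get closer, the ratio approaches 1: at small scales, the curved and the flat distances become indistinguishable.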

A sphere, like the earth, is a good example of a manifold. We know that the earth is curved, but at our scale we do not see its curvature. Can we still use linear statistics when the data are defined on such manifolds, and should we?

Let's try.

In [8]:

```
from geomstats.geometry.hypersphere import Hypersphere
sphere = Hypersphere(dim=dim)
points_in_manifold = sphere.random_uniform(n_samples=n_samples)
print("Points in manifold:\n", points_in_manifold)
linear_mean = gs.sum(points_in_manifold, axis=0) / n_samples
print("Mean of points:\n", linear_mean)
```

We plot the points and their mean computed with the linear formula.

In [9]:

```
%matplotlib inline
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection="3d")
visualization.plot(points_in_manifold, ax=ax, space="S2", label="Point", s=80)
ax.plot(
    points_in_manifold[:, 0],
    points_in_manifold[:, 1],
    points_in_manifold[:, 2],
    linestyle="dashed",
    alpha=0.5,
)
ax.scatter(
    linear_mean[0], linear_mean[1], linear_mean[2], label="Mean", s=80, alpha=0.5
)
ax.set_title("Mean of points on a manifold")
ax.legend();
```

In [10]:

```
print(sphere.belongs(linear_mean))
```

False
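A short NumPy sketch of why this check fails: the average of two distinct unit vectors always has norm strictly less than 1. The two points below are hand-picked for illustration and are not the random points sampled above.

```python
import numpy as np

# Two hand-picked points on the unit sphere.
point_a = np.array([1.0, 0.0, 0.0])
point_b = np.array([0.0, 0.0, 1.0])

# Linear (coordinate-wise) mean: the midpoint of the chord joining them.
chord_midpoint = (point_a + point_b) / 2
midpoint_norm = np.linalg.norm(chord_midpoint)

print("Linear mean:", chord_midpoint)
print("Norm of the linear mean:", midpoint_norm)
```

The norm is below 1, so the linear mean falls strictly inside the ball rather than on its surface.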

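The linear mean does not belong to the sphere: averaging coordinates ignores the constraint that defines the manifold. For this reason, researchers aim to build a theory of statistics that is, by construction, compatible with any structure we equip the manifold with. This theory is called *Geometric Statistics*.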

**Geometric Statistics** is a theory of statistics on manifolds that takes into account their geometric structures. Geometric Statistics is therefore the child of two major pillars of Mathematics: Geometry and Statistics.
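One common way to define a mean that respects the geometry is the Fréchet mean: the point of the manifold that minimizes the sum of squared geodesic distances to the data. The sketch below is a minimal NumPy illustration on the unit sphere, not the `geomstats` implementation. The helpers `sphere_log`, `sphere_exp` and `frechet_mean_sketch` are hypothetical names, and the iteration assumes the points are not antipodal to the current estimate.

```python
import numpy as np


def sphere_log(base, point):
    """Tangent vector at base pointing toward point, with geodesic length."""
    cos_angle = np.clip(np.dot(base, point), -1.0, 1.0)
    angle = np.arccos(cos_angle)
    tangent = point - cos_angle * base
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:  # same point (antipodal points are not handled here)
        return np.zeros_like(base)
    return angle * tangent / norm


def sphere_exp(base, tangent):
    """Follow the geodesic from base in the direction of tangent."""
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return base
    return np.cos(norm) * base + np.sin(norm) * tangent / norm


def frechet_mean_sketch(points, n_iter=50, tol=1e-10):
    """Fixed-point iteration: average in the tangent space, then map back."""
    estimate = points[0]
    for _ in range(n_iter):
        mean_tangent = np.mean([sphere_log(estimate, p) for p in points], axis=0)
        if np.linalg.norm(mean_tangent) < tol:
            break
        estimate = sphere_exp(estimate, mean_tangent)
    return estimate


example_points = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
example_mean = frechet_mean_sketch(example_points)
print("Mean on the sphere:", example_mean)
print("Norm of the mean:", np.linalg.norm(example_mean))
```

Unlike the linear mean, the result has unit norm and is therefore a point of the sphere. For these two points it is the midpoint of the great-circle arc joining them.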

Why should we bother to build a whole new theory of statistics? Do we really have data that belong to spaces like the sphere illustrated in the introduction?

Let's see some examples of data spaces that are naturally manifolds. By doing so, we will introduce the `datasets` and `visualization` modules of `geomstats`.

We first import the `datasets.utils` module that allows loading datasets.

In [11]:

```
import geomstats.datasets.utils as data_utils
```

We load the dataset `cities`, which contains the coordinates of world cities in spherical coordinates.

In [12]:

```
data, names = data_utils.load_cities()
print(names[:5])
print(data[:5])
```

In [13]:

```
gs.all(sphere.belongs(data))
```

Out[13]:

True
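The check above confirms that every city is stored as a point of the unit sphere. As a rough sketch of how a geographic position could be turned into such a point, the hypothetical helper below converts latitude and longitude, assumed to be in degrees, into a 3D unit vector. This is an illustration only; it is not claimed to be the preprocessing used to build the `cities` dataset.

```python
import numpy as np


def lat_lon_to_unit_vector(latitude_deg, longitude_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat = np.radians(latitude_deg)
    lon = np.radians(longitude_deg)
    return np.array(
        [
            np.cos(lat) * np.cos(lon),
            np.cos(lat) * np.sin(lon),
            np.sin(lat),
        ]
    )


# Approximate coordinates of Paris: 48.8566 N, 2.3522 E.
example_city = lat_lon_to_unit_vector(48.8566, 2.3522)
north_pole = lat_lon_to_unit_vector(90.0, 0.0)

print("Paris as a unit vector:", example_city)
print("Norm:", np.linalg.norm(example_city))
```

By construction the output has unit norm, so it satisfies the same membership test as the dataset.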

In [14]:

```
data, names = data_utils.load_cities()
```

In [15]:

```
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection="3d")
visualization.plot(data[15:20], ax=ax, space="S2", label=names[15:20], s=80, alpha=0.5)
ax.set_title("Cities on the earth.");
```

We consider the dataset `poses`, which contains the 3D poses of objects in images. Specifically, we consider poses of beds in images, i.e. the 3D orientation of each bed within a given 2D image.

The orientation corresponds to a 3D rotation. A 3D rotation $R$ is visually represented as the result of $R$ applied to the coordinate frame $(e_x, e_y, e_z)$.
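As a small illustration of this representation, the NumPy sketch below applies a rotation to the coordinate frame and checks the two properties that characterize a 3D rotation matrix: orthogonality and determinant 1. The helper `rotation_about_z` and the chosen angle are assumptions made for the example; they do not come from the `poses` dataset.

```python
import numpy as np


def rotation_about_z(angle):
    """Rotation matrix for a rotation by angle (radians) about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array(
        [
            [c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0],
        ]
    )


# Columns of the identity matrix are the frame vectors e_x, e_y, e_z.
frame = np.eye(3)
rotation = rotation_about_z(np.pi / 2)

# Each column of the result is the image of one frame vector under R.
rotated_frame = rotation @ frame

print("R applied to e_x:", rotated_frame[:, 0])
print("R^T R is the identity:", np.allclose(rotation.T @ rotation, np.eye(3)))
print("det(R):", np.linalg.det(rotation))
```

Plotting the three columns of `rotated_frame` as arrows gives the visual representation described above.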

We first load the dataset.

In [16]:

```
data, img_paths = data_utils.load_poses()
img_path1, img_path2 = img_paths[0], img_paths[1]
img_path1 = os.path.join(data_utils.DATA_PATH, "poses", img_path1)
img_path2 = os.path.join(data_utils.DATA_PATH, "poses", img_path2)
img1 = mpimg.imread(img_path1)
img2 = mpimg.imread(img_path2)
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(121)
imgplot = ax.imshow(img1)
ax.axis("off")
ax = fig.add_subplot(122)
imgplot = ax.imshow(img2)
ax.axis("off")
plt.show()
```