$\textbf{Lead Author: Anna Calissano}$
Dear learner,
the aim of this notebook is to introduce Align All and Compute (AAC) as a learning method for graphs. AAC can be used to estimate the Frechet Mean, the Generalized Geodesic Principal Components, and a Regression model. In this notebook you will learn how to use each of these learning methods.
import warnings
import random
import networkx as nx
import geomstats.backend as gs
from geomstats.geometry.symmetric_matrices import MatricesMetric as SymmetricMatricesMetric
from geomstats.geometry.stratified.graph_space import (
GraphSpace,
GraphSpaceMetric,
)
from geomstats.learning.aac import AAC
warnings.filterwarnings("ignore")
gs.random.seed(2020)
INFO: Using numpy backend
Let's start by creating simulated data using networkx.
# nx.to_numpy_matrix was removed in networkx 3.0; nx.to_numpy_array is its replacement.
graphset_1 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True)) for i in range(10)])
graphset_2 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True)) for i in range(100)])
graphset_3 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=3, p=0.6, directed=True)) for i in range(1000)])
The first step is to create the embedding space and the corresponding metric.
graph_space = GraphSpace(n_nodes=5)
graph_space_metric = GraphSpaceMetric(space=graph_space)
By default, the space comes with a total space (Matrices), which in turn comes equipped with a metric (MatricesMetric).
graph_space.total_space.metric
<geomstats.geometry.matrices.MatricesMetric at 0x1f169267250>
(The total space metric can also be accessed from the metric: graph_space_metric.total_space_metric.)
The default aligner is 'ID' (identity), which means the graphs are not permuted. To set 'FAQ', do:
graph_space_metric.set_aligner('FAQ')
<geomstats.geometry.stratified.graph_space.FAQAligner at 0x1f16922aca0>
With the FAQ alignment and the default Frobenius norm, we can align either a single graph or a set of graphs to a base graph:
graph_permuted = graph_space_metric.align_point_to_point(base_graph=graphset_1[0], graph_to_permute=graphset_1[1])
graph_space_metric.align_point_to_point(base_graph=graphset_1[0], graph_to_permute=graphset_1[1:3])
array([[[0., 1., 1., 1., 1.], [0., 0., 0., 1., 1.], [1., 1., 0., 1., 1.], [0., 1., 1., 0., 1.], [1., 1., 0., 0., 0.]], [[0., 0., 1., 1., 1.], [0., 0., 1., 0., 1.], [1., 1., 0., 1., 0.], [0., 1., 0., 0., 0.], [0., 1., 1., 0., 0.]]])
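To make the alignment step concrete, here is a minimal NumPy sketch of what aligning one graph to another means: search over node relabelings for the one that minimizes the Frobenius distance to the base graph. The helper names `permute` and `align_brute_force` are hypothetical and not part of geomstats. The exhaustive search is exact but only feasible for very small graphs; it is a stand-in for illustration, not a description of how the FAQ aligner works internally.

```python
import itertools

import numpy as np


def permute(graph, perm):
    """Relabel the nodes of an adjacency matrix according to `perm`."""
    idx = list(perm)
    return graph[np.ix_(idx, idx)]


def align_brute_force(base_graph, graph_to_permute):
    """Return the relabeling of `graph_to_permute` closest to `base_graph`.

    Hypothetical helper: tries every node permutation (n! candidates).
    """
    n_nodes = base_graph.shape[0]
    candidates = (
        permute(graph_to_permute, perm)
        for perm in itertools.permutations(range(n_nodes))
    )
    # np.linalg.norm on a 2D array defaults to the Frobenius norm.
    return min(candidates, key=lambda g: np.linalg.norm(base_graph - g))


base = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])   # path 0 -> 1 -> 2
other = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # path 2 -> 1 -> 0

aligned = align_brute_force(base, other)
unaligned_dist = np.linalg.norm(base - other)
quotient_dist = np.linalg.norm(base - aligned)
```

The two graphs are the same path with different node labels, so the distance after alignment vanishes even though the distance between the raw adjacency matrices does not.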
To compute the distance we can either call the distance function:
graph_space_metric.dist(graphset_1[0], graphset_1[1])
2.449489742783178
Or, if the matching has already been run, we can set the identity aligner ('ID') before computing the distance, to avoid computing the matching twice:
graph_space_metric.set_aligner('ID')
graph_space_metric.dist(graphset_1[0], graph_permuted)
2.449489742783178
Alternatively, we can use the total space metric instead.
graph_space_metric.total_space_metric.dist(graphset_1[0], graph_permuted)
2.449489742783178
We can change the total space metric by doing:
graph_space_metric.total_space_metric = SymmetricMatricesMetric(n=5, m=5)
Or:
graph_space.total_space.metric = SymmetricMatricesMetric(n=5, m=5)
For the point to geodesic aligner, there's no default set. In fact, if you try something like graph_space_metric.align_point_to_geodesic(geodesic, point), a (hopefully) meaningful error will be raised, explaining how to set the point to geodesic aligner.
graph_space_metric.set_point_to_geodesic_aligner("default", s_min=-1., s_max=1., n_points=10)
<geomstats.geometry.stratified.graph_space.PointToGeodesicAligner at 0x1f16937c970>
init_point, end_point = graph_space.random_point(2)
geodesic = graph_space_metric.geodesic(init_point, end_point)
aligned_init_point = graph_space_metric.align_point_to_geodesic(geodesic, init_point)
graph_space_metric.total_space_metric.dist(init_point, aligned_init_point)
0.0
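The parameters s_min, s_max and n_points suggest that the point to geodesic aligner works on a grid of points sampled along the geodesic. The sketch below illustrates that idea in plain NumPy, under the assumption that the geodesic is a straight line in the total space and that a point is aligned to whichever sampled point it can be brought closest to. The function names are hypothetical, and the brute-force search over permutations is only an illustration, not the geomstats implementation.

```python
import itertools

import numpy as np


def align(base, graph):
    """Brute-force alignment of `graph` to `base`; returns (aligned, distance)."""
    best, best_dist = graph, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        idx = list(perm)
        candidate = graph[np.ix_(idx, idx)]
        dist = np.linalg.norm(base - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


def align_point_to_geodesic(init_point, end_point, point, s_min, s_max, n_points):
    """Align `point` to the closest of `n_points` samples on a straight line.

    Assumption: the geodesic is gamma(s) = init_point + s * (end_point - init_point).
    """
    best, best_dist = point, np.inf
    for s in np.linspace(s_min, s_max, n_points):
        on_geodesic = init_point + s * (end_point - init_point)
        candidate, dist = align(on_geodesic, point)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


rng = np.random.default_rng(0)
init_point = rng.random((4, 4))
end_point = rng.random((4, 4))

# A relabeled copy of init_point should be mapped back onto init_point (s = 0).
perm = [2, 0, 3, 1]
relabeled = init_point[np.ix_(perm, perm)]
aligned = align_point_to_geodesic(
    init_point, end_point, relabeled, s_min=0.0, s_max=1.0, n_points=5
)
```

Note that the grid only contains the value of s that passes through a given point if the chosen s_min, s_max and n_points happen to include it, so the grid resolution affects the quality of the alignment.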
This short introduction should be enough to set you up for experimenting with the learning algorithms on graphs.
Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.
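Given $\{[X_1], \dots, [X_k]\}, [X_i] \in X/T$, we estimate the Frechet Mean using AAC, which alternates two steps: aligning all the graphs to the current mean estimate, and computing the mean of the aligned graphs in the total space $X$.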
Let's instantiate the graph space and the metric, and set the aligner.
graph_space = GraphSpace(n_nodes=5)
graph_space_metric = GraphSpaceMetric(space=graph_space)
graph_space_metric.set_aligner('FAQ')
<geomstats.geometry.stratified.graph_space.FAQAligner at 0x1f16937c9a0>
And now create the estimator, and fit the data.
aac_fm = AAC(estimate='frechet_mean', metric=graph_space_metric)
fm = aac_fm.fit(graphset_2)
fm.estimate_
WARNING: Maximum number of iterations 20 reached. The estimate may be inaccurate
array([[0. , 0.98, 0.66, 0.75, 0.77], [0.58, 0. , 0.27, 0.21, 0.63], [0.85, 0.71, 0. , 0.73, 0.61], [0.69, 0.79, 0.08, 0. , 0.19], [0.75, 0.86, 0.27, 0.54, 0. ]])
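The alternation behind this estimate can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the geomstats implementation: the helper names are hypothetical, alignment is done by exhaustive search over permutations (feasible only for tiny graphs), and the mean in the total space is the entrywise arithmetic mean.

```python
import itertools

import numpy as np


def align(base, graph):
    """Brute-force alignment: best node relabeling of `graph` w.r.t. `base`."""
    best, best_dist = graph, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        idx = list(perm)
        candidate = graph[np.ix_(idx, idx)]
        dist = np.linalg.norm(base - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def aac_frechet_mean(graphs, max_iter=20, tol=1e-8):
    """Alternate 'align all' and 'compute' until the mean stops moving."""
    mean = graphs[0].copy()  # initial guess: the first graph
    for _ in range(max_iter):
        # Align all: bring every graph as close as possible to the current mean.
        aligned = np.stack([align(mean, g) for g in graphs])
        # Compute: arithmetic mean of the aligned representatives.
        new_mean = aligned.mean(axis=0)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return mean


# Toy data: six random relabelings of one weighted graph.
rng = np.random.default_rng(0)
template = rng.random((4, 4))
np.fill_diagonal(template, 0.0)
graphs = []
for _ in range(6):
    idx = rng.permutation(4)
    graphs.append(template[np.ix_(idx, idx)])

mean = aac_frechet_mean(graphs)
```

Since all six graphs are relabelings of the same template, the estimated mean is itself a relabeling of that template (here, the first graph in the list).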
Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.
We estimate the Generalized Geodesic Principal Component Analysis (GGPCA) using AAC. Given $\{[X_1], \dots, [X_k]\}, [X_i] \in X/T$, we are searching for: $$\gamma: \mathbb{R}\rightarrow X/T,$$ a generalized geodesic principal component capturing the majority of the variability of the dataset. The AAC for GGPCA alternates two steps: computing the principal components of the aligned graphs in the total space $X$, and aligning each graph to the resulting geodesic.
As before:
graph_space = GraphSpace(n_nodes=5)
graph_space_metric = GraphSpaceMetric(space=graph_space)
graph_space_metric.set_aligner('FAQ')
<geomstats.geometry.stratified.graph_space.FAQAligner at 0x1f169550040>
For GGPCA, we also need to set the point to geodesic aligner.
graph_space_metric.set_point_to_geodesic_aligner('default', s_min=0, s_max=2)
<geomstats.geometry.stratified.graph_space.PointToGeodesicAligner at 0x1f16937c850>
Again, create the estimator and fit the data.
aac_ggpca = AAC(estimate='ggpca', metric=graph_space_metric, n_components=2)
aac_ggpca.fit(graphset_3)
_AACGGPCA(metric=<geomstats.geometry.stratified.graph_space.GraphSpaceMetric object at 0x000001F169540610>)
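As a rough illustration of the idea behind GGPCA, the sketch below alternates an ordinary PCA on the flattened, currently aligned graphs with an alignment of every graph to a grid of points on the first principal line. Everything here is an assumption made for illustration: the helper names are hypothetical, only the first component is computed, the grid spans the range of the current scores, and alignment is brute force. It is not the geomstats implementation.

```python
import itertools

import numpy as np


def align_to_curve(anchors, graph):
    """Relabeling of `graph` closest to any of the sampled `anchors`."""
    best, best_dist = graph, np.inf
    for perm in itertools.permutations(range(graph.shape[0])):
        idx = list(perm)
        candidate = graph[np.ix_(idx, idx)]
        # Frobenius distance to each anchor, then keep the closest anchor.
        dist = np.linalg.norm(anchors - candidate, axis=(1, 2)).min()
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def aac_ggpca(graphs, n_points=5, max_iter=10, tol=1e-8):
    """First generalized geodesic principal component (illustrative sketch)."""
    k, n, _ = graphs.shape
    aligned = graphs.copy()
    direction = None
    for _ in range(max_iter):
        # Compute: PCA of the aligned graphs, flattened to vectors.
        flat = aligned.reshape(k, -1)
        mean = flat.mean(axis=0)
        _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
        new_direction = vt[0]
        # Align all: sample the principal line and align every graph to it.
        scores = (flat - mean) @ new_direction
        grid = np.linspace(scores.min(), scores.max(), n_points)
        anchors = (mean + grid[:, None] * new_direction).reshape(-1, n, n)
        aligned = np.stack([align_to_curve(anchors, g) for g in graphs])
        # The sign of a principal direction is arbitrary, so compare both signs.
        if direction is not None and min(
            np.linalg.norm(new_direction - direction),
            np.linalg.norm(new_direction + direction),
        ) < tol:
            break
        direction = new_direction
    return mean.reshape(n, n), new_direction.reshape(n, n)


rng = np.random.default_rng(0)
toy_graphs = rng.integers(0, 2, size=(20, 3, 3)).astype(float)
mean_graph, first_component = aac_ggpca(toy_graphs)
```

The returned component is a unit-norm direction in the total space, reshaped as an adjacency matrix so that it can be read edge by edge.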
Reference: Calissano, A., Feragen, A., & Vantini, S. (2022). Graph-valued regression: Prediction of unlabelled networks in a non-Euclidean graph space. Journal of Multivariate Analysis, 190, 104950.
We estimate a graph-valued regression model to predict graphs from scalars or vectors. Given $\{(s_1,[X_1]), \dots, (s_k, [X_k])\}, (s_i,[X_i]) \in \mathbb{R}^p\times X/T$, we are searching for: $$f: \mathbb{R}^p\rightarrow X/T$$ where $f\in \mathcal{F}(X/T)$ is a generalized geodesic regression model, i.e., the canonical projection onto Graph Space of a regression line $h_\beta : \mathbb{R}^p\rightarrow X$ of the form $$h_\beta(s) = \sum_{j=1}^{p} \beta_j s^{(j)},$$ where $s^{(j)}$ is the $j$-th component of $s$. The AAC algorithm for regression alternates two steps. The first estimates $h_\beta$ given the current representatives $\{X_1, \dots, X_k\}, X_i \in X$, by minimizing $$\sum_{i=1}^{k} d_X(h_\beta(s_i), X_i).$$ The second searches for representatives $\{X_1, \dots, X_k\}, X_i \in X$ optimally aligned with respect to the prediction along the current regression model: $$\min_{t\in T}d_X(h_\beta(s_i),t^TX_it)$$
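The two alternating steps can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the geomstats implementation: the helper names are hypothetical, the model has no intercept (matching the form of $h_\beta$ above), the fit in the total space is an ordinary least-squares fit (so it minimizes squared distances), and alignment is done by exhaustive search over node permutations.

```python
import itertools

import numpy as np


def align(base, graph):
    """Brute-force alignment: best node relabeling of `graph` w.r.t. `base`."""
    best, best_dist = graph, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        idx = list(perm)
        candidate = graph[np.ix_(idx, idx)]
        dist = np.linalg.norm(base - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def aac_regression(s, graphs, max_iter=20, tol=1e-8):
    """s has shape (k, p), graphs has shape (k, n, n); returns beta of shape (p, n, n)."""
    k, n, _ = graphs.shape
    aligned = graphs.copy()
    beta = None
    for _ in range(max_iter):
        # Compute: least-squares fit of h_beta on the aligned representatives.
        coef, *_ = np.linalg.lstsq(s, aligned.reshape(k, -1), rcond=None)
        new_beta = coef.reshape(-1, n, n)
        # Align all: align each graph to its own prediction h_beta(s_i).
        predictions = np.tensordot(s, new_beta, axes=1)
        aligned = np.stack([align(p, g) for p, g in zip(predictions, graphs)])
        if beta is not None and np.linalg.norm(new_beta - beta) < tol:
            break
        beta = new_beta
    return new_beta


# Toy data: graphs proportional to one template, with shuffled node labels.
rng = np.random.default_rng(0)
s = rng.uniform(1.0, 10.0, size=(8, 1))
true_beta = rng.random((4, 4))
graphs = []
for s_i in s[:, 0]:
    idx = rng.permutation(4)
    graphs.append((s_i * true_beta)[np.ix_(idx, idx)])
graphs = np.stack(graphs)

beta_hat = aac_regression(s, graphs)
```

Because each step can only lower the sum of squared residuals, the aligned fit is never worse than a plain least-squares fit on the unaligned adjacency matrices, although, like any alternating scheme, it may stop at a local optimum.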
graph_space = GraphSpace(n_nodes=5)
graph_space_metric = GraphSpaceMetric(space=graph_space)
graph_space_metric.set_aligner('FAQ')
<geomstats.geometry.stratified.graph_space.FAQAligner at 0x1f167deba60>
s = gs.array([random.randint(0,10) for i in range(10)])
aac_reg = AAC(estimate='regression', metric=graph_space_metric)
aac_reg.fit(s, graphset_1)
WARNING: Maximum number of iterations 20 reached. The estimate may be inaccurate
_AACRegressor(metric=<geomstats.geometry.stratified.graph_space.GraphSpaceMetric object at 0x000001F149216DF0>, total_space_estimator_kwargs={})
The coefficients are stored in the following attribute and can be reshaped into the shape of a graph (an adjacency matrix).
aac_reg.total_space_estimator.coef_
array([[ 0. ], [-0.05207226], [ 0. ], [-0.0031881 ], [ 0.02869288], [ 0.04675877], [ 0. ], [ 0.05632306], [-0.03081828], [-0.09351753], [ 0. ], [ 0.02231668], [ 0. ], [ 0.00743889], [-0.07332625], [ 0.02763018], [ 0. ], [-0.06801275], [ 0. ], [ 0.07013815], [-0.07013815], [-0.01168969], [ 0.04463337], [ 0.05419766], [ 0. ]])
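For instance, the 25 coefficients above can be rearranged into a 5 x 5 matrix, one entry per (source node, target node) pair. The sketch below copies the values printed above and assumes a row-major flattening of the adjacency matrix; that ordering is an assumption, not something the output itself confirms.

```python
import numpy as np

# Coefficients copied from the output above, shape (25, 1).
coef = np.array([
    [0.0], [-0.05207226], [0.0], [-0.0031881], [0.02869288],
    [0.04675877], [0.0], [0.05632306], [-0.03081828], [-0.09351753],
    [0.0], [0.02231668], [0.0], [0.00743889], [-0.07332625],
    [0.02763018], [0.0], [-0.06801275], [0.0], [0.07013815],
    [-0.07013815], [-0.01168969], [0.04463337], [0.05419766], [0.0],
])

n_nodes = 5
# Assumption: adjacency matrices were flattened row by row.
coef_graph = coef.reshape(n_nodes, n_nodes)
```

Under this assumption the diagonal of `coef_graph` is zero, which is consistent with the simulated graphs having no self-loops.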
Graphs can be predicted using the fitted model, and the corresponding prediction error can be computed:
graph_pred = aac_reg.total_space_estimator.predict(s)
gs.sum(graph_space_metric.dist(graphset_1, graph_pred))
16.746889471809105