This notebook contains a short demonstration of the TriScale API: how the different functions are meant to be used, and the visualizations they produce.
For more details about TriScale, you may refer to the paper.
import os
from pathlib import Path
import pandas as pd
import numpy as np
import triscale
During the design phase of an experiment, one important question to answer is "how many times should the experiment be performed?" This question directly relates to the definition of TriScale KPIs and variability scores.
TriScale implements a statistical method that allows one to estimate, based on a data sample, any percentile of the underlying distribution with any level of confidence. Importantly, the estimation does not rely on any assumption about the nature of the underlying distribution (e.g., normal or Poisson). The estimate is valid as long as the sample is independent and identically distributed (iid).
Intuitively, it is "easier" to estimate the median (50th percentile) than the 99th percentile; the more extreme the percentile, the more samples are required to provide an estimate for a given level of confidence. More precisely, the minimal number of samples $N$ required to estimate a percentile $0<p<1$ with confidence $0<C<1$ is given by:
$$N \;\geq\; \frac{\log(1-C)}{\log(1-p)}$$

TriScale's `experiment_sizing()` function implements this computation and returns the minimal number of samples $N$, as illustrated below.
# Select the percentile we want to estimate
percentile = 10
# Select the desired level of confidence for the estimation
confidence = 99 # in %
# Compute the minimal number of samples N required
triscale.experiment_sizing(
percentile,
confidence,
verbose=True);
A one-sided bound of the 10-th percentile with a confidence level of 99 % requires a minimum of 44 samples
Let us consider the samples $x$ ordered such that $x_1 \leq x_2 \leq \ldots \leq x_N$. The previous result indicates that for $N = 44$ samples and above, $x_1$ is a lower bound for the 10th percentile with probability larger than 99%.
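As a quick sanity check, the formula above can also be evaluated directly with NumPy (the rounding up to the next integer is ours):

# Direct evaluation of N >= log(1-C) / log(1-p),
# for p = 0.1 (10th percentile) and C = 0.99 (99% confidence)
p = 0.1
C = 0.99
N_min = int(np.ceil(np.log(1 - C) / np.log(1 - p)))
print(N_min)  # 44, matching experiment_sizing()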
To get a better feeling of how this minimal number of samples evolves with increasing confidence levels and more extreme percentiles, let us compute a range of minimal numbers of samples and display the results in a table (where the columns are the percentiles to estimate).
percentiles = [0.1, 1, 5, 10, 25, 50, 75, 90, 95, 99, 99.9]
confidences = [75, 90, 95, 99, 99.9, 99.99]
min_number_samples = []
for c in confidences:
tmp = []
for p in percentiles:
N = triscale.experiment_sizing(p,c)
tmp.append(N[0])
min_number_samples.append(tmp)
df = pd.DataFrame(columns=percentiles, data=min_number_samples)
df['Confidence level'] = confidences
df.set_index('Confidence level', inplace=True)
display(df)
| Confidence level | 0.1 | 1.0 | 5.0 | 10.0 | 25.0 | 50.0 | 75.0 | 90.0 | 95.0 | 99.0 | 99.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 75.00 | 1386 | 138 | 28 | 14 | 5 | 2 | 5 | 14 | 28 | 138 | 1386 |
| 90.00 | 2302 | 230 | 45 | 22 | 9 | 4 | 9 | 22 | 45 | 230 | 2302 |
| 95.00 | 2995 | 299 | 59 | 29 | 11 | 5 | 11 | 29 | 59 | 299 | 2995 |
| 99.00 | 4603 | 459 | 90 | 44 | 17 | 7 | 17 | 44 | 90 | 459 | 4603 |
| 99.90 | 6905 | 688 | 135 | 66 | 25 | 10 | 25 | 66 | 135 | 688 | 6905 |
| 99.99 | 9206 | 917 | 180 | 88 | 33 | 14 | 33 | 88 | 180 | 917 | 9206 |
Similarly, one can compute the minimal number $N$ such that the $m$-th smallest sample $x_m$ is the estimate (instead of $x_1$). This is obtained from the `experiment_sizing()` function using the optional `robustness` argument.
triscale.experiment_sizing(
percentile,
confidence,
robustness=3,
verbose=True);
A one-sided bound of the 10-th percentile with a confidence level of 99 % requires a minimum of 97 samples with the worst 3 run(s) excluded
The previous result indicates that for $N = 97$ samples and above, $x_4$ is a lower bound for the 10th percentile with probability larger than 99%.
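For intuition, this generalized result can be reproduced with the standard order-statistics argument: $x_{r+1}$ is a lower bound for the $p$-th percentile with confidence $C$ as soon as $P(\mathrm{Binom}(N,p) \leq r) \leq 1-C$. Below is a minimal sketch of this computation (assuming SciPy is available; TriScale's actual implementation lives in `experiment_sizing()`):

from scipy.stats import binom

# Smallest N such that the (r+1)-th smallest sample lower-bounds the
# p-th percentile with confidence C, i.e., P(Binom(N, p) <= r) <= 1 - C
p, C, r = 0.10, 0.99, 3
N = r + 1
while binom.cdf(r, N, p) > 1 - C:
    N += 1
print(N)  # 97, matching experiment_sizing(robustness=3)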
Metrics in TriScale evaluate a performance dimension across a run. The computation of metrics is implemented in the `analysis_metric()` function, which takes two compulsory arguments: the raw data of the run and the metric definition.
The raw data can be passed as a file name (i.e., a string) or as a Pandas DataFrame. If a file name is passed, the file must be a CSV file with the `x` data in the first column and the `y` data in the second column. If a DataFrame is passed, `data` must contain columns named `x` and `y`.
The metric definition is provided as a dictionary, with only the `measure` key being compulsory; it defines the computation to be performed on the data. The measure can be any percentile ($0<P<100$), or `mean`, `minimum`, or `maximum`.
# Input data file (one-way delay of a full-throttle flow using TCP BBR)
data = 'ExampleData/raw_data.csv'
# Definition of a TriScale metric
metric = {
'measure': 50, # Integer: interpreted as a percentile
'unit' : 'ms', # For display only
}
has_converged, metric_measure, plot = triscale.analysis_metric(
data,
metric)
print('Run metric: %0.2f %s' % (metric_measure, metric['unit']))
Run metric: 66.10 ms
This computation per se is not very interesting. The main value of the `analysis_metric()` function lies in estimating the long-term performance; that is, the value one would obtain should the run last longer and more data points be collected.
This is done by passing the optional `convergence` parameter; the function then integrates the TriScale convergence test, described in detail in the paper. Passing `plot=True` triggers the plotting of the raw data, the metric data, and the convergence test data.
# Definition of a TriScale metric
metric = {
'measure': 50, # Integer: interpreted as a percentile
'unit' : 'ms', # For display only
}
# Parameters for the convergence test
convergence = {'expected': True}
has_converged, metric_measure, plot = triscale.analysis_metric(
data,
metric,
convergence=convergence,
plot=True,
)
if has_converged:
print('The metric data has converged.')
print('Run metric: %0.2f %s' % (metric_measure, metric['unit']))
else:
print('The metric data has **not** converged.')
The metric data has converged. Run metric: 66.78 ms
Let us zoom in on the y-axis to see the data from the convergence test better...
plot.update_layout(yaxis_range=[64,68])
plot.show()
As detailed in the paper, the metric data (`Metric`) are computed over a sliding window of the raw data points (`Data`). The TriScale convergence test consists in performing a linear regression on these metric data (`Slope`). TriScale defines that a run has converged when the slope of the linear regression is "sufficiently close" to 0.
TriScale formalizes "sufficiently close" as follows: the confidence interval for the slope must fall within the tolerance values. The confidence interval on the slope is computed using bootstrapping. The TriScale convergence test uses default values of a 95% confidence level and a 5% tolerance. These defaults can be overridden by the user, as shown below.
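To make the test concrete, here is a simplified sketch of the idea (illustrative only, not TriScale's exact implementation; the function name and scaling conventions are ours): fit a line to the metric data, bootstrap a confidence interval for its slope, and declare convergence when the whole interval lies within the tolerance.

def sketch_convergence_test(y, confidence=95, tolerance=5, n_boot=1000, seed=1):
    # Illustrative only: TriScale's actual test lives in analysis_metric()
    y = np.asarray(y, dtype=float)
    # Scale x and y to [-1, 1] so the slope and the tolerance (in %) are comparable
    x = np.linspace(-1, 1, len(y))
    y = 2 * (y - y.min()) / (y.max() - y.min()) - 1
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # resample (x, y) pairs
        slopes.append(np.polyfit(x[idx], y[idx], 1)[0])
    lo, hi = np.percentile(slopes, [(100 - confidence) / 2,
                                    (100 + confidence) / 2])
    # Converged if the whole confidence interval lies within +/- tolerance
    return (-tolerance / 100) <= lo and hi <= (tolerance / 100)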
# Definition of a TriScale metric
metric = {
'measure': 50, # Integer: interpreted as a percentile
'unit' : 'ms', # For display only
}
# Parameters for the convergence test
convergence = {
'expected' : True,
'confidence': 75,
'tolerance' : 1
}
# Customized plot layout
layout = dict(yaxis=dict(range=[64,68]))
has_converged, metric_measure, plot = triscale.analysis_metric(
data,
metric,
convergence=convergence,
plot=True,
custom_layout=layout
)
if has_converged:
print('The metric data has converged.')
print('Run metric: %0.2f %s' % (metric_measure, metric['unit']))
else:
print('The metric data has **not** converged.')
The metric data has converged. Run metric: 66.78 ms
One can observe that the tolerance interval got slimmer (1% instead of 5%). Similarly, the confidence interval on the slope got slimmer too, as we decreased the confidence level from 95% to 75% (that is, the shaded area is expected to contain the true slope value with 75% probability).
TriScale's key performance indicators (KPIs) evaluate performance dimensions across a series of runs. Performing multiple runs helps mitigate the inherent variability of the experimental conditions. KPIs capture this variability by estimating percentiles of the (unknown) metric distributions. Concretely, a TriScale KPI is a one-sided confidence interval of a percentile; for example, a lower bound for the 25th percentile of a throughput metric, estimated with a 95% confidence level.
The computation of KPIs is implemented in the `analysis_kpi()` function, which takes two compulsory arguments: the metric data and the KPI definition.
The metric data can be passed as a list or a NumPy array.
The KPI definition is provided as a dictionary with three compulsory keys: `percentile` ($0<P<100$), `confidence` ($0<C<100$), and `bounds`. The KPI bounds are the expected extremal values of the metric; they are necessary to perform the independence test (see below).
If the metric bounds are unknown, simply pass the minimum and maximum metric values as bounds.
The `analysis_kpi()` function performs two computations: an empirical test for independence, and the computation of the KPI itself.

The metric data must be iid for the KPI to be a valid estimate of the underlying metric distribution. In general, independence is a property of the data collection process. However, in many practical cases in networking experiments, independence cannot be guaranteed (for example, because the interference conditions of successive experiments are correlated). In such cases, one can perform an empirical test for independence; essentially, this assesses whether the level of correlation in the data appears sufficiently low for the data to be assumed iid.
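TriScale's exact test is described in the paper; its gist can be sketched as checking that the sample autocorrelation coefficients stay within the usual white-noise bounds (the function below is illustrative, with names and threshold of our choosing):

def sketch_independence_test(x, n_lags=20):
    # Illustrative only: TriScale's actual test is described in the paper
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    # Sample autocorrelation coefficients at lags 1..n_lags
    acf = np.array([np.sum(x[k:] * x[:-k]) / denom
                    for k in range(1, n_lags + 1)])
    # The data "appears iid" if all coefficients fall within the
    # 95% white-noise bounds +/- 1.96 / sqrt(N)
    return bool(np.all(np.abs(acf) < 1.96 / np.sqrt(len(x))))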
# Load sample metrics data (failure recovery time, in seconds)
data = 'ExampleData/metric_data.csv'
df = pd.read_csv(data, header=0, names=['metric'])
# Minimal KPI definition
KPI = {
'percentile': 75,
'confidence': 95,
'bounds': [0,10],
'unit': 's'
}
# Computes the KPI
indep_test_passed, KPI_value = triscale.analysis_kpi(
df.metric.values,
KPI,
)
# Output
if indep_test_passed:
print('The metric data appears iid.')
print('KPI value: %0.2f %s' % (KPI_value, KPI['unit']))
else:
print('The metric data does not appear iid.')
The metric data appears iid. KPI value: 1.92 s
Since the metric data appears to be iid, we can interpret the KPI value as follows: the 75th percentile of the metric is smaller than or equal to 1.92 s, with a confidence level of 95%.
If the independence test fails, the KPI value is computed and returned nonetheless. However, the user must be aware that the resulting KPI is then not a trustworthy estimate of the corresponding percentile.
Optionally, the `analysis_kpi()` function produces the following plots:
- the metric data as a series (`series`)
- the autocorrelation coefficients of the metric data (`autocorr`)
- the sorted metric data together with the KPI (`horizontal`)

# Compute the KPI and plot
indep_test_passed, KPI_value = triscale.analysis_kpi(
df.metric.values,
KPI,
to_plot=['series','autocorr','horizontal']
)
Sequels are repetitions of series of runs. TriScale's variability score evaluates the variations of KPI values across sequels. Sequels enable TriScale to detect long-term variations of KPIs and, ultimately, to quantify the reproducibility of an experiment. Concretely, a variability score is a two-sided confidence interval, i.e., an estimation of a symmetric pair of percentiles; for example, a 75% confidence interval for the 25th-75th percentile range. The underlying computations are the same as for the KPI values.
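As a back-of-the-envelope illustration of why a handful of sequels can suffice (TriScale's exact construction is described in the paper): with $N$ samples, the pair $(x_1, x_N)$ covers the 25th-75th percentile range with probability at least $1 - 2 \cdot 0.75^N$ (union bound), so a 75% confidence interval requires only:

# Minimal N such that (min, max) of the sample covers the 25th-75th
# percentile range with >= 75% confidence: 2 * 0.75^N <= 0.25
C, p = 0.75, 0.25
N_min = int(np.ceil(np.log((1 - C) / 2) / np.log(1 - p)))
print(N_min)  # 8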
The computation of variability scores is implemented in the `analysis_variability()` function, which takes two compulsory arguments: the KPI data and the variability score definition.
The KPI data can be passed as a list or a NumPy array.
The variability score definition is provided as a dictionary with three compulsory keys: `percentile` ($0<P<100$), `confidence` ($0<C<100$), and `bounds`. The bounds are the expected extremal values of the KPI; they are necessary to perform the independence test (see below).
If the KPI bounds are unknown, simply pass the minimum and maximum KPI values as bounds.
Like `analysis_kpi()`, `analysis_variability()` performs both the empirical independence test and the computation of the variability score. The same plotting options are available.
# Load sample KPI data (failure recovery time, in seconds)
data = 'ExampleData/kpi_data.csv'
df = pd.read_csv(data, header=0, names=['kpi'])
# Minimal variability score definition
score = {
'percentile': 25, # the 25th-75th percentiles range
'confidence': 95,
'bounds': [0,10],
'unit': 's'
}
# Compute the variability score
(indep_test_passed,
upper_bound,
lower_bound,
var_score,
rel_score) = triscale.analysis_variability(
df.kpi.values,
score,
to_plot=['series','horizontal']
)
# Output
if indep_test_passed:
print('The KPI data appears iid.')
print('Variability score: %0.2f %s' % (var_score, score['unit']))
else:
print('The KPI data does not appear iid.')
The KPI data appears iid. Variability score: 0.40 s
It is common for real-world networks to exhibit periodic patterns. For example, there may be a lot more cross-traffic (i.e., interference) at specific times of the day. Neglecting this may lead to bias in performance evaluation results.
TriScale's `network_profiling()` function computes the autocorrelation coefficients of link quality data, such as that collected on the FlockLab testbed.
Peaks in the autocorrelation plot suggest seasonal components in the network conditions, which helps detect (sometimes unexpected) time dependencies. To avoid biasing the results, the span of a series of runs should
be chosen as a multiple of the seasonal components.
Hence, TriScale assists the user in deciding on the time span for a series of runs, i.e., the time interval containing all the runs of one series.
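To see how a seasonal component shows up in autocorrelation coefficients, consider a small synthetic example (the trace below is made up for illustration; pandas' `Series.autocorr` computes the lag-k autocorrelation):

# Synthetic link quality trace: daily pattern sampled every 2 hours,
# i.e., a seasonal component with a period of 12 samples
rng = np.random.default_rng(0)
t = np.arange(14 * 12)  # two weeks of samples
prr = 80 + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, len(t))
series = pd.Series(prr)
# The autocorrelation peaks at multiples of the seasonal lag (12 <-> 24h)
for lag in (6, 12, 24):
    print(lag, round(series.autocorr(lag=lag), 2))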
The `network_profiling()` function takes two compulsory arguments: the link quality data and the expected bounds on the link quality values.

The link quality data can be passed as a file name (i.e., a string) or as a Pandas DataFrame. If a file name is passed, the file must be a CSV file with the `date_time` data in the first column and the `link_quality` data in the second column. If a DataFrame is passed, `data` must contain a `date_time` column (or have a DatetimeIndex) and a `link_quality` column.

The function essentially performs an empirical independence test on the link quality data. First, it checks whether the data appears weakly stationary using TriScale's convergence test. Then, it tests for independence based on the computation of the autocorrelation coefficients of the link quality data.
# Load sample link quality data (packet reception rate, in %)
data = 'ExampleData/link_quality_data.csv'
link_quality_bounds = [0,100]
link_quality_name = 'PRR [%]'
# Produce the plot
fig_data, fig_autocorr = triscale.network_profiling(
data,
link_quality_bounds,
name = link_quality_name
)
fig_data.show()
fig_autocorr.show()
As we can see in this example, the link quality data appears weakly stationary (top plot) but shows strong correlation patterns, revealed by the peaks in the autocorrelation plot.

The dataset contains one test every two hours. The first peak, at lag 12 (i.e., 24 hours), reveals a daily seasonal component. The data show another peak at lag 84, which corresponds to one week (84 × 2 h = 168 h); indeed, there is less interference on weekends than on weekdays, which creates a weekly seasonal component.

Knowing about and accounting for these patterns is important to ensure fairness when comparing protocols: the span of a series should be long enough that it does not matter when the series starts; in other words, the series should span an interval at least as large as the largest seasonal component, which TriScale helps identify.
This concludes this (short) demo of TriScale. For more details, refer to the functions' docstrings (e.g., `help(triscale.network_profiling)`).
Various case studies (`casestudy_xxx.ipynb`) are also included in the TriScale repository; they illustrate how to use TriScale for concrete performance evaluations.