This notebook is intended for self-study of TriScale. Here is the version for live sessions.
This notebook contains tutorial materials for TriScale. More specifically, it presents TriScale's data analysis functions, leading to the computation of variability scores, which serve to quantify replicability.
If you don't know about Jupyter Notebooks and how to interact with them,
fear not! We compiled everything that you need to know here: Notebook Basics :-)
For more details about TriScale, you may refer to the paper.
To get started, we need to import a few Python modules.
All the TriScale-specific functions are part of one module called triscale.
import os
from pathlib import Path
import pandas as pd
import numpy as np
import triscale
Alright, we are ready to analyse some data!
In this notebook, we assume that the experiments have been designed and the corresponding data collected; we focus on the data analysis.
TriScale's methodology is structured around three time scales: runs, series of runs, and sequels (i.e., repetitions of series of runs). TriScale's API provides one function per time scale, which we will look at in the next sections.
In TriScale, metrics measure a performance dimension during one run. The computation of metrics is implemented in the analysis_metric() function, which takes two compulsory arguments:
The raw data can be passed as a file path (i.e., a string) or as a Pandas DataFrame:
- if a file path is passed, the x data is expected in the first column and the y data in the second column;
- if a DataFrame is passed, it must contain columns named x and y.
The metric definition is provided as a dictionary, with only the measure key being compulsory. This key defines what computation is to be performed on the data; in other words, what the "metric" is.
Currently supported measures are:
- mean;
- minimum;
- maximum;
- any integer between 0 and 100, interpreted as a percentile (as in the example below).
The analysis_metric() function returns 3 outputs: the convergence test result (has_converged), the metric value (metric_measure), and a plot object (plot).
The following cell illustrates the basic usage of the analysis_metric() function.
# Input data file path
data = Path('ExampleData/raw_data.csv') # one-way delay of a full-throttled flow using TCP BBR
# Definition of a TriScale metric
metric = {
    'measure': 50,  # Integer: interpreted as a percentile
    'unit': 'ms',   # For display only
}
has_converged, metric_measure, plot = triscale.analysis_metric(
    str(data),
    metric)
print('Run metric: %0.2f %s' % (metric_measure, metric['unit']))
Passing the optional argument plot=True
automatically displays the plot of the raw data.
has_converged, metric_measure, plot = triscale.analysis_metric(
    str(data),
    metric,
    plot=True,
)
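Note that the raw data does not have to come from a file: as mentioned above, a DataFrame with columns named x and y can be passed directly. The following sketch shows what such an input might look like (the values are hypothetical, for illustration only); the median computed at the end is what a 'measure': 50 metric corresponds to.

```python
import pandas as pd

# Hypothetical raw data for one run, in the DataFrame format
# expected by analysis_metric(): columns named 'x' and 'y'
raw = pd.DataFrame({
    'x': [0.0, 0.1, 0.2, 0.3, 0.4],       # e.g., time in seconds
    'y': [12.1, 11.8, 12.4, 12.0, 11.9],  # e.g., one-way delay in ms
})

# With {'measure': 50}, the metric is the 50th percentile (median) of y
median = raw['y'].quantile(0.5)
print('Run metric: %0.2f ms' % median)  # -> Run metric: 12.00 ms
```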
Note. As presented here, the analysis_metric() function is not very interesting: it "only" returns some percentile of an array. The function is more useful when the metric attempts to estimate the long-term performance of the system; that is, the value one would obtain should the run last longer or more data points be collected. When this is the case, TriScale performs a convergence test on the data, which can be triggered in the analysis_metric() function by passing the optional convergence parameter.
The study of convergence goes beyond the scope of this tutorial; refer to the paper for more details.
In TriScale, key performance indicators (KPIs) measure performance dimensions across a series of runs. Performing multiple runs helps mitigate the inherent variability of the experimental conditions. KPIs capture this variability by estimating percentiles of the (unknown) metric distributions. Concretely, a TriScale KPI is a one-sided confidence interval of a percentile; e.g., a lower bound for the 25th percentile of a metric, estimated with a 95% confidence level.
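Such one-sided confidence intervals can be derived from the order statistics of the sample, using the binomial distribution. The following is an illustrative sketch of the idea only, not TriScale's actual implementation (function and variable names are ours; refer to the paper for the real method):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Binom(n, p) <= k), computed from the binomial pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_bound_percentile(data, percentile, confidence):
    """One-sided upper confidence bound for a percentile.

    Returns the smallest order statistic x_(m) such that
    P(x_(m) >= q_p) = BinomCDF(m-1; n, p) >= confidence,
    or None if there are too few points for the requested confidence.
    """
    n = len(data)
    p = percentile / 100
    c = confidence / 100
    sorted_data = sorted(data)
    for m in range(1, n + 1):
        if binom_cdf(m - 1, n, p) >= c:
            return sorted_data[m - 1]
    return None  # not enough data points

# With 10 data points, an upper bound for the 75th percentile at
# 75% confidence is the 9th smallest value
print(upper_bound_percentile(list(range(1, 11)), 75, 75))  # -> 9
```

Note that with only 10 points, a 95% confidence bound on the 75th percentile cannot be obtained (the sketch returns None), which mirrors analysis_kpi() returning np.nan when there are not enough data points.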
The computation of KPIs is implemented in the analysis_kpi() function, which takes two compulsory arguments:
The metric data can be passed as a list or a NumPy array. The KPI definition is provided as a dictionary with three compulsory keys:
- percentile ($0<P<100$);
- confidence ($0<C<100$);
- bounds.
The KPI bounds are the expected extremal values for the metric. This information is used during the independence test (see below). If the metric bounds are unknown, simply pass the minimum and maximum metric values as bounds.
The analysis_kpi() function returns the output of two computations:
1. It performs the independence test and returns True (test is passed) or False (it is not);
2. It computes the KPI and returns its value.
Why the independence test? The metric data must be iid for the KPI to be a valid estimate of the underlying metric distribution. Note however that, in general, independence is a property of the data collection process, not of the data itself. Unfortunately, in many practical cases in networking, independence cannot be guaranteed; for example, because there is some correlation in the interference conditions between successive experiments. In such a case, one can perform an empirical test for independence; essentially, this test assesses whether the level of correlation in the data appears sufficiently low for the data to be reasonably assumed iid.
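To give a rough idea of what such an empirical check can look like, one simple approach is to compare the lag-1 autocorrelation of the data against the band expected for white noise. This is a simplified sketch of our own, not TriScale's actual test (which is described in the paper):

```python
import math

def lag1_autocorr(x):
    """Sample autocorrelation of the sequence x at lag 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den

def appears_iid(x, z=1.96):
    """Crude empirical check: the lag-1 autocorrelation lies within
    the ~95% band expected for white noise, |r| < z / sqrt(n)."""
    return abs(lag1_autocorr(x)) < z / math.sqrt(len(x))

# A monotonic trend is strongly autocorrelated and fails the check
print(appears_iid(list(range(20))))  # -> False
```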
The following cell illustrates the basic usage of the analysis_kpi() function.
# Input data file path
data = Path('ExampleData/metric_data.csv') # Failure recovery time, in seconds
# Read data in a Pandas DataFrame
df = pd.read_csv(data, header=0, names=['metric'])
# KPI definition
KPI = {
    'percentile': 75,
    'confidence': 95,
    'bounds': [0, 10],
    'unit': 's'
}
# Compute the KPI
indep_test_passed, KPI_value = triscale.analysis_kpi(
    df.metric.values,
    KPI,
)
# Output
if indep_test_passed:
    print('The metric data appears iid.')
    print('KPI value: %0.2f %s' % (KPI_value, KPI['unit']))
else:
    print('The metric data does not appear iid.')
Since the metric data appears to be iid, we can interpret the KPI value as follows: with a confidence level of 95%, the 75th percentile of the metric is smaller than or equal to 1.92s. In other words, with a probability of 95%, the performance metric is smaller than or equal to 1.92s in at least three quarters of the runs.
Even when the independence test fails, the KPI value is computed and returned. However,
the user must then be aware that the resulting KPI is not a trustworthy estimate of the
corresponding percentile (i.e., it is not a valid confidence interval).
For more details about the implementation of the empirical independence test, refer to the paper.
Moreover, if there are not enough data points to compute the desired KPI, the function returns np.nan
(not-a-number).
Optionally, the analysis_kpi() function produces and displays 3 plots, selected with the to_plot argument:
- the metric data series (series);
- the autocorrelation of the data (autocorr);
- the data and KPI value on a horizontal axis (horizontal).
This is illustrated below.
# Compute the KPI and plot
indep_test_passed, KPI_value = triscale.analysis_kpi(
    df.metric.values,
    KPI,
    to_plot=['series', 'autocorr', 'horizontal']
)
Sequels are repetitions of series of runs. TriScale's variability scores measure the variations of KPI values across sequels. Hence, sequels aim to detect long-term variations of KPIs and, ultimately, to quantify the replicability of an experiment.
Concretely, a variability score is composed of two one-sided CIs for a symmetric pair of percentiles; e.g., a 75% confidence interval for the 25th-75th percentile range. The underlying computations are the same as for the KPI values. The computation of variability scores is implemented in the analysis_variability() function, which takes two compulsory arguments:
The KPI data can be passed as a list or a NumPy array. The variability score definition is provided as a dictionary with three compulsory keys:
- percentile ($0<P<100$);
- confidence ($0<C<100$);
- bounds.
The bounds are the expected extremal values for the KPIs. As for the analysis_kpi() function, the bounds are used during the independence test. If the KPI bounds are unknown, simply pass the minimum and maximum KPI values as bounds.
The analysis_variability() function returns the output of two computations:
1. It performs the independence test and returns True (test is passed) or False (it is not);
2. It computes the variability score and returns its value, along with the underlying upper and lower bounds and a relative score.
Moreover, if there are not enough data points to compute the desired variability score, the function returns np.nan (not-a-number).
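To make the idea of "two one-sided CIs for a symmetric pair of percentiles" concrete, the sketch below computes the width between an order-statistic upper bound and lower bound. This is our own simplified illustration, not TriScale's implementation:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Binom(n, p) <= k), computed from the binomial pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def variability_score(kpi_values, percentile, confidence):
    """Width between an upper bound for the (100-p)th percentile and a
    lower bound for the p-th percentile, with p = min(percentile,
    100-percentile); each bound is a one-sided CI at the given
    confidence level, taken from the sorted sample."""
    n = len(kpi_values)
    p = min(percentile, 100 - percentile) / 100
    c = confidence / 100
    data = sorted(kpi_values)

    # Upper bound for the (1-p) percentile: smallest m such that
    # BinomCDF(m-1; n, 1-p) >= c
    upper = next((data[m - 1] for m in range(1, n + 1)
                  if binom_cdf(m - 1, n, 1 - p) >= c), None)
    # Lower bound for the p percentile: largest m such that
    # 1 - BinomCDF(m-1; n, p) >= c
    lower = next((data[m - 1] for m in range(n, 0, -1)
                  if 1 - binom_cdf(m - 1, n, p) >= c), None)

    if upper is None or lower is None:
        return float('nan')  # not enough data points
    return upper - lower

# 25th-75th percentile range of 10 KPI values, at 75% confidence
print(variability_score(list(range(1, 11)), 25, 75))  # -> 7
```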
The same plotting options as for analysis_kpi()
are available, as illustrated below.
# Input data file path
data = 'ExampleData/kpi_data.csv' # failure recovery time, in seconds
# Read data in a Pandas DataFrame
df = pd.read_csv(data, header=0, names=['kpi'])
# Score definition
score = {
    'percentile': 25,  # i.e., the 25th-75th percentile range
    'confidence': 95,
    'bounds': [0, 10],
    'unit': 's'
}
# Compute the variability score
(indep_test_passed,
 upper_bound,
 lower_bound,
 var_score,
 rel_score) = triscale.analysis_variability(
    df.kpi.values,
    score,
    to_plot=['series', 'horizontal']
)
# Output
if indep_test_passed:
    print('The KPI data appears iid.')
    print('Variability score: %0.2f %s' % (var_score, score['unit']))
else:
    print('The KPI data does not appear iid.')
Since the KPI data appears to be iid, we can interpret the variability score as follows: with a confidence level of 95%, the inter-quartile (25th-75th percentile) range of the KPIs is smaller than or equal to 0.4s. In other words, with a probability of 95%, across all series, the middle 50% of KPI values differ by 0.4s or less.
We have collected data for a comparative evaluation of congestion-control schemes using the Pantheon platform. More details about the experiment setup can be found in the TriScale paper. We performed five series of ten runs each, for 16 different schemes. For the purpose of this tutorial, we provide a dataset containing pre-computed metric values for each run. It contains two metrics: throughput and delay.
Your task consists in analysing this dataset using TriScale. The goal is to compute and compare the variability scores of different congestion-control schemes. We will focus on the throughput metric only (an arbitrary choice).
Let us first load and visualise the dataset.
# Load and display the entire dataset
df = pd.read_csv(Path('ExampleData/metrics_wo_convergence.csv'))
display(df)
We can easily extract the lists of schemes used and dates identifying each series.
# Extract the list of congestion control schemes
schemes = df.cc.unique()
print(schemes)
The following cell contains a simple function to filter this dataset and extract metric values
per scheme and per series.
def get_metrics(df, scheme, metric):
    '''Parse the dataset to extract the series of metric values
    for one scheme and all series of runs.
    '''
    # Initialize output
    metric_data = []
    # List of dates (identifies the series)
    dates = df.datetime.unique()
    # For each series
    for date in dates:
        # Set up the data filter
        mask = (
            (df.cc == scheme) &
            (df.datetime == date)
        )
        # Filter out all other rows
        df_filtered = df.where(mask).dropna()
        # Store the metric values for that series
        metric_data.append(list(df_filtered[metric + '_value'].values))
    # Return the desired metric data
    return metric_data
We will use this function to easily extract all metric values for one scheme and one metric (e.g., the throughput of bbr).
The definitions of the KPI and variability score to use for our analysis are provided below.
# KPI
KPI = {
    'percentile': 25,
    'confidence': 75,
    'name': 'KPI Throughput',
    'unit': 'Mbit/s',
    'bounds': [0, 120],  # expected value range
}
# Variability score
score = {
    'percentile': 75,
    'confidence': 75,
    'name': 'Throughput',
    'unit': 'Mbit/s',
    'bounds': [0, 120],  # expected value range
}
Note. We aim to estimate the 25th percentile of the throughput, where higher is better. Thus, the KPI provides the performance expected in at least 75% of the runs.
You are now all set to analyse this dataset!
In the following cell, we provide skeleton code which you should complete and execute in order to answer the following questions:
1. What variability score do you obtain for the throughput metric of the bbr congestion-control scheme?
2. Modify the definition of the variability score to estimate the median ('percentile': 50) instead of the 25th-75th percentile range.
Optional (and harder) questions:
# Extract and display the metric values for the 5 series of 10 runs
scheme = 'bbr'
metric = 'throughput'  # valid options are 'throughput' and 'delay'
metric_data = get_metrics(df, scheme, metric)

# Initialize an empty list to collect the KPI values
KPI_values = []

## Step 1. Compute the KPIs
for series_data in metric_data:
    ########## YOUR CODE HERE ###########
    # - compute the KPI value
    # - if the independence test is passed,
    #   store the value in the KPI list
    #####################################
    pass

# Print the (valid) KPI values
s = '%i valid KPIs obtained\n> ' % len(KPI_values)
for k in KPI_values:
    s += '%0.2f ' % k
s += '\n in %s\n' % KPI['unit']
print(s)

## Step 2. Compute the variability score
########## YOUR CODE HERE ###########
# - compute the variability score
# - print the result depending on the outcome
#   - if there are not enough KPIs, print `nan`
#   - if the independence test fails, print the score value *(-1)
#   - else, print the score value
#####################################
########## YOUR CODE HERE ###########
# - compute the KPI value
# - if the independence test is passed,
#   store the value in the KPI list
indep_test_passed, KPI_value = triscale.analysis_kpi(
    series_data,
    KPI)
if indep_test_passed:
    KPI_values.append(KPI_value)
#####################################
########## YOUR CODE HERE ###########
# - compute the variability score
# - print the result depending on the outcome
#   - if there are not enough KPIs, print `nan`
#   - if the independence test fails, print the score value *(-1)
#   - else, print the score value
(indep_test_passed,
 upper_bound,
 lower_bound,
 var_score,
 rel_score) = triscale.analysis_variability(
    KPI_values,
    score
)
if not indep_test_passed:
    var_score *= -1
print('Variability score: %0.2f %s' % (var_score, score['unit']))
#####################################
These code blocks lead to the following output:
5 valid KPIs obtained
> 105.07 105.04 104.68 104.92 105.08
in Mbit/s
Variability score: 0.41 Mbit/s
Next step: Seasonal Components
Back to repo