TriScale - Data Analysis

This notebook is intended for live tutorial sessions about TriScale.
A self-study version is also available.

To get started, we need to import a few Python modules. All the TriScale-specific functions are part of one module called triscale.

In [ ]:
import os
from pathlib import Path

import pandas as pd
import numpy as np

import triscale

TriScale analysis API

TriScale's API contains three functions for data analysis:

  • analysis_metric()
  • analysis_kpi()
  • analysis_variability()

These functions have a similar structure. They take as input a data sample and the definition of the metric, KPI, or variability score to compute, respectively. Each function performs the corresponding analysis and returns its results, together with some optional data visualizations.

Here are minimal examples:

In [ ]:
# Some random data
Y = np.random.sample(100)
X = np.arange(len(Y))
data = np.array([X,Y])
df = pd.DataFrame(np.transpose(data), columns=['x','y'])

# Minimal definition of a TriScale metric
metric = {
    'measure': 50,   # Integer: interpreted as a percentile
}

# Basic call of analysis_metric
triscale.analysis_metric( 
    df,
    metric,
    plot=True,
);
In [ ]:
# Minimal definition of a TriScale KPI
KPI = {
    'percentile': 75,
    'confidence': 95,
    'bounds': [0,1]
}

# Basic call of analysis_kpi
triscale.analysis_kpi( 
    df.y.values,
    KPI,
    to_plot=['series','autocorr','horizontal']
);
In [ ]:
# Minimal definition of a TriScale variability score
score = {
    'percentile': 75, # interpreted as the 25th-75th percentile range
    'confidence': 95,
    'bounds': [0,1]
}

# Basic call of analysis_variability
triscale.analysis_variability( 
    df.y.values,
    score,
    to_plot=['horizontal']
);

For more details about any function, refer to the docstring, as shown below.

In [ ]:
help(triscale.analysis_kpi)

Note. The functions return the KPI/score values as well as the result of the corresponding independence test. Do not forget to retrieve this result and take extra care in the rest of the analysis should the test return False.
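
For example, one can retrieve and check both outputs as follows (a minimal sketch, reusing the df and KPI defined above; the call pattern matches the case study below):

In [ ]:
# Compute the KPI and the result of the independence test
indep_test_passed, KPI_value = triscale.analysis_kpi(
    df.y.values,
    KPI)

if not indep_test_passed:
    print('Independence test failed; treat the KPI value with care.')
print('KPI value: %0.2f' % KPI_value)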

Analysis of Pantheon data (congestion control)

We have collected data for a comparative evaluation of congestion-control schemes using the Pantheon platform. We use some of these data to illustrate the main analysis functions of TriScale.

For a more extensive description of the data collection and analysis, you can check the complete case study notebook or the TriScale paper itself.

In a nutshell, the dataset contains

  • two metrics: the mean throughput and the 95th percentile of the one-way delay
  • measured across five series of ten runs each
  • collected for 17 congestion-control schemes
  • using an emulated environment.

Let us first load and visualise the dataset.

In [ ]:
# Load and display the entire dataset
df = pd.read_csv(Path('ExampleData/metrics_wo_convergence.csv'))
display(df)

We can easily extract the list of schemes used and the dates identifying each series.

In [ ]:
# Extract the list of congestion control schemes
schemes = df.cc.unique()
print(schemes)
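
And the dates identifying each series (the same df.datetime.unique() call is reused inside the get_metrics helper below):

In [ ]:
# Extract the list of dates (one per series of runs)
dates = df.datetime.unique()
print(dates)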

Let's create a short get_metrics function to easily extract all metric values for one scheme and one metric
(e.g., the throughput of bbr).

In [ ]:
def get_metrics(df, scheme, metric):
    '''Parse the dataset to extract the series of metric values
    for one scheme and all series of runs.
    '''
    # Initialize output
    metric_data = []
    
    # List of dates (identifies the series)
    dates = df.datetime.unique()
    
    # For each series
    for date in dates:
        
        # Set up the data filter (one scheme, one series)
        series_filter = (
            (df.cc == scheme) &
            (df.datetime == date)
        )

        # Filter out all other rows
        df_filtered = df.where(series_filter).dropna()
        
        # Store metrics values for that series
        metric_data.append(list(df_filtered[(metric+'_value')].values))        
    
    # Return the desired metric data
    return metric_data
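
As a quick sanity check of the helper (a small sketch; 'bbr' and 'throughput' are the same options used in the analysis below), we can look at how many series and runs it returns:

In [ ]:
# One inner list of metric values per series of runs
bbr_tput = get_metrics(df, 'bbr', 'throughput')
print('Number of series:', len(bbr_tput))
print('Runs per series :', [len(s) for s in bbr_tput])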

The definitions of the KPIs and variability scores we use are provided below.

In [ ]:
# KPIs
KPI_tput  = {'percentile': 25,
             'confidence': 75,
             'name':       'KPI Throughput',
             'unit':       'Mbit/s',
             'bounds':     [0,120],          # expected value range
             'tag':        'throughput'      # do not change the tag
            }
KPI_delay = {'percentile': 75,
             'confidence': 75,
             'name':       'KPI One-way delay',
             'unit':       'ms',
             'bounds':     [0,100],          # expected value range
             'tag':        'delay'           # do not change the tag
            }

Note. We aim to estimate the 25th percentile of the throughput, where higher is better; for the delay, where lower is better, we estimate the 75th percentile. Thus, both KPIs aim to estimate the performance expected in at least 75% of the runs.
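
As a sanity check, we can verify that 10 runs per series are enough for these KPI definitions (a short sketch using triscale.experiment_sizing, with the same call signature as in the exercise solutions below):

In [ ]:
# Minimum number of runs needed to estimate the 75th percentile with
# 75% confidence; by symmetry, the same holds for the 25th percentile
triscale.experiment_sizing(75, 75, verbose=True);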

In [ ]:
# Variability scores
score_tput  = {'percentile': 75,             # the 25th-75th percentile range
               'confidence': 75,
               'name':       'Throughput',
               'unit':       'Mbit/s',
               'bounds':     [0,120],        # expected value range
               'tag':        'throughput'    # do not change the tag
              }
score_delay = {'percentile': 75,             # the 25th-75th percentile range
               'confidence': 75,
               'name':       'One-way delay',
               'unit':       'ms',
               'bounds':     [0,100],        # expected value range
               'tag':        'delay'         # do not change the tag
              }

As an example, let us analyze the throughput of the TCP BBR scheme.

In [ ]:
#####################################
# Extract the metrics values for the 5 series of 10 runs 
#####################################

scheme = 'bbr'         # valid options: print(df.cc.unique())
metric = 'throughput'  # valid options are 'throughput' and 'delay'
metric_data = get_metrics(df, scheme, metric)

# Initialize an empty list to collect the KPI values
KPI_values = [] 

if metric == 'throughput':
    KPI = KPI_tput
    score = score_tput
elif metric == 'delay':
    KPI = KPI_delay
    score = score_delay

#####################################
## Step 1. Compute the KPIs
#####################################

for series_data in metric_data:
    
    indep_test_passed, KPI_value = triscale.analysis_kpi(
        series_data,
        KPI)
    if indep_test_passed:
        KPI_values.append(KPI_value)
    
# Print the (valid) KPI values
s = '%i valid KPIs obtained\n> ' % len(KPI_values)
for k in KPI_values:
    s += '%0.2f ' % k
s += '\n  in %s\n' % KPI['unit']
print(s)
    
#####################################    
## Step 2. Compute the variability score
#####################################

(indep_test_passed, 
 upper_bound, 
 lower_bound, 
 var_score, 
 rel_score) = triscale.analysis_variability(
    KPI_values,
    score
)

# Mark the score with a negative value when the independence test fails
if not indep_test_passed:
    var_score *= -1

print('Variability score: %0.2f %s' % (var_score, score['unit']))

Your turn: time to practice

First of all, can we change the KPI definition? Recall that we have 5 series of 10 runs in our dataset.

  • How many runs per series do we need to estimate the 90th percentile at 75% confidence?
  • Or the 75th percentile at 95% confidence?
  • With 10 runs per series, can we do much better than 75th percentile with 75% confidence level?
  • What is the trade-off when using "better" KPIs?

Hint. Remember the triscale.experiment_sizing function? :-)

In [ ]:
########## YOUR CODE HERE ###########
# ...
#####################################

Let's now explore the dataset a little further!

  • What is the variability score of the delay metric of the bbr congestion-control scheme?

Modify the definition of the variability scores to estimate the median ('percentile': 50) instead of
the 25th-75th percentile range.

  • What are the values of the variability scores now? Does this make sense to you?

Optional (and harder) questions:

  • Compute the scores for all the schemes. Do they vary a lot?
  • Do the variability scores seem "big" with respect to the range of KPI values?
  • Would you say that these experiments appear to be replicable?

Solutions


Click here to show the solution: Changing the KPI definition

```python
triscale.experiment_sizing(90, 75, verbose=True);
triscale.experiment_sizing(75, 95, verbose=True);

# "Best" options
triscale.experiment_sizing(87, 75, verbose=True);
triscale.experiment_sizing(75, 94, verbose=True);
```

With 10 runs,

  • if we set the confidence level to 75%, the largest (resp. smallest) percentile one can estimate is the 87th (resp. 13th) percentile;
  • if we set the percentile to the 75th, the best confidence level we can get is 94%.

The trade-off in using these "best" KPIs is that there is no margin for poor runs: the KPI estimate will always be the largest (resp. smallest) collected value. Moreover, if one run should fail or not converge, one would not have enough runs left to compute the desired KPI!

Click here to show the solution: BBR delay

Simply change the metric definition in the code block above from `throughput` to `delay`. It leads to the following output:

```
5 valid KPIs obtained
> 87.08 86.51 86.31 87.17 85.74
  in ms

Variability score: 1.43 ms
```

Click here to show the solution: BBR with median score

Change the percentile in the score definitions from 75 to 50, and re-run the analysis. The output is the same: the scores are not affected by the change in score definitions. One would expect a two-sided estimate of the median to be narrower than the estimate of the 25th-75th percentile interval. While this is generally true, having only 5 series of runs is not enough to make a difference. This can be seen with the `robustness` parameter of `triscale.experiment_sizing`:

```python
>>> triscale.experiment_sizing(50,75,robustness=1)
(5,6)
```

Hence, for a two-sided confidence interval for the median, one needs at least 6 samples in order to "exclude" one.