This notebook presents a short use case of the TriScale framework, which compares the performance of Glossy for different parameter values. This use case relies on the FlockLab testbed as the experiment environment. Elements of this use case are described in the TriScale paper (submitted to NSDI'2020).
In particular, this use case illustrates the importance of network profiling: this example shows how one may reach wrong conclusions (even with high confidence!) when the environmental conditions are not properly assessed.
This evaluation aims to compare two parameter settings for Glossy, a low-power wireless protocol based on synchronous transmissions. One of Glossy's parameters is the number of retransmissions of a packet, called $N$. We investigate the impact of two values of $N$ on the reliability of Glossy, measured as the packet reception ratio (PRR). We define our KPI as the median PRR with a 95% confidence level.
It is expected that the larger the value of $N$, the more reliable the protocol is (and the more energy it consumes). We are interested in assessing whether increasing $N$ indeed improves reliability, and by how much.
We test two values for the parameter $N$: 1 and 2 retransmissions.
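For concreteness, here is a toy illustration of the PRR computation for a single node and a single round (the numbers are hypothetical; the actual computation is done later via the TriScale API):
# Hypothetical example: PRR of one node over one communication round
floods_initiated = 25  # floods initiated by the other nodes during the round
floods_received = 23   # floods this node successfully received
print(100 * floods_received / floods_initiated, '%')  # 92.0 %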
import os
import zipfile
from pathlib import Path

import pandas as pd
import numpy as np
The entire dataset and source code of this case study are available on Zenodo.
The wget commands below download the required files to reproduce this case study.
# Set `download = True` to download (and extract) the data from this case study
# Optionally, adjust the record_id for the file version you are interested in.
# To reproduce the original TriScale paper, set `record_id = 3451418`
download = False
record_id = 3458116  # version 2 (https://doi.org/10.5281/zenodo.3458116)

files = ['triscale.py',
         'helpers.py',
         'triplots.py',
         'UseCase_Glossy.zip']

if download:
    for file in files:
        print(file)
        url = 'https://zenodo.org/record/' + str(record_id) + '/files/' + file
        os.system('wget %s' % url)
        if file[-4:] == '.zip':
            with zipfile.ZipFile(file, "r") as zip_file:
                zip_file.extractall()
    print('Done.')
We now import the custom modules that we just downloaded.

- `triscale` is the main module of TriScale. It contains all the functions of TriScale's API; that is, the functions meant to be called by the user.
- `flocklab` is a module specific to this case study. It contains helper functions that parse the FlockLab test result files for this use case.

import triscale
import UseCase_Glossy.flocklab as flocklab
The test scenario is very simple. During one communication round, each node in the network initiates in turn a Glossy flood (using $N=1$ retransmission). All the other nodes log whether they successfully received the packet. The same round is then repeated with $N=2$ retransmissions.
The collected data is available in the TriScale artifacts repository.
First, the serial log files from FlockLab are parsed to create CSV files compatible with the TriScale API.
# Expected list of node ids
node_list = [1, 2, 3, 4, 6, 7, 8, 10, 11, 13, 14, 15, 16, 17, 18, 19,
             20, 22, 23, 24, 25, 26, 27, 28, 32, 33]

# Path to results to parse
data_folder = Path('UseCase_Glossy/Data_Glossy')

# Loop through the tests and parse the serial log files
date_list = [x for x in data_folder.iterdir() if x.is_dir()]
for test_date in sorted(date_list)[:-2]:  # the last two days are not yet available
    test_list = [x for x in test_date.iterdir() if x.is_dir()]
    for test in test_list:
        test_file_name = str(test / 'serial.csv')
        flocklab.parse_serial_log_glossy_use_case(test_file_name, node_list, verbose=False)
print('Done.')
Done.
The test scenario is terminating. Thus, there is no need to check whether the runs have converged: the test runtime must simply be large enough to complete the scenario; that is, the two rounds of Glossy floods.
The analysis starts with the computation of the metric. In this evaluation, we define the metric as the median packet reception ratio (our Y values) across all nodes (node IDs are the X values). In other words, our metric is the median number of floods that are successfully received by one node in the network.
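Conceptually, this metric computation reduces to taking the median of the per-node PRR values. A minimal illustration with hypothetical values (the actual computation below uses `triscale.analysis_metric`):
# Hypothetical per-node PRR values (Y values), one per node ID (X values)
per_node_prr = np.array([92., 88., 96., 100., 84.])
# The metric ('measure': 50) is the 50th percentile, i.e., the median
print(np.percentile(per_node_prr, 50))  # 92.0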
The metric values are stored in a DataFrame, which also contains the test number and the date and time of the test.
# Create the storing DataFrame
columns = ['date_time', 'test_number', 'PRR_N1', 'PRR_N2']
df = pd.DataFrame(columns=columns, dtype=np.int32)

# TriScale inputs
metric = {'name': 'Average Packet Reception Ratio',
          'unit': '%',
          'measure': 50,
          'bounds': [0, 1]}
convergence = {'expected': False}

# Loop through the test files and parse the csv files
date_list = [x for x in data_folder.iterdir() if x.is_dir()]
for test_date in sorted(date_list)[:-2]:  # the last two days are not yet available
    test_list = [x for x in test_date.iterdir() if x.is_dir()]
    for test in test_list:

        # Get the test number
        test_number = int(str(test)[-5:])

        # Get the test date_time
        xml_file = str(test / 'testconfiguration.xml')
        with open(xml_file, 'r') as xml_file:
            for line in xml_file:
                if '<start>' in line:
                    tmp = line[0:-1].split('<start>')
                    test_datetime = tmp[1][:-8]
                    break

        # Compute PRR metric for N=1
        data_file_name = str(test / 'glossy_reliability_N1.csv')
        converge1, PRR_N1, figure1 = triscale.analysis_metric(data_file_name,
                                                              metric,
                                                              plot=False,
                                                              convergence=convergence,
                                                              verbose=False)

        # Compute PRR metric for N=2
        data_file_name = str(test / 'glossy_reliability_N2.csv')
        converge2, PRR_N2, figure2 = triscale.analysis_metric(data_file_name,
                                                              metric,
                                                              plot=False,
                                                              convergence=convergence,
                                                              verbose=False)

        df_new = pd.DataFrame([[test_datetime, test_number, PRR_N1, PRR_N2]],
                              columns=columns,
                              dtype=np.int32)
        df = pd.concat([df, df_new])

# Parse dates
df['date_time'] = pd.to_datetime(df['date_time'], utc=True)
df.set_index('date_time', inplace=True)
df.sort_values("test_number", inplace=True)
df.head()
| date_time | test_number | PRR_N1 | PRR_N2 |
|---|---|---|---|
| 2019-08-21 23:37:10+00:00 | 72625 | 64.0 | 92.0 |
| 2019-08-22 13:57:10+00:00 | 72626 | 68.0 | 84.0 |
| 2019-08-22 05:28:20+00:00 | 72627 | 80.0 | 92.0 |
| 2019-08-22 11:19:30+00:00 | 72628 | 80.0 | 92.0 |
| 2019-08-22 18:43:50+00:00 | 72629 | 80.0 | 96.0 |
We can already see from these data that the results differ strongly between days. For example, let us isolate the results from a weekend day and a weekday and compare them:
# weekend
weekend = df.loc['2019-08-24']
weekend.median()
test_number    72761.0
PRR_N1            88.0
PRR_N2            94.0
dtype: float64
weekday = df.loc['2019-08-26']
weekday.median()
test_number    72908.0
PRR_N1            78.0
PRR_N2            88.0
dtype: float64
We can see that the median PRR for one node is much lower on a weekday than on a weekend:
| Day | Type | Median PRR, $N=1$ (%) | Median PRR, $N=2$ (%) |
|---|---|---|---|
| 2019-08-24 | weekend | 88 | 94 |
| 2019-08-26 | weekday | 78 | 88 |
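To verify that this difference is not specific to these two days, one may inspect the median for each day in the dataset. A minimal sketch, assuming the `df` DataFrame built above (indexed by UTC timestamps):
# Daily medians of the per-run metric values; the weekly pattern of the
# testbed environment shows up in the output
df[['PRR_N1', 'PRR_N2']].resample('D').median()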
Now, if one does not pay attention to the days on which the runs are performed, one may reach wrong conclusions about the reliability of Glossy with $N=1$ or $N=2$ retransmissions. This is illustrated below.
We desire a high confidence in the comparison between Glossy with $N=1$ and $N=2$ retransmissions; we choose as TriScale KPI the median PRR, estimated with a 95% confidence level. 24 runs (one day of tests) are more than enough to compute this KPI.
KPI = {'name': 'PRR',
       'unit': '\%',
       'percentile': 50,
       'confidence': 95,
       'class': 'one-sided',
       'bounds': [0, 100],
       'bound': 'lower'}

for k in [0, 5, 6]:
    triscale.experiment_sizing(KPI['percentile'],
                               KPI['confidence'],
                               CI_class=KPI['class'],
                               robustness=k,
                               verbose=True);
A one-sided bound of the 50-th percentile with a confidence level of 95 % requires a minimum of 5 samples
A one-sided bound of the 50-th percentile with a confidence level of 95 % requires a minimum of 18 samples with the worst 5 run(s) excluded
A one-sided bound of the 50-th percentile with a confidence level of 95 % requires a minimum of 21 samples with the worst 6 run(s) excluded
Thus, with 24 runs, we can compute our KPI: the 6th smallest PRR value is a lower bound on the median PRR with a probability larger than 95%.
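This follows from the binomial argument underlying TriScale's experiment sizing: the 6th smallest value is a valid lower bound on the median as soon as at least 6 of the samples fall below the true median, which happens with probability $P[\mathrm{Binomial}(24, 0.5) \geq 6]$. A quick check with scipy (not part of the case study code):
from scipy.stats import binom

# Probability that at least m out of n i.i.d. samples fall below the median,
# i.e., that the m-th smallest sample is a valid lower bound on the median
def lower_bound_confidence(n, m):
    return 1 - binom.cdf(m - 1, n, 0.5)

print(lower_bound_confidence(24, 6))  # ~0.997 > 0.95
print(lower_bound_confidence(18, 6))  # ~0.952 > 0.95 (matches the 18-sample minimum above)
print(lower_bound_confidence(17, 6))  # ~0.928 < 0.95 (17 runs would not suffice)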
We want to illustrate the issues that may occur when neglecting a seasonal component in the environmental conditions (in this case, the weekly seasonal component of the FlockLab testbed).
Thus, we first compute the KPIs by (intentionally) selecting a day where conditions are "friendly" (i.e., a weekend day) and one where they are "harsh" (i.e., a weekday); we evaluate Glossy with $N=1$ on the "friendly" day and with $N=2$ on the "harsh" day.
# Select the metric series for N=1
data = weekend.PRR_N1.dropna().values
to_plot = ['horizontal']
verbose=False
triscale.analysis_kpi(data, KPI, to_plot, verbose=verbose)