This mini-lab is a follow-up to some in-class activities we've done. It intentionally doesn't remind you explicitly how to do stuff, because part of the activity is to learn how to do these things on your own!
These Google searches may help you:
* wget download as filename
* python check if file exists
* jupyter run shell command
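In plain Python, the first two searches map onto the standard library. `os.path.exists` checks for a file, and `urllib.request.urlretrieve` saves a URL under a filename you choose, which is roughly what `wget -O filename url` does. The minimal sketch below combines the two. `fetch_if_missing` is a hypothetical helper name, not something the lab requires. For the third search: in a Jupyter cell, a line starting with `!` is run as a shell command, so `!wget -O ctu-13-scenario-13.zip https://storage.googleapis.com/security-analytics-datasets-public/ctu-13-scenario-13.zip` would also work on machines where wget is installed.

```python
import os
import urllib.request


def fetch_if_missing(url, dest):
    """Download url to the local filename dest, unless dest already exists.

    Hypothetical helper. Returns True if a download happened and False if
    the file was already on disk.
    """
    # "Check if file exists": skip the network entirely when dest is present.
    if os.path.exists(dest):
        return False
    # "Download as filename": like `wget -O dest url`.
    urllib.request.urlretrieve(url, dest)
    return True
```

Returning a boolean makes it easy to log whether a run actually hit the network or reused a cached copy.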
Complete the load_ctu_13_scenario_13 function, fulfilling the requirements listed in its docstring.
Submit a completed version of this notebook in .ipynb format.
# Do Python library imports...
def load_ctu_13_scenario_13():
    '''
    Get the CTU-13 scenario 13 dataset, located at https://storage.googleapis.com/security-analytics-datasets-public/ctu-13-scenario-13.zip
    Requirements:
    * Should download the dataset if it is not found locally
    * If it is not found locally, it should download the zip and extract the data
    * Should return the data loaded into a pandas dataframe
    '''
    # Pseudocode hint:
    # ```
    # if file doesn't exist locally...
    #     download it
    #     unzip it
    #
    # load data into a pandas dataframe
    # return the dataframe
    # ```
    return df  # df must be created by your implementation above this line
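One possible way to complete the stub, offered as a hedged sketch rather than the expected answer. It uses `os`, `urllib.request` and `zipfile` from the standard library plus pandas. It assumes the archive holds exactly one data file with a header row and comma separators; the name of the file inside the zip is not given here, so the code searches the extraction folder instead of hard-coding it. The `zip_path` and `extract_dir` defaults are hypothetical local paths.

```python
import os
import urllib.request
import zipfile

import pandas as pd

DATA_URL = "https://storage.googleapis.com/security-analytics-datasets-public/ctu-13-scenario-13.zip"


def load_ctu_13_scenario_13(zip_path="ctu-13-scenario-13.zip",
                            extract_dir="ctu-13-scenario-13"):
    """Download (if needed), extract, and load CTU-13 scenario 13 as a DataFrame.

    Assumption: the archive holds exactly one data file that pd.read_csv can
    parse with its defaults (comma separator, header row).
    """
    # Download only when the zip is not already on disk.
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(DATA_URL, zip_path)

    # Extract only when the target directory does not exist yet.
    if not os.path.isdir(extract_dir):
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(extract_dir)

    # Collect extracted files, skipping macOS metadata folders and hidden files.
    data_files = [
        os.path.join(root, name)
        for root, _, names in os.walk(extract_dir)
        if "__MACOSX" not in root
        for name in names
        if not name.startswith(".")
    ]
    if len(data_files) != 1:
        raise ValueError(f"expected one data file in {extract_dir!r}, found {data_files}")

    # Load the single data file into a DataFrame.
    return pd.read_csv(data_files[0])
```

Keeping the download check and the extraction check separate means that if a previous run downloaded the zip but stopped before extracting, the next call still extracts it without downloading again.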
df = load_ctu_13_scenario_13()
This is provided to show you what the data should look like once you have loaded it.
# https://stackoverflow.com/questions/48997644/how-to-describe-columns-as-categorical-values
df.astype('object').describe().transpose()
| | count | unique | top | freq |
|---|---|---|---|---|
| StartTime | 1925149 | 1925147 | 2011/08/16 03:02:32.189675 | 2 |
| Dur | 1925149.0 | 658406.0 | 0.0 | 40199.0 |
| Proto | 1925149 | 15 | udp | 1512108 |
| SrcAddr | 1925149 | 277486 | 147.32.84.138 | 365407 |
| Sport | 1898868 | 64571 | 13363 | 144971 |
| Dir | 1925149 | 7 | <-> | 1473135 |
| DstAddr | 1925149 | 77330 | 147.32.80.9 | 783804 |
| Dport | 1915024 | 51986 | 53 | 786775 |
| State | 1925148 | 235 | CON | 1471320 |
| sTos | 1898070.0 | 5.0 | 0.0 | 1895257.0 |
| dTos | 1803339.0 | 4.0 | 0.0 | 1803010.0 |
| TotPkts | 1925149 | 2954 | 2 | 1204914 |
| TotBytes | 1925149 | 51903 | 214 | 319095 |
| SrcBytes | 1925149 | 21758 | 81 | 330493 |
| Label | 1925149 | 115 | flow=To-Background-UDP-CVUT-DNS-Server | 756280 |
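Why cast to `'object'` first? On numeric columns, `describe()` reports mean, std and quantiles; once every column has object dtype, it reports `count` (non-null values), `unique`, `top` (most frequent value) and `freq` (how often `top` occurs), which is the layout of the table above. The small made-up DataFrame below shows the effect. Missing values are excluded from `count` and `unique`, which is why columns such as Sport and Dport show a `count` below the 1925149 seen in most other columns.

```python
import pandas as pd

# Made-up flows: a string column and a numeric column with one missing value.
toy = pd.DataFrame({
    "Proto": ["udp", "tcp", "udp", "udp"],
    "Dport": [53, 80, 53, None],
})

# Casting to object makes describe() treat every column as categorical,
# so each row of the transposed result is count / unique / top / freq.
summary = toy.astype("object").describe().transpose()
print(summary)
```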