

Workshop 2.2: Visualization in Jupyter Notebooks

Disclaimer:

This is not intended to be a comprehensive overview of visualization in Python/Jupyter. Many libraries and techniques are not covered here. These are just a few options that we've used and liked, and they give you a lot of scope.


Basic plotting with pandas using Matplotlib

Resources:

Cheatsheets:

Matplotlib Cheatsheets

Bar charts

Refer to the Bar Plots section for more examples and customization options.

In [14]:
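import matplotlib.pyplot as plt
import pandas as pd

# Apply a built-in Matplotlib style to all subsequent plots
plt.style.use("fivethirtyeight")

# Load the sample Windows logon events and network communications data
logons_full_df = pd.read_pickle("../data/host_logons.pkl")
net_full_df = pd.read_pickle("../data/az_net_comms_df.pkl")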
In [3]:
logons_full_df.head()
Out[3]:
|   | Account | EventID | TimeGenerated | Computer | SubjectUserName | SubjectDomainName | SubjectUserSid | TargetUserName | TargetDomainName | TargetUserSid | TargetLogonId | LogonType | IpAddress | WorkstationName | TimeCreatedUtc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NT AUTHORITY\SYSTEM | 4624 | 2019-02-12 04:56:34.307 | MSTICAlertsWin1 | MSTICAlertsWin1$ | WORKGROUP | S-1-5-18 | SYSTEM | NT AUTHORITY | S-1-5-18 | 0x3e7 | 5 | - | - | 2019-02-12 04:56:34.307 |
| 1 | MSTICAlertsWin1\MSTICAdmin | 4624 | 2019-02-12 04:37:25.340 | MSTICAlertsWin1 | - | - | S-1-0-0 | MSTICAdmin | MSTICAlertsWin1 | S-1-5-21-996632719-2361334927-4038480536-500 | 0xc90e957 | 3 | 131.107.147.209 | IANHELLE-DEV17 | 2019-02-12 04:37:25.340 |
| 2 | MSTICAlertsWin1\MSTICAdmin | 4624 | 2019-02-12 04:37:27.997 | MSTICAlertsWin1 | - | - | S-1-0-0 | MSTICAdmin | MSTICAlertsWin1 | S-1-5-21-996632719-2361334927-4038480536-500 | 0xc90ea44 | 3 | 131.107.147.209 | IANHELLE-DEV17 | 2019-02-12 04:37:27.997 |
| 3 | MSTICAlertsWin1\MSTICAdmin | 4624 | 2019-02-12 04:38:16.550 | MSTICAlertsWin1 | - | - | S-1-0-0 | MSTICAdmin | MSTICAlertsWin1 | S-1-5-21-996632719-2361334927-4038480536-500 | 0xc912d62 | 3 | 131.107.147.209 | IANHELLE-DEV17 | 2019-02-12 04:38:16.550 |
| 4 | MSTICAlertsWin1\MSTICAdmin | 4624 | 2019-02-12 04:38:21.370 | MSTICAlertsWin1 | - | - | S-1-0-0 | MSTICAdmin | MSTICAlertsWin1 | S-1-5-21-996632719-2361334927-4038480536-500 | 0xc913737 | 3 | 131.107.147.209 | IANHELLE-DEV17 | 2019-02-12 04:38:21.370 |
In [15]:
# Preprocess the data: group by LogonType and count the logon events (non-null Account values) in each group
logontypebyacc = logons_full_df.groupby(['LogonType'])['Account'].count()
logontypebyacc.head()
Out[15]:
LogonType
0      2
2     12
3     13
4      9
5    126
Name: Account, dtype: int64
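The comment in the cell above speaks of counting accounts, but `count()` returns the number of non-null `Account` values in each group. That is the number of logon events per `LogonType`, not the number of different accounts. To get distinct accounts, `nunique()` is the call to use instead. Here is a minimal sketch on a small DataFrame. Its rows are hypothetical and are not taken from host_logons.pkl.

```python
import pandas as pd

# Hypothetical toy data with the same two columns used above
logons = pd.DataFrame(
    {
        "LogonType": [2, 2, 3, 3, 3, 5],
        "Account": ["alice", "alice", "bob", "bob", "carol", "SYSTEM"],
    }
)

# Events per logon type: every non-null Account value is counted
events_per_type = logons.groupby("LogonType")["Account"].count()

# Distinct accounts per logon type: each account is counted once per group
accounts_per_type = logons.groupby("LogonType")["Account"].nunique()

print(pd.DataFrame({"events": events_per_type, "distinct_accounts": accounts_per_type}))
```

Use whichever of the two matches the question you are asking. Both return a Series indexed by `LogonType`, so either one can be passed directly to `.plot(kind='bar')`.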
In [5]:
logontypebyacc.plot(kind='bar')
Out[5]:
<AxesSubplot:xlabel='LogonType'>
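`Series.plot` returns the Matplotlib `Axes` it drew on (that is the `<AxesSubplot ...>` shown above). You can therefore keep customizing the chart after pandas has drawn it. The sketch below assumes a headless environment, which is why it selects the `Agg` backend, and it uses hypothetical counts in place of `logontypebyacc`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical logon counts per LogonType (stand-in for logontypebyacc)
counts = pd.Series(
    [12, 13, 126],
    index=pd.Index([2, 3, 5], name="LogonType"),
    name="Account",
)

# pandas draws the bars and hands back the Axes it used
ax = counts.plot(kind="bar", rot=0, figsize=(8, 4))
ax.set_title("Logon events by LogonType")
ax.set_ylabel("Logon count")

# Write each bar's value above it (Axes.bar_label needs Matplotlib >= 3.4)
ax.bar_label(ax.containers[0])
plt.tight_layout()
```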

Line charts

In [6]:
# Preprocess the dataframe: index by TimeGenerated, bin into daily buckets and count logon events per day
logonaccountbyday = logons_full_df.set_index('TimeGenerated').resample('D')['Account'].count()
logonaccountbyday.head()
Out[6]:
TimeGenerated
2019-02-09     3
2019-02-10    11
2019-02-11     6
2019-02-12    72
2019-02-13    15
Freq: D, Name: Account, dtype: int64
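`resample` only works on a datetime-like index, so `TimeGenerated` must be a datetime column. It appears to be one in the pickled data. Every day in the range gets a bin even when it has no events, and `count()` reports 0 for those empty days, so gaps stay visible in the line chart. In the sketch below, the timestamps are hypothetical and stored as strings, so they are converted with `pd.to_datetime` first.

```python
import pandas as pd

# Hypothetical events; note that there are no logons on 2019-02-10
logons = pd.DataFrame(
    {
        "TimeGenerated": [
            "2019-02-09 10:00:00",
            "2019-02-09 11:30:00",
            "2019-02-11 08:15:00",
        ],
        "Account": ["alice", "bob", "alice"],
    }
)

# Convert the strings to datetimes so the column can act as a DatetimeIndex
logons["TimeGenerated"] = pd.to_datetime(logons["TimeGenerated"])

# One bin per calendar day; days without events appear with a count of 0
per_day = logons.set_index("TimeGenerated").resample("D")["Account"].count()
print(per_day)
```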
In [7]:
logonaccountbyday.plot(figsize=(20, 8))
Out[7]:
<AxesSubplot:xlabel='TimeGenerated'>

Customizations

Annotate your charts by adding text, titles, axis labels and other customizations.

Docs:

In [8]:
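import matplotlib.pyplot as plt

# The "seaborn-*" styles were renamed "seaborn-v0_8-*" in Matplotlib 3.6;
# try the new name first and fall back to the old one on older versions
try:
    plt.style.use("seaborn-v0_8-whitegrid")
except OSError:
    plt.style.use("seaborn-whitegrid")

# Create a wide figure, then plot the daily counts with a marker on each point
plt.figure(figsize=(20, 8))
plt.plot(logonaccountbyday, marker='o')
plt.title("Daily trend of account logons")
plt.xlabel("Date")
plt.ylabel("Logon Count")

# another example of customization with plot
# plt.plot(logonaccountbyday, color='green', marker='o', linestyle='dashed', linewidth=2)

plt.show()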
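You can also build the same chart with Matplotlib's object-oriented interface, `fig, ax = plt.subplots()`. This makes it easier to add annotations to a specific Axes. The sketch below rebuilds the series from the five daily values shown in the `logonaccountbyday.head()` output above. It applies the color and line-style options from the commented-out line, and it uses `ax.annotate` to point at the busiest day. The `Agg` backend is only there for running outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Daily counts taken from the head() output above (stand-in for logonaccountbyday)
per_day = pd.Series(
    [3, 11, 6, 72, 15],
    index=pd.date_range("2019-02-09", periods=5, freq="D"),
    name="Account",
)

fig, ax = plt.subplots(figsize=(20, 8))
ax.plot(per_day.index, per_day.values, color="green", marker="o",
        linestyle="dashed", linewidth=2)
ax.set_title("Daily trend of account logons")
ax.set_xlabel("Date")
ax.set_ylabel("Logon Count")

# Draw an arrow from a text label to the day with the most logons
peak_day = per_day.idxmax()
peak_value = per_day.max()
ax.annotate(
    f"Peak: {peak_value} logons",
    xy=(peak_day, peak_value),    # the point being annotated (data coordinates)
    xytext=(20, -30),             # label position, offset from xy ...
    textcoords="offset points",   # ... measured in points
    arrowprops={"arrowstyle": "->"},
)
fig.autofmt_xdate()  # tilt the date labels so they don't overlap
```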

hvPlot: Bokeh made easy(ier)

HoloViews

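Bokeh is a very flexible Python library for interactive visualization that renders in the browser through JavaScript (BokehJS). It produces beautiful interactive charts but can be somewhat complex to use.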

Example: Bokeh ridge plot

HoloViews is a higher-level, declarative layer built on top of Bokeh (or Matplotlib).

Example: HoloViews violin plot

hvPlot (hv == HoloViews) implements a subset of HoloViews functionality as a pandas extension: importing hvplot.pandas adds an .hvplot plotting accessor to DataFrames and Series.
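In this context, a "pandas extension" is an accessor: a namespace attached to every DataFrame or Series. That is why `import hvplot.pandas` is enough to make `df.hvplot.<kind>(...)` available. The toy accessor below only illustrates the registration mechanism, using pandas' `register_dataframe_accessor`. It is hypothetical: neither the `quickplot` name nor its `barh` method is part of hvPlot or pandas, and it draws with Matplotlib rather than Bokeh.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("quickplot")  # hypothetical accessor name
class QuickPlotAccessor:
    """Toy accessor: df.quickplot.barh(column) draws a sorted horizontal bar chart."""

    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes in the DataFrame the accessor was accessed on
        self._obj = pandas_obj

    def barh(self, column: str, **kwargs):
        # Delegate the drawing to pandas' own Matplotlib-based plotting
        return self._obj[column].sort_values().plot.barh(**kwargs)


# Hypothetical data to exercise the accessor
df = pd.DataFrame({"LogonCount": [5, 2, 9]}, index=["alice", "bob", "carol"])
ax = df.quickplot.barh("LogonCount", title="Logons per account")
```

hvPlot registers its accessors when you import `hvplot.pandas`, but what it returns are interactive HoloViews/Bokeh objects rather than Matplotlib Axes.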

Installing and loading

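Install with either conda or pip (one of the two is enough):

conda install -c pyviz hvplot
pip install hvplot

Load it in a notebook with: import hvplot.pandas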

Examples

In [9]:
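# Importing hvplot.pandas registers the .hvplot accessor on pandas objects
import hvplot.pandas

# Count logons per account; the counts land in a column still named TimeGenerated
count_of_logons = logons_full_df[["TimeGenerated", "Account"]].groupby("Account").count()
# Interactive (Bokeh) horizontal bar chart, 300 pixels high
count_of_logons.hvplot.barh(height=300)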
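In the cell above, `count()` is applied to the only non-grouping column, so the counts end up in a column that is still named `TimeGenerated`. The alternative sketched below uses `groupby(...).size()` to count rows per account, gives the result an explicit name and sorts it. It then draws the chart with pandas' Matplotlib backend, so it runs without hvPlot. The events are hypothetical; the account names are borrowed from the `head()` output earlier.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import pandas as pd

# Hypothetical logon events (stand-in for logons_full_df)
logons = pd.DataFrame(
    {
        "TimeGenerated": pd.to_datetime(
            ["2019-02-12 04:37", "2019-02-12 04:38", "2019-02-12 04:56", "2019-02-13 09:00"]
        ),
        "Account": [
            "MSTICAlertsWin1\\MSTICAdmin",
            "MSTICAlertsWin1\\MSTICAdmin",
            "NT AUTHORITY\\SYSTEM",
            "MSTICAlertsWin1\\MSTICAdmin",
        ],
    }
)

# size() counts rows per group; rename() gives the resulting Series a meaningful name
count_of_logons = logons.groupby("Account").size().rename("LogonCount").sort_values()

# Static Matplotlib version of the horizontal bar chart
ax = count_of_logons.plot.barh(figsize=(8, 3))
ax.set_xlabel("Logon count")
```

hvPlot also registers its accessor on Series. With `import hvplot.pandas` loaded, `count_of_logons.hvplot.barh(height=300)` should therefore give the interactive Bokeh version of the same chart.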