Report interpretation

In this tutorial we cover each section of the default popmon report step by step. The tutorial accesses the datastore, which is discussed in the advanced tutorial, but you can safely skip those details: our aim is to understand the plots and interpret the results.

In [ ]:
import pandas as pd

import popmon
from popmon import resources
In [ ]:
df = pd.read_csv(resources.data("flight_delays.csv.gz"), index_col=0, parse_dates=["DATE"])
report = df.pm_stability_report(time_axis="DATE", time_width="1w")

The datastore holds all values that are computed for the report. The following keys are available:

In [ ]:
list(report.datastore.keys())

The generated plots are stored under the "report_sections" key. Each section has three keys:

In [ ]:
list(report.datastore['report_sections'][0].keys())

We will inspect the diagrams in each of these sections:

In [ ]:
[section['section_title'] for section in report.datastore['report_sections']]
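Indexing sections, features and plots by position (as the cells below do) is fragile if the report layout changes. A small lookup helper can find plots by section title and plot name instead. This is a minimal sketch: `find_plots` is a hypothetical helper, not part of popmon, and it assumes only the keys used in this tutorial (`section_title`, `features`, `plots`, `name`). The hand-made `sections` list stands in for `report.datastore['report_sections']`; its titles and plot names are made up.

```python
def find_plots(report_sections, section_title, plot_name):
    """Return every plot named `plot_name` in the section titled `section_title`.

    Hypothetical helper: it only relies on the keys used in this tutorial.
    """
    return [
        plot
        for section in report_sections
        if section["section_title"] == section_title
        for feature in section["features"]
        for plot in feature["plots"]
        if plot["name"] == plot_name
    ]


# Hypothetical, hand-made structure mirroring the keys used in this tutorial
sections = [
    {"section_title": "Histograms", "features": [
        {"plots": [{"name": "AIRLINE", "description": "", "plot": "<base64>"}]},
    ]},
    {"section_title": "Profiles", "features": [
        {"plots": [{"name": "entries", "description": "Number of entries", "plot": "<base64>"}]},
    ]},
]

matches = find_plots(sections, "Profiles", "entries")
print(matches[0]["description"])
```

With a real report, you could pass `report.datastore['report_sections']` together with a title from the list above and a plot name as displayed under each diagram.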

The diagrams are currently stored as Base64-encoded images, while the traffic lights and alerts are HTML tables. We will use the following helper functions to display them:

In [ ]:
from IPython.display import display, HTML

def show_image(plot):
    # Render the Base64-encoded image, followed by the plot's name and optional description
    display(HTML(f'<img src="data:image/jpeg;base64,{plot["plot"]}" />'))
    text = f'<strong>{plot["name"]}</strong>'
    if plot['description']:
        text += f': {plot["description"]}'
    display(HTML(text))

def show_table(plot):
    # CSS for the overview tables: colour cells by traffic light and rotate the footer labels
    style = """table.overview{
        margin: 25px;
    }
    table.overview tbody td.metric{
        white-space: nowrap;
    }
    table.overview tbody td.cell{
        border: 1px solid #333333;
        text-align: center;
    }
    table.overview td.cell-green{
        background: green;
    }
    table.overview td.cell-yellow{
        background: yellow;
    }
    table.overview td.cell-red{
        background: red;
    }
    table.overview tfoot td {
        padding-top: 5px;
        text-align: center;
    }
    table.overview tfoot td span{
        -ms-writing-mode: tb-rl;
        -webkit-writing-mode: vertical-rl;
        writing-mode: vertical-rl;
        transform: rotate(180deg);
        white-space: nowrap;

        font-size: 14px;
        font-weight: 300;
    }
    """
    display(HTML(f'<style>{style}</style>'))
    display(HTML(plot["plot"]))

Histograms

Categorical feature

In the figure below you see three overlaid histograms. The histogram_ref (purple) is the reference histogram. Recall that this can be the complete dataset, or a reference (training) set if you are monitoring the data coming into a model. The histogram (orange) is the current batch of data (e.g. a week), and histogram_prev1 is the previous batch (e.g. the week before).

The x-axis shows the values of the feature, in this case the airline. The y-axis shows the bin probability (the normalized counts).

This example shows that all three histograms lie close together, with the largest difference in the last bin (WN).

In [ ]:
# First section, first feature, first plot
show_image(report.datastore['report_sections'][0]['features'][0]['plots'][0])
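To make the plot concrete, the sketch below computes the same kind of quantities with plain pandas on a tiny made-up dataset: the normalized bin probabilities per weekly batch (histogram), the previous batch aligned with the current one (histogram_prev1), and the probabilities over the complete dataset (histogram_ref). This is an illustration under those assumptions, not how popmon computes its histograms internally. The data is stored in `toy` so the notebook's `df` is not overwritten.

```python
import pandas as pd

# Hypothetical toy data: one row per flight with its date and airline
toy = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04",
        "2015-01-08", "2015-01-09", "2015-01-10", "2015-01-11",
    ]),
    "AIRLINE": ["WN", "AA", "WN", "DL", "WN", "WN", "AA", "WN"],
})

# Assign every row to a weekly batch, labelled by the Monday the week starts on
batch = toy["DATE"].dt.to_period("W").dt.start_time

# histogram: bin probabilities per batch (each row sums to 1)
histogram = pd.crosstab(batch, toy["AIRLINE"], normalize="index")

# histogram_prev1: the previous batch's probabilities, aligned with the current batch
histogram_prev1 = histogram.shift(1)

# histogram_ref: bin probabilities over the complete dataset
histogram_ref = toy["AIRLINE"].value_counts(normalize=True)

print(histogram)
print(histogram_prev1)
print(histogram_ref)
```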

Continuous feature

The interpretation for continuous features is not much different; the only difference is that the x-axis shows binned continuous values.

Here we observe relatively large differences at several values of AIR TIME: at ~130 the previous histogram is much higher than the current one, while at ~50 the current histogram is much higher than the previous one.

In [ ]:
# First section, second feature, first plot
show_image(report.datastore['report_sections'][0]['features'][1]['plots'][0])
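For a continuous feature, values are first assigned to bins and then normalized. The NumPy sketch below uses made-up AIR TIME values (assumed to be in minutes) and illustrative 25-minute bins: it normalizes the counts of two consecutive batches and finds the bin with the largest difference, which is the kind of gap that stands out in the plot above. The bin width and data are assumptions, not popmon's binning.

```python
import numpy as np

# Hypothetical AIR TIME values (minutes) for two consecutive weekly batches
previous = np.array([45, 110, 130, 135, 140, 128, 60, 132])
current = np.array([48, 52, 55, 60, 131, 90, 95, 140])

# Shared bin edges 0, 25, ..., 200 so both batches are binned identically
edges = np.arange(0, 201, 25)

def bin_probs(values, edges):
    """Histogram `values` on `edges` and normalize the counts to bin probabilities."""
    counts, _ = np.histogram(values, bins=edges)
    return counts / counts.sum()

diff = bin_probs(current, edges) - bin_probs(previous, edges)

# Bin with the largest absolute difference between the current and previous batch
largest = int(np.argmax(np.abs(diff)))
print(f"largest difference in bin [{edges[largest]}, {edges[largest + 1]}): {diff[largest]:+.3f}")
# prints: largest difference in bin [125, 150): -0.375
```

A negative difference means the previous batch had more mass in that bin than the current one.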

Traffic Lights

The traffic light overview shows the most severe traffic light per batch, using the ordering Red > Yellow > Green.
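Since the lights are ordered by severity, the most severe light per batch is simply a column-wise maximum once the colours are encoded as integers. The encoding (0 = green, 1 = yellow, 2 = red) and the table below are hypothetical.

```python
import pandas as pd

# Hypothetical encoding of the lights, ordered by severity
LEVELS = {0: "green", 1: "yellow", 2: "red"}

# Hypothetical traffic lights: rows are comparisons, columns are weekly batches
lights = pd.DataFrame(
    [[0, 0, 1, 0],
     [0, 2, 1, 0],
     [0, 0, 1, 1]],
    index=["comparison_a", "comparison_b", "comparison_c"],
    columns=["week 1", "week 2", "week 3", "week 4"],
)

# Because red > yellow > green, the most severe light per batch is the column-wise maximum
worst = lights.max(axis=0).map(LEVELS)
print(worst)
```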

In general, two patterns in the traffic light overview table are worth watching for:

  1. A column is mostly red/yellow. This indicates that most thresholds are crossed for that batch, so it is probably the batch you want to look at.
  2. A row is mostly red/yellow. This indicates that the comparison is overly sensitive. It can be ignored or removed, or its (dynamic) bounds can be widened.
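Both patterns can be checked mechanically by computing the share of yellow/red cells per column and per row. In this sketch the table is made up, and the 0.5 cut-off for "mostly" is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical traffic lights (0 = green, 1 = yellow, 2 = red);
# rows are comparisons, columns are weekly batches
lights = pd.DataFrame(
    [[0, 0, 2, 0],
     [1, 2, 1, 1],
     [0, 0, 2, 0]],
    index=["comparison_a", "comparison_b", "comparison_c"],
    columns=["week 1", "week 2", "week 3", "week 4"],
)

alerting = lights > 0   # True where the light is yellow or red
threshold = 0.5         # hypothetical cut-off for "mostly red/yellow"

# Pattern 1: batches (columns) where most comparisons alert; look at these first
share_per_batch = alerting.mean(axis=0)
suspicious_batches = share_per_batch[share_per_batch > threshold].index.tolist()

# Pattern 2: comparisons (rows) that alert in most batches; likely overly sensitive
share_per_comparison = alerting.mean(axis=1)
noisy_comparisons = share_per_comparison[share_per_comparison > threshold].index.tolist()

print("inspect first:", suspicious_batches)
print("possibly too sensitive:", noisy_comparisons)
```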

The traffic light overview is particularly useful as a way to prioritize which diagrams to look at first.

In [ ]:
show_table(report.datastore['report_sections'][1]['features'][0]['plots'][0])

Alerts

The alerts overview table provides insight into how many traffic light bounds are crossed, per feature and per batch.

In [ ]:
show_table(report.datastore['report_sections'][2]['features'][0]['plots'][0])
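Alert counts like these can be derived from individual traffic lights by counting the yellow and red entries per feature and batch. The long-format table below is hypothetical, and popmon's own alert computation may differ in detail.

```python
import pandas as pd

# Hypothetical traffic lights in long format: one row per (feature, batch, comparison)
lights = pd.DataFrame({
    "feature": ["AIRLINE"] * 4 + ["AIR_TIME"] * 4,
    "batch": ["week 1", "week 1", "week 2", "week 2"] * 2,
    "light": ["green", "yellow", "red", "red", "green", "green", "yellow", "green"],
})

# Count how many bounds were crossed (yellow or red) per feature and batch
alerts = (
    lights.assign(yellow=lights["light"].eq("yellow"), red=lights["light"].eq("red"))
    .groupby(["feature", "batch"])[["yellow", "red"]]
    .sum()
)
print(alerts)
```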

Comparisons

The comparison diagrams compare statistics with a reference. In this case, the reference is the preceding time slot.

In [ ]:
show_image(report.datastore['report_sections'][3]['features'][0]['plots'][0])
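As an illustration of comparing against the preceding time slot, the sketch below shifts a table of per-batch bin probabilities by one row to obtain the reference, then takes the largest absolute change in any bin. The probabilities are made up, and this statistic is just one plausible example of a comparison, not a claim about which statistic the plot above shows.

```python
import pandas as pd

# Hypothetical bin probabilities per weekly batch (rows) for a categorical feature
probs = pd.DataFrame(
    {"AA": [0.25, 0.25, 0.40], "DL": [0.25, 0.00, 0.20], "WN": [0.50, 0.75, 0.40]},
    index=["week 1", "week 2", "week 3"],
)

# Reference = the preceding batch: shift the rows down by one (week 1 has no reference)
prev1 = probs.shift(1)

# Example comparison statistic: the largest absolute change in any bin probability
max_prob_diff = (probs - prev1).abs().max(axis=1)
print(max_prob_diff)
```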

Profiles

The diagrams in the profiles section track a single statistic over time. As the name and description of the diagram show, here we are looking at the number of entries. The traffic light bounds are included in the plot. The last bin contains far fewer entries. This is probably because the data stops at the end of 2015, while that bin spans two calendar years (it extends into 2016).

In [ ]:
show_image(report.datastore['report_sections'][4]['features'][0]['plots'][0])
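The explanation for the small last bin can be checked with a toy example. With weekly bins running Monday to Sunday and data ending on 2015-12-31 (a Thursday), the last bin starts in 2015, ends in 2016, and contains only four days of data. The flight counts are hypothetical (a constant 10 flights per day).

```python
import pandas as pd

# Hypothetical data: 10 flights per day from Monday 2015-12-07 up to 2015-12-31
days = pd.date_range("2015-12-07", "2015-12-31", freq="D")
timestamps = pd.Series(days.repeat(10))

# Profile statistic: number of entries per weekly bin (weeks run Monday to Sunday)
weekly = timestamps.dt.to_period("W").value_counts().sort_index()

# Number of days with data in each bin: the last bin is only partially covered
days_covered = pd.Series(days).dt.to_period("W").value_counts().sort_index()

last_bin = weekly.index[-1]
print(weekly)
print(days_covered)
print("last bin:", last_bin.start_time.date(), "to", last_bin.end_time.date())
```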