# Modify the report's structure

In this notebook we look at two use cases that modify the existing report structure: splitting up large reports and reordering the report sections. Both use cases are based on actual user inquiries. The datasets used in this notebook are downloaded with the Kaggle API. If you haven't done so already, set up your API credentials first.
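
Before running the download cells, it can help to check that credentials are discoverable. The sketch below is an illustration, not part of pandas-profiling or the kaggle package: the helper name `kaggle_credentials_found` is hypothetical, and it assumes the usual Kaggle conventions of either the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables or a `kaggle.json` file in `~/.kaggle` (or in the directory named by `KAGGLE_CONFIG_DIR`).

```python
import os
from pathlib import Path


def kaggle_credentials_found(env=None, home=None):
    """Hypothetical helper: report whether Kaggle credentials appear to be configured."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Option 1: both environment variables are set and non-empty
    has_env = bool(env.get("KAGGLE_USERNAME")) and bool(env.get("KAGGLE_KEY"))

    # Option 2: a kaggle.json file exists in the configuration directory
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR", home / ".kaggle"))
    has_file = (config_dir / "kaggle.json").is_file()

    return has_env or has_file


print("Kaggle credentials found:", kaggle_credentials_found())
```

If this prints `False`, the `kaggle.api.authenticate()` call further down is likely to fail.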

In [ ]:
%load_ext autoreload


In [ ]:
import sys

!{sys.executable} -m pip install -U pandas-profiling[notebook]
!jupyter nbextension enable --py widgetsnbextension


You might want to restart the kernel now.

In [ ]:
from pathlib import Path

import kaggle

from pandas_profiling.utils.common import extract_zip

kaggle.api.authenticate()


## Reorder Sections

We can reorder sections by editing the report structure directly. First, we need to generate a profile report.

In [ ]:
# We are using the Craigslist Carstrucks Data
vehicles_dataset = Path("craigslist-carstrucks-data/vehicles.csv")

if not vehicles_dataset.exists():
    # Download the dataset archive from Kaggle
    kaggle.api.dataset_download_files(
        "austinreese/craigslist-carstrucks-data",
        path="craigslist-carstrucks-data",
        quiet=False,
    )

    # Unpack the archive next to it
    extract_zip(
        "craigslist-carstrucks-data/craigslist-carstrucks-data.zip",
        "craigslist-carstrucks-data/",
    )

In [ ]:
import pandas as pd

from pandas_profiling import ProfileReport

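# For our demonstration, we only take a fraction of the dataset
# (the 10% fraction here is an arbitrary choice; adjust as needed)
df = pd.read_csv(vehicles_dataset).sample(frac=0.1)

# Generate the profile report
vehicles_report = ProfileReport(df)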


The structure of the report is stored in the report attribute. The report is essentially a tree object. We inspect the root of the report.

In [ ]:
print(repr(vehicles_report.report))


We can see that the root object is of the type "Sequence". Sequence objects have at least a name and a list of items; the items are stored under the "items" key of the content dictionary.

In [ ]:
print(vehicles_report.report.content)
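
To get a feel for the tree, the sketch below walks a structure of the same shape and prints an indented outline. It is a minimal illustration under stated assumptions: `Node` is a hypothetical stand-in class (not a pandas-profiling class) that mimics only the `name` attribute and the `content["items"]` list used in this notebook, and the section and variable names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a report node; not a pandas-profiling class."""

    name: str
    content: dict = field(default_factory=dict)


def outline(node, depth=0):
    """Return one indented line per node, visiting the tree depth-first."""
    lines = ["  " * depth + node.name]
    for child in node.content.get("items", []):
        lines.extend(outline(child, depth + 1))
    return lines


# A toy tree with made-up section and variable names
root = Node(
    "Root",
    {
        "items": [
            Node("Overview"),
            Node("Variables", {"items": [Node("price"), Node("year")]}),
            Node("Sample"),
        ]
    },
)

print("\n".join(outline(root)))
```

The same traversal idea should carry over to the real report object, provided its nodes expose `name` and `content["items"]` as they do in the cells of this notebook.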


In this example, we would like to pull up the samples section, so that the reordered sequence items are:

• Overview
• Samples
• Variables
• Interactions
• Correlations
• Missing

In [ ]:
# Reorder the sections
vehicles_report.report.content["items"] = [
    vehicles_report.report.content["items"][i] for i in [0, 5, 1, 2, 3, 4]
]
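
Index-based reordering depends on the order the sections happen to have. As an alternative, the sketch below reorders by section name. It is an illustration only: `reorder_by_name` is a hypothetical helper, the sections are `SimpleNamespace` stand-ins rather than real report nodes, and the section names are assumptions that should be checked against the structure printed above.

```python
from types import SimpleNamespace


def reorder_by_name(items, wanted):
    """Put the items named in `wanted` first, in that order; keep the rest in their original order."""
    by_name = {item.name: item for item in items}
    unknown = [name for name in wanted if name not in by_name]
    if unknown:
        raise KeyError(f"Unknown section names: {unknown}")
    head = [by_name[name] for name in wanted]
    tail = [item for item in items if item.name not in wanted]
    return head + tail


# Stand-in sections with assumed names
sections = [
    SimpleNamespace(name=name)
    for name in [
        "Overview",
        "Variables",
        "Interactions",
        "Correlations",
        "Missing values",
        "Sample",
    ]
]

reordered = reorder_by_name(sections, ["Overview", "Sample"])
print([section.name for section in reordered])
```

Raising on unknown names makes a typo or a renamed section fail loudly instead of silently dropping it from the report.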


Finally, we can render the report and see that the changes have taken place:

In [ ]:
vehicles_report.to_notebook_iframe()


## Split Profile Reports

When profiling large datasets, a monolithic HTML file can become enormous. Using the report structure generated by pandas-profiling, we can create a modular report instead. Here we demonstrate how to split a profile report into multiple files, one per section. We start by generating the report's structure in the usual way, with minimal mode enabled (minimal=True). This step may take a few minutes.

In [ ]:
# We are using the IEEE Fraud Detection transaction training data
ieee_dataset = Path("ieee-fraud-detection/train_transaction.csv")

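if not ieee_dataset.exists():
    # Download the competition data from Kaggle
    kaggle.api.competition_download_files(
        "ieee-fraud-detection", path="ieee-fraud-detection", quiet=False
    )

    # Unpack the archive next to it
    extract_zip(
        "ieee-fraud-detection/ieee-fraud-detection.zip",
        "ieee-fraud-detection/",
    )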

In [ ]:
import pandas as pd

from pandas_profiling import ProfileReport

# Load the transaction data
df = pd.read_csv(ieee_dataset)

# Generate the profile report
ieee_fraud_report = ProfileReport(df, minimal=True)

In [ ]:
print(repr(ieee_fraud_report.report))

In [ ]:
print(repr(ieee_fraud_report.report.content))

In [ ]:
from copy import deepcopy

# Make a copy of the original report structure
original_report_structure = deepcopy(ieee_fraud_report.report)

In [ ]:
# Loop over each section
for section in original_report_structure.content["items"]:
    # Only consider sections that contain items
    #     if len(section.content['items']) > 0:
    # Reset the report structure
    ieee_fraud_report.report = deepcopy(original_report_structure)
    # Overwrite the section list with the section we would like to print
    ieee_fraud_report.report.content["items"] = [section]
    # Output the report to HTML
    ieee_fraud_report.to_file(f"ieee_fraud_report_section_{section.name.lower()}.html")
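
Section names may contain spaces or other characters that are awkward in file names. The sketch below shows one way to derive a safe file name from a section name. The helper `section_filename` is hypothetical and the example name "Missing values" is an assumption about what a section might be called.

```python
import re


def section_filename(name, prefix="ieee_fraud_report_section"):
    """Hypothetical helper: build an HTML file name from a section name."""
    # Lowercase, then collapse every run of non-alphanumeric characters into one underscore
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"{prefix}_{slug}.html"


print(section_filename("Missing values"))
```

In the loop above, `section_filename(section.name)` could stand in for the f-string passed to `to_file`.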


## Paginate variables

We can use the same approach to paginate variables:

In [ ]:
ieee_fraud_report.report = original_report_structure

# Number of variables per page
page_size = 25

# The Root node, which is a sequence of sections
print(repr(ieee_fraud_report.report.content["items"]))

# The variables
variable_section = ieee_fraud_report.report.content["items"][1]
variables = variable_section.content["items"]
variable_count = len(variables)
print(f"Number of variables: {variable_count}")

# Reset the report structure
ieee_fraud_report.report = deepcopy(original_report_structure)

# Only keep the variables section
ieee_fraud_report.report.content["items"] = [
    ieee_fraud_report.report.content["items"][1]
]

for page_num, variable_page in enumerate(
    [variables[i : i + page_size] for i in range(0, variable_count, page_size)]
):
    print(f"Write page {page_num}")
    # Set the report title
    ieee_fraud_report.title = (
        f"IEEE Fraud Detection Dataset, Variables, Page {page_num}"
    )

    # Overwrite the variable list with the variables we would like to print
    ieee_fraud_report.report.content["items"][0].content["items"] = variable_page

    # Output the report to HTML
    ieee_fraud_report.to_file(f"ieee_fraud_report_variables_page_{page_num}.html")
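
The slicing used for pagination can be checked in isolation. The sketch below factors it into a small function and runs it on plain integers. The name `paginate` is a hypothetical helper, not part of pandas-profiling.

```python
def paginate(items, page_size):
    """Split a list into consecutive pages of at most `page_size` items."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [items[i : i + page_size] for i in range(0, len(items), page_size)]


# 60 dummy "variables" with 25 per page: two full pages and one partial page
pages = paginate(list(range(60)), 25)
print([len(page) for page in pages])

# enumerate(..., start=1) gives 1-based page numbers, if those are preferred in titles
for page_num, page in enumerate(pages, start=1):
    print(f"Page {page_num}: items {page[0]} to {page[-1]}")
```

Note that the loop above numbers pages from 0; passing `start=1` to `enumerate` is an optional change for reader-facing titles and file names.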


In this notebook we have seen two ways of manipulating the report structure. Advanced users may alter the structure in other ways not covered here, such as exploring deeper parts of the tree or inserting and deleting objects.