Python Data Viz Libraries Compared: 8 Popular Graphs Made with pandas, matplotlib, seaborn, and plotly.express¶

Author: Dylan Castillo

I'm teaching a course about the essential tools of Data Science. Among those, I'm going to cover how to use some of the most popular data visualization libraries in Python: pandas (yes, that's not a typo!), matplotlib, seaborn, and plotly.express.

I thought it would be useful for my students to have a cheat sheet with some popular graphs made with each of these tools, so I wrote this one.

In the next sections, you'll learn how to set up your local environment, read the data, and get the code to make the following types of graphs:

Line plot
Grouped bars plot
Stacked bars plot
Area chart
Pie/Donut chart
Histogram
Scatter plot
Boxplot

Let me know what you think!

Set Up a Virtual Environment¶

Working with virtual environments will save you lots of headaches when working on a Python project. So, you'll start by creating one and installing the required libraries.

If you're using venv, then here's how you set up your local environment:

$ python3 -m venv .dataviz
$ source .dataviz/bin/activate
(.dataviz) $ python3 -m pip install pandas==1.2.4 numpy==1.19.2 matplotlib==3.4.2 plotly==4.14.3 seaborn==0.11.1 notebook==6.4.0
(.dataviz) $ jupyter notebook

If you're using conda, then you need to run these commands:

$ conda create --name .dataviz
$ conda activate .dataviz
(.dataviz) $ conda install pandas==1.2.4 numpy==1.19.2 matplotlib==3.4.2 plotly==4.14.3 seaborn==0.11.1 notebook==6.4.0 -y
(.dataviz) $ jupyter notebook

That's it! These commands will:

Create a virtual environment called .dataviz
Activate the virtual environment
Install the required packages (pandas, numpy, matplotlib, plotly, seaborn, and notebook)
Start a Jupyter Notebook

Note that if you only plan to use one of the data visualization libraries, you don't need to install all of them. For example, if you want to use plotly.express, you don't need to install matplotlib and seaborn.
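
Once the notebook is running, you may want to confirm which of the packages from the install commands actually ended up in the environment. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8). The `installed_versions` helper is a hypothetical name introduced for illustration, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the install commands above.
PACKAGES = ["pandas", "numpy", "matplotlib", "plotly", "seaborn", "notebook"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so missing ones are easy to spot.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<12}{ver or 'not installed'}")
```

If you skipped some of the libraries, they will simply show up as "not installed".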

Start Jupyter Notebook and Import Libraries¶

Open Jupyter Notebook. Then, create a new notebook by clicking on New > Python 3 in the menu. You should now have an empty Jupyter notebook in front of you. Now, let's get to the fun part!

First, you'll need to import the required libraries. Create a new cell in your notebook and paste the following code:

In [1]:

# All
import pandas as pd
import numpy as np

# matplotlib
import matplotlib.ticker as mtick
import matplotlib.pyplot as plt

# plotly
import plotly.io as pio
import plotly.express as px

# seaborn
import seaborn as sns

# Set templates
pio.templates.default = "seaborn"
plt.style.use("seaborn")

This code will import the required libraries and set up the themes for matplotlib and plotly. Each library provides you with a specific set of functionalities:

pandas helps you read the data
matplotlib.pyplot, plotly.express and seaborn will help you make the graphs
matplotlib.ticker provides a way to configure the ticks on the axes of your matplotlib graphs
plotly.io makes it easy to define a specific theme for your plotly graphs

The last two lines of the cell (lines 17 and 18) set the themes for plotly and matplotlib. Both are set to the seaborn theme, so the graphs from all the libraries look similar.
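
The "seaborn" style name works with the matplotlib version pinned in this tutorial (3.4.2). From matplotlib 3.6 onward that style was renamed to "seaborn-v0_8", so if you run the code on a newer release you may need a fallback. Here is a minimal sketch of one way to do it. The `pick_seaborn_style` helper is a hypothetical name introduced for illustration.

```python
import matplotlib.pyplot as plt


def pick_seaborn_style(available):
    """Return the first seaborn-like style name found, else "default"."""
    for name in ("seaborn", "seaborn-v0_8"):
        if name in available:
            return name
    return "default"


# plt.style.available lists the style names this matplotlib install knows.
plt.style.use(pick_seaborn_style(plt.style.available))
```

Falling back to "default" keeps the notebook running even if neither name is present.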

Understand the Data¶

Throughout this tutorial you'll use a dataset with stock market data for 29 companies compiled by ichardddddd. It has the following columns:

Date: Date corresponding to observed value
Open: Price (in USD) at market open at the specified date
High: Highest price (in USD) reached during the corresponding date
Low: Lowest price (in USD) reached during the corresponding date
Close: Price (in USD) at market close at the specified date
Volume: Number of shares traded
Name: Stock symbol of the company

You can take a look at the data by taking a sample of a few rows:

In [2]:

url = "https://raw.githubusercontent.com/szrlee/Stock-Time-Series-Analysis/master/data/all_stocks_2006-01-01_to_2018-01-01.csv"
df = pd.read_csv(url)
df.sample(5)

Out[2]:

	Date	Open	High	Low	Close	Volume	Name
25931	2013-01-18	52.24	52.34	51.81	52.34	8492176	DIS
53204	2013-06-05	98.13	98.16	96.12	96.42	5394802	MCD
39946	2008-09-26	117.21	121.01	117.01	119.42	4760683	IBM
37191	2009-10-15	27.28	27.37	27.05	27.30	13350145	HD
2877	2017-06-08	204.84	206.03	204.09	205.94	2451348	MMM

This is a long dataset with regard to the stock names: each row holds the values for a single stock on a single date. In the next sections, you'll notice that some libraries make it easy to work with data in this form, while others require you to transform it into a wide dataset, with one column per stock.
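
To make the difference between the two shapes concrete, here is a small sketch with made-up prices (not taken from the dataset). It converts a long table to a wide one with `pivot` and back again with `melt`.

```python
import pandas as pd

# Made-up prices, only to show the two shapes.
long_df = pd.DataFrame(
    {
        "Date": ["2017-01-03", "2017-01-03", "2017-01-04", "2017-01-04"],
        "Name": ["AAPL", "AMZN", "AAPL", "AMZN"],
        "Close": [10.0, 20.0, 11.0, 21.0],
    }
)

# Long -> wide: one row per date, one column per stock.
wide_df = long_df.pivot(index="Date", columns="Name", values="Close")

# Wide -> long: back to one row per (date, stock) pair.
back_to_long = wide_df.reset_index().melt(
    id_vars="Date", var_name="Name", value_name="Close"
)

print(wide_df)
print(back_to_long)
```

In the sections below, seaborn and plotly.express mostly take the long form directly, while the pandas plotting methods usually need the wide form.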

That's it! Now you can find whatever graph you'd like to make and copy-paste its code.

Line Plot¶

Read the data as follows:

In [3]:

url = "https://raw.githubusercontent.com/szrlee/Stock-Time-Series-Analysis/master/data/all_stocks_2006-01-01_to_2018-01-01.csv"
df = pd.read_csv(url)

df = df.loc[df.Name.isin(["AAPL", "JPM", "GOOGL", "AMZN"]), ["Date", "Name", "Close"]]
df["Date"] = pd.to_datetime(df.Date)
df.rename(columns={"Close": "Closing Price"}, inplace=True)

Line Plot Using `pandas`¶

In [4]:

df_wide = df.pivot(index="Date", columns="Name", values="Closing Price")
df_wide.plot(
    title="Stock prices (2006 - 2017)", ylabel="Closing Price", figsize=(12, 6), rot=0
)

Out[4]:

<matplotlib.axes._subplots.AxesSubplot at 0x16d5f6ac0>

Line Plot Using `matplotlib`¶

In [5]:

fig, ax = plt.subplots(figsize=(12, 6))

for i, g in df.groupby("Name"):
    ax.plot(g["Date"], g["Closing Price"], label=i)

ax.set_title("Stock prices (2006 - 2017)")
ax.set_ylabel("Closing Price")
ax.set_xlabel("Date")
ax.legend(title="Name")

Out[5]:

<matplotlib.legend.Legend at 0x16d714400>

Line Plot Using `seaborn`¶

In [6]:

fig, ax = plt.subplots(figsize=(12, 6))
sns.lineplot(data=df, x="Date", y="Closing Price", hue="Name", ax=ax)
ax.set_title("Stock Prices (2006 - 2017)")

Out[6]:

Text(0.5, 1.0, 'Stock Prices (2006 - 2017)')

Line Plot Using `plotly.express`¶

In [7]:

fig = px.line(
    df,
    x="Date",
    y="Closing Price",
    color="Name",
    title="Stock Prices (2006 - 2017)",
    width=900,
    height=500,
)
fig.show()

Grouped Bars Plot¶

Read the data as follows:

In [8]:

url = "https://raw.githubusercontent.com/szrlee/Stock-Time-Series-Analysis/master/data/all_stocks_2006-01-01_to_2018-01-01.csv"
df = pd.read_csv(url)

df = df[df.Name == "AAPL"]
df["Year"] = pd.to_datetime(df.Date).dt.year
df = df.query("Year >= 2014").groupby("Year").max().reset_index(drop=False)

Grouped Bars Using `pandas`¶

In [9]:

df.plot.bar(
    x="Year",
    y=["Open", "Close"],
    rot=0,
    figsize=(12, 6),
    ylabel="Price in USD",
    title="Max Opening and Closing Prices per Year for AAPL",
)

Out[9]:

<matplotlib.axes._subplots.AxesSubplot at 0x16ddb0a60>

Grouped Bars Using `matplotlib`¶

In [10]:

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(df.Year))
width = 0.25

ax.bar(x - width / 2, df.Open, width, label="Open")
ax.bar(x + width / 2, df.Close, width, label="Close")

ax.set_xlabel("Year")
ax.set_ylabel("Price in USD")
ax.set_title("Max Opening and Closing Prices per Year for AAPL")

ax.set_xticks(x)
ax.set_xticklabels(df.Year)

ax.legend()

Out[10]:

<matplotlib.legend.Legend at 0x16ddc5550>

Grouped Bars Using `seaborn`¶

In [11]:

df_long = df.melt(
    id_vars="Year",
    value_vars=["Open", "Close"],
    var_name="Category",
    value_name="Price",
)

fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(data=df_long, x="Year", y="Price", hue="Category", ax=ax)

ax.set_title("Max Opening and Closing Prices per Year for AAPL")
ax.legend(title=None)

Out[11]:

<matplotlib.legend.Legend at 0x16d7e7160>

Grouped Bars Using `plotly.express`¶

In [12]:

fig = px.bar(
    df,
    x="Year",
    y=["Open", "Close"],
    title="Max Opening and Closing Prices per Year for AAPL",
    barmode="group",
    labels={"value": "Price in USD"},
    width=900,
    height=500,
)
fig.show()

Stacked Bars Plot¶

Read the data as follows:

In [13]:

url = "https://raw.githubusercontent.com/szrlee/Stock-Time-Series-Analysis/master/data/all_stocks_2006-01-01_to_2018-01-01.csv"
df = pd.read_csv(url)

stocks_filter = ["AAPL", "JPM", "GOOGL", "AMZN", "IBM"]
df = df[df.Name.isin(stocks_filter)]
df["Date"] = pd.to_datetime(df.Date)
df["Year"] = pd.to_datetime(df.Date).dt.year
df["Volume"] = df["Volume"] / 1e9

df = (
    df[["Year", "Volume", "Name"]]
    .query("Year >= 2012")
    .groupby(["Year", "Name"])
    .sum()
    .reset_index(drop=False)
)

Stacked Bars Using `pandas`¶

In [14]:

df_wide = df.pivot(index="Year", columns="Name", values="Volume")
df_wide.plot.bar(
    rot=0,
    figsize=(12, 6),
    ylabel="Volume (billions of shares)",
    title="Trading volume per year for selected shares",
    stacked=True,
)

Out[14]:

<matplotlib.axes._subplots.AxesSubplot at 0x16de1f3a0>

Stacked Bars Using `matplotlib`¶

In [15]:

fig, ax = plt.subplots(figsize=(12, 6))

bottom = np.zeros(df.Year.nunique())
for i, g in df.groupby("Name"):
    ax.bar(g["Year"], g["Volume"], bottom=bottom, label=i, width=0.5)
    bottom += g["Volume"].values

ax.set_title("Trading volume per year for selected shares")
ax.set_ylabel("Volume (billions of shares)")
ax.set_xlabel("Year")

ax.legend()

Out[15]:

<matplotlib.legend.Legend at 0x16de80220>
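
The loop in the cell above stacks the bars by keeping a running `bottom`: each group is drawn starting where the previous groups ended. The sketch below shows that bookkeeping on its own, with made-up volumes for two years. The `volumes` and `bottoms` names are hypothetical and only used here.

```python
import numpy as np

# Made-up yearly volumes (two years) for three stocks.
volumes = {
    "AAPL": np.array([3.0, 2.0]),
    "IBM": np.array([1.0, 1.5]),
    "JPM": np.array([2.0, 2.5]),
}

bottom = np.zeros(2)  # where the next segment starts, one value per year
bottoms = {}
for name, heights in volumes.items():
    bottoms[name] = bottom.copy()  # this stock's segment starts here
    bottom += heights  # the next stock starts on top of it

print(bottoms)
print(bottom)  # total bar height per year
```

This only works if every group has one value per year, in the same order, which is the case for the aggregated data used in this section.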

Stacked Bars Using `seaborn`¶

In [16]:

fig, ax = plt.subplots(figsize=(12, 6))

ax = sns.histplot(
    x=df.Year,
    hue=df.Name,
    weights=df.Volume,
    multiple="stack",
    shrink=0.5,
    discrete=True,
    hue_order=df.groupby("Name").Volume.sum().sort_values().index,
)

ax.set_title("Trading volume per year for selected shares")
ax.set_ylabel("Volume (billions of shares)")

legend = ax.get_legend()
legend.set_bbox_to_anchor((1, 1))

Stacked Bars Using `plotly.express`¶

In [17]:

fig = px.bar(
    df,
    x="Year",
    y="Volume",
    color="Name",
    title="Trading volume per year for selected shares",
    barmode="stack",
    labels={"Volume": "Volume (billions of shares)"},
    width=900,
    height=500,
)
fig.show()

Area Chart¶

Read the data as follows:

In [18]:

url = "https://raw.githubusercontent.com/szrlee/Stock-Time-Series-Analysis/master/data/all_stocks_2006-01-01_to_2018-01-01.csv"
df = pd.read_csv(url)

stocks = ["AAPL", "AMZN", "GOOGL", "IBM", "JPM"]
df = df.loc[df.Name.isin(stocks), ["Date", "Name", "Volume"]]
df["Date"] = pd.to_datetime(df.Date)
df = df[df.Date.dt.year >= 2017]
df["Volume Perc"] = df["Volume"] / df.groupby("Date")["Volume"].transform("sum")
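
The last line of the cell above turns raw volumes into each stock's share of the total volume traded that day. Here is a minimal sketch of the same `groupby(...).transform("sum")` pattern on made-up numbers, which makes it easier to see that the shares add up to 1 for every date.

```python
import pandas as pd

# Made-up volumes for two stocks on two dates.
toy = pd.DataFrame(
    {
        "Date": ["2017-01-03", "2017-01-03", "2017-01-04", "2017-01-04"],
        "Name": ["AAPL", "IBM", "AAPL", "IBM"],
        "Volume": [30, 10, 20, 20],
    }
)

# transform("sum") returns the per-date total aligned with the original rows,
# so the division below happens row by row.
daily_total = toy.groupby("Date")["Volume"].transform("sum")
toy["Volume Perc"] = toy["Volume"] / daily_total

print(toy)
```

Unlike a plain `groupby(...).sum()`, `transform` keeps the original number of rows, which is why the result can be assigned straight back as a new column.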

Area Chart Using `pandas`¶

In [19]:

df_wide = df.pivot(index="Date", columns="Name", values="Volume Perc")

ax = df_wide.plot.area(
    rot=0,
    figsize=(12, 6),
    title="Distribution of daily trading volume - 2017",
    stacked=True,
)
ax.legend(bbox_to_anchor=(1, 1), loc="upper left")
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))

Area Chart Using `matplotlib`¶

In [20]:

df_wide = df.pivot(index="Date", columns="Name", values="Volume Perc")

fig, ax = plt.subplots(figsize=(12, 6))

ax.stackplot(df_wide.index, [df_wide[col].values for col in stocks], labels=stocks)
ax.legend(bbox_to_anchor=(1, 1), loc="upper left")

ax.set_title("Distribution of daily trading volume - 2017")
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))

Area Chart Using `plotly.express`¶

In [21]:

fig = px.area(
    df,
    x="Date",
    y="Volume Perc",
    color="Name",
    title="Distribution of daily trading volume - 2017",
    width=900,
    height=500,
)
fig.update_layout(yaxis_tickformat="%")
fig.show()

Pie or Donut Chart¶

Read the data as follows:

In [22]:

url = "https://raw.githubusercontent.com/szrlee/Stock-Time-Series-Analysis/master/data/all_stocks_2006-01-01_to_2018-01-01.csv"
df = pd.read_csv(url)

stocks_filter = ["AAPL", "JPM", "GOOGL", "AMZN", "IBM"]
df = df.loc[df.Name.isin(stocks_filter), ["Name", "Volume"]]
df = df.groupby("Name").sum().reset_index()

Pie/Donut Chart Using `pandas`¶

In [23]:

df.set_index("Name").plot.pie(
    y="Volume",
    wedgeprops=dict(width=0.5),
    figsize=(8, 8),
    autopct="%1.0f%%",
    pctdistance=0.75,
    title="Distribution of trading volume for selected stocks (2006 - 2017)",
)

Out[23]:

<matplotlib.axes._subplots.AxesSubplot at 0x16d6ed400>

Pie/Donut Chart Using `matplotlib`¶

In [24]:

fig, ax = plt.subplots(figsize=(8, 8))

ax.pie(
    df.Volume,
    labels=df.Name,
    autopct="%1.0f%%",
    wedgeprops=dict(width=0.5),
    pctdistance=0.75,
)
ax.set_title("Distribution of trading volume for selected stocks (2006 - 2017)")
ax.legend()

Out[24]:

<matplotlib.legend.Legend at 0x16dad0940>

Pie/Donut Chart Using `plotly.express`¶

In [25]:

fig = px.pie(
    data_frame=df,
    values="Volume",
    names="Name",
    hole=0.5,
    color="Name",
    title="Distribution of trading volume for selected stocks (2006 - 2017)",
    width=900,
    height=500,
)
fig.show()

Histogram¶

Read the data as follows:

In [26]:

url = "https://raw.githubusercontent.com/szrlee/Stock-Time-Series-Analysis/master/data/all_stocks_2006-01-01_to_2018-01-01.csv"
df = pd.read_csv(url)

stocks_filter = ["GOOGL", "AMZN"]
df = df.loc[df.Name.isin(stocks_filter), ["Name", "Close"]]

Histogram Using `pandas`¶

In [27]:

# Reshape to wide form: one column of closing prices per stock
df_wide = df.pivot(columns="Name", values="Close")

# plot.hist uses the same bins for every column
ax = df_wide.plot.hist(
    bins=30,
    alpha=0.75,
    figsize=(12, 6),
    title="Distribution of Closing Prices - GOOGL vs. AMZN",
)
ax.set_xlabel("Closing Price")

Out[27]:

Text(0.5, 0, 'Closing Price')

Histogram Using `matplotlib`¶

In [28]:

fig, ax = plt.subplots(figsize=(12, 6))

for idx, (i, g) in enumerate(df.groupby("Name")):
    if idx == 0:
        _, bins, _ = ax.hist(g.Close, alpha=0.75, label=i, bins=30)
    else:
        ax.hist(g.Close, alpha=0.75, label=i, bins=bins)

ax.legend()
ax.set_title("Distribution of Closing Prices - GOOGL vs. AMZN")
ax.set_ylabel("Frequency")
ax.set_xlabel("Closing Price")

Out[28]:

Text(0.5, 0, 'Closing Price')
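
The cell above takes the bin edges from the first group and reuses them for the second one, so both histograms share the same bins. One caveat is that values of the second group that fall outside the first group's range would be left out. As an alternative sketch (not the approach used in this tutorial), you could compute the edges from all the values at once with `np.histogram_bin_edges`. The prices below are made up.

```python
import numpy as np

# Made-up closing prices for two hypothetical stocks.
prices_a = np.array([10.0, 12.0, 15.0, 20.0])
prices_b = np.array([18.0, 25.0, 30.0, 40.0])

# Compute the edges from both series together so neither one is clipped.
edges = np.histogram_bin_edges(np.concatenate([prices_a, prices_b]), bins=6)

# Count each series separately, using the shared edges.
counts_a, _ = np.histogram(prices_a, bins=edges)
counts_b, _ = np.histogram(prices_b, bins=edges)

print(edges)
print(counts_a, counts_b)
```

With matplotlib, you would then pass `bins=edges` to every `ax.hist` call.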

Histogram Using `seaborn`¶

In [29]:

fig, ax = plt.subplots(figsize=(12, 6))
sns.histplot(data=df, x="Close", hue="Name", ax=ax)
ax.set_title("Distribution of Closing Prices - GOOGL vs. AMZN")
ax.set_ylabel("Frequency")
ax.set_xlabel("Closing Price")

Out[29]:

Text(0.5, 0, 'Closing Price')

Histogram Using `plotly.express`¶

In [30]:

fig = px.histogram(
    df,
    x="Close",
    color="Name",
    labels={"Close": "Closing Price", "count": "Frequency"},
    title="Distribution of Closing Prices - GOOGL vs. AMZN",
    barmode="overlay",
    width=900,
    height=500,
)
fig.show()

Scatter Plot¶

Read the data as follows:

In [31]:

url = "https://raw.githubusercontent.com/szrlee/Stock-Time-Series-Analysis/master/data/all_stocks_2006-01-01_to_2018-01-01.csv"
df = pd.read_csv(url)

stocks_filter = ["GOOGL", "AMZN"]
df = df.loc[
    (df.Name.isin(stocks_filter)) & (pd.to_datetime(df.Date).dt.year >= 2017),
    ["Date", "Name", "Open", "Close"],
]
df["Return"] = (df["Close"] - df["Open"]) / df["Open"]
df_wide = df.pivot(index="Date", columns="Name", values="Return")

Scatter Plot Using `pandas`¶

In [32]:

ax = df_wide.plot.scatter(
    x="GOOGL", y="AMZN", title="Daily returns - GOOGL vs. AMZN", figsize=(8, 8)
)

ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))
ax.xaxis.set_major_formatter(mtick.PercentFormatter(1))

Scatter Plot Using `matplotlib`¶

In [33]:

import matplotlib.ticker as mtick

fig, ax = plt.subplots(figsize=(8, 8))

ax.scatter(x=df_wide["GOOGL"], y=df_wide["AMZN"])

ax.set_xlabel("GOOGL")
ax.set_ylabel("AMZN")
ax.set_title("Daily returns - GOOGL vs. AMZN")

ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))
ax.xaxis.set_major_formatter(mtick.PercentFormatter(1))

Scatter Plot Using `seaborn`¶

In [34]:

fig, ax = plt.subplots(figsize=(8, 8))

sns.scatterplot(data=df_wide, x="GOOGL", y="AMZN", ax=ax)

ax.set_title("Daily returns - GOOGL vs. AMZN")
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))
ax.xaxis.set_major_formatter(mtick.PercentFormatter(1))

Scatter Plot Using `plotly.express`¶

In [35]:

fig = px.scatter(
    df_wide,
    x="GOOGL",
    y="AMZN",
    title="Daily returns - GOOGL vs. AMZN",
    width=600,
    height=600,
)
fig.update_layout(yaxis_tickformat="%", xaxis_tickformat="%")
fig.show()

Boxplot¶

Read the data as follows:

In [36]:

url = "https://raw.githubusercontent.com/szrlee/Stock-Time-Series-Analysis/master/data/all_stocks_2006-01-01_to_2018-01-01.csv"
df = pd.read_csv(url)

stocks = ["AMZN", "GOOGL", "IBM", "JPM"]
df = df.loc[
    (df.Name.isin(stocks)) & (pd.to_datetime(df.Date).dt.year == 2016),
    ["Date", "Name", "Close", "Open"],
]
df["Return"] = (df["Close"] - df["Open"]) / df["Open"]
df["Date"] = pd.to_datetime(df.Date)

Boxplot Using `pandas`¶

In [37]:

df_wide = df.pivot(index="Date", columns="Name", values="Return")
ax = df_wide.boxplot(column=stocks)

ax.set_ylabel("Daily returns")
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))

Boxplot Using `matplotlib`¶

In [38]:

df_wide = df.pivot(index="Date", columns="Name", values="Return")

fig, ax = plt.subplots(figsize=(12, 6))

ax.boxplot([df_wide[col] for col in stocks], vert=True, autorange=True, labels=stocks)

ax.set_ylabel("Daily returns")
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))

Boxplot Using `seaborn`¶

In [39]:

ax = sns.boxplot(x="Name", y="Return", data=df, order=stocks)

ax.set_ylabel("Daily returns")
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))

Boxplot Using `plotly.express`¶

In [40]:

fig = px.box(
    df,
    x="Name",
    y="Return",
    category_orders={"Name": stocks},
    width=900,
    height=500,
)
fig.show()
