This chapter introduces the Matplotlib visualization library and demonstrates how to use it with data. This is a summary of the lecture "Introduction to Data Visualization with Matplotlib" from DataCamp.
import matplotlib.pyplot as plt
import pandas as pd
plt.rcParams['figure.figsize'] = (10, 5)
There are many ways to use Matplotlib. In this course, we will focus on the pyplot interface, which provides the most flexibility in creating and customizing data visualizations.
Initially, we will use the pyplot interface to create two kinds of objects: Figure objects and Axes objects.
# Create a Figure and an Axes with plt.subplots
fig, ax = plt.subplots()
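To make the two kinds of objects concrete, here is a minimal sketch that checks what plt.subplots returns. The Agg backend line is an assumption added so the snippet runs without a display; it is not needed in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

fig, ax = plt.subplots()

# The Figure is the overall container; the Axes is the plotting area inside it
is_figure = isinstance(fig, Figure)
is_axes = isinstance(ax, Axes)

# The Figure keeps track of the Axes it contains
owns_axes = list(fig.axes) == [ax]
print(is_figure, is_axes, owns_axes)
```

With no arguments, plt.subplots returns a single Axes rather than an array of them.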
Adding data to a figure is done by calling methods of the Axes object. In this exercise, we will use the plot method to add data about rainfall in two American cities: Seattle, WA and Austin, TX.
The data are stored in two Pandas DataFrame objects that are already loaded into memory: seattle_weather stores information about the weather in Seattle, and austin_weather stores information about the weather in Austin. Each of the data frames has a MONTH column that stores the three-letter name of each month. Each also has a column named "MLY-PRCP-NORMAL" that stores the average rainfall in each month during a ten-year period.
In this exercise, you will create a visualization that will allow you to compare the rainfall in these two cities.
seattle_weather = pd.read_csv('./dataset/seattle_weather.csv', index_col='DATE')
austin_weather = pd.read_csv('./dataset/austin_weather.csv', index_col='DATE')
# Create a Figure and an Axes with plt.subplots
fig, ax = plt.subplots()
# Plot MLY-PRCP-NORMAL from seattle_weather against MONTH
ax.plot(seattle_weather["MONTH"], seattle_weather["MLY-PRCP-NORMAL"]);
# Plot MLY-PRCP-NORMAL from austin_weather against MONTH
ax.plot(austin_weather['MONTH'], austin_weather["MLY-PRCP-NORMAL"]);
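If the CSV files are not at hand, the same pattern can be tried on small stand-in DataFrames. In the sketch below, seattle_demo and austin_demo are hypothetical names and the rainfall values are made up for illustration. The label arguments and the ax.legend() call are additions that are not part of the exercise above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr"]

# Made-up numbers for illustration only, not the real precipitation normals
seattle_demo = pd.DataFrame({"MONTH": months, "MLY-PRCP-NORMAL": [5.0, 4.0, 3.5, 2.5]})
austin_demo = pd.DataFrame({"MONTH": months, "MLY-PRCP-NORMAL": [2.0, 2.2, 2.8, 2.4]})

fig, ax = plt.subplots()

# Each call to ax.plot adds one line to the same Axes
ax.plot(seattle_demo["MONTH"], seattle_demo["MLY-PRCP-NORMAL"], label="Seattle")
ax.plot(austin_demo["MONTH"], austin_demo["MLY-PRCP-NORMAL"], label="Austin")

# A legend tells the two lines apart
ax.legend()

line_labels = [line.get_label() for line in ax.lines]
print(line_labels)
```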
We can customize the appearance of data in our plots while adding the data to the plot, using keyword arguments to the plot method.
In this exercise, you will customize the appearance of the markers, the linestyle that is used, and the color of the lines and markers for your data.
As before, the data is already provided in Pandas DataFrame objects loaded into memory: seattle_weather and austin_weather. These each have a MONTH column and a "MLY-PRCP-NORMAL" column that you will plot against each other.
fig, ax = plt.subplots();
# Plot Seattle data, setting data appearance
ax.plot(seattle_weather["MONTH"], seattle_weather["MLY-PRCP-NORMAL"], color='b', marker='o', linestyle='--');
# Plot Austin data, setting data appearance
ax.plot(austin_weather["MONTH"], austin_weather["MLY-PRCP-NORMAL"], color='r', marker='v', linestyle='--');
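Matplotlib also accepts a short format string that bundles color, marker, and linestyle into one positional argument. The sketch below uses made-up x and y values to compare the keyword form used above with the format-string form "bo--", and checks that both produce the same line style.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Made-up data for illustration
x = [0, 1, 2]
y = [1.0, 3.0, 2.0]

fig, ax = plt.subplots()

# Keyword form: each property is spelled out
(line_kw,) = ax.plot(x, y, color="b", marker="o", linestyle="--")

# Format-string form: "b" = blue, "o" = circle marker, "--" = dashed line
(line_fmt,) = ax.plot(x, y, "bo--")

same_color = to_rgba(line_kw.get_color()) == to_rgba(line_fmt.get_color())
same_marker = line_kw.get_marker() == line_fmt.get_marker()
same_linestyle = line_kw.get_linestyle() == line_fmt.get_linestyle()
print(same_color, same_marker, same_linestyle)
```

The keyword form is longer but easier to read, which is likely why the exercise uses it.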
Customizing the axis labels requires using the set_xlabel and set_ylabel methods of the Axes object. Adding a title uses the set_title method.
In this exercise, you will customize the content of the axis labels and add a title to a plot.
fig, ax = plt.subplots();
ax.plot(seattle_weather["MONTH"], seattle_weather["MLY-PRCP-NORMAL"]);
ax.plot(austin_weather["MONTH"], austin_weather["MLY-PRCP-NORMAL"]);
# Customize the x-axis label
ax.set_xlabel("Time (months)");
# Customize the y-axis label
ax.set_ylabel("Precipitation (inches)");
# Add the title
ax.set_title("Weather patterns in Austin and Seattle");
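As an alternative to three separate calls, the Axes object has a set method that accepts several properties as keyword arguments. A minimal sketch, using the same label text as above:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# One call sets the x-label, y-label, and title together
ax.set(
    xlabel="Time (months)",
    ylabel="Precipitation (inches)",
    title="Weather patterns in Austin and Seattle",
)

# The getters return the text that was just set
labels = (ax.get_xlabel(), ax.get_ylabel(), ax.get_title())
print(labels)
```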
Small multiples are used to plot several datasets side-by-side. In Matplotlib, small multiples can be created using the plt.subplots() function. The first argument is the number of rows in the array of Axes objects to generate, and the second argument is the number of columns. In this exercise, you will use the Austin and Seattle data to practice creating and populating an array of subplots.
The data is given to you in DataFrames: seattle_weather and austin_weather. These each have a MONTH column and MLY-PRCP-NORMAL (for average precipitation), as well as MLY-TAVG-NORMAL (for average temperature) columns. In this exercise, you will plot the monthly average precipitation and average temperature of each city in separate subplots.
# Create a Figure and an array of subplots with 2 rows and 2 columns
fig, ax = plt.subplots(2,2);
# Addressing the top left Axes as index 0, 0, plot month and Seattle precipitation
ax[0, 0].plot(seattle_weather['MONTH'], seattle_weather['MLY-PRCP-NORMAL']);
# In the top right (index 0,1), plot month and Seattle temperatures
ax[0, 1].plot(seattle_weather['MONTH'], seattle_weather['MLY-TAVG-NORMAL']);
# In the bottom left (1, 0) plot month and Austin precipitation
ax[1, 0].plot(austin_weather['MONTH'], austin_weather['MLY-PRCP-NORMAL']);
# In the bottom right (1, 1) plot month and Austin temperatures
ax[1, 1].plot(austin_weather['MONTH'], austin_weather['MLY-TAVG-NORMAL']);
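How the Axes are indexed depends on the shape of the grid. The following sketch shows what plt.subplots returns for a 2x2 grid, a single column, and the default single plot. The variable names are chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# 2 rows x 2 columns: a 2-D NumPy array, indexed as ax_grid[row, col]
fig_grid, ax_grid = plt.subplots(2, 2)
grid_shape = ax_grid.shape

# 2 rows x 1 column: the array is squeezed to 1-D, indexed as ax_col[row]
fig_col, ax_col = plt.subplots(2, 1)
col_shape = ax_col.shape

# No arguments: a single Axes object, not an array
fig_one, ax_one = plt.subplots()
single_is_axes = isinstance(ax_one, Axes)

# Flattening the grid makes it easy to loop over every subplot
n_panels = len(ax_grid.flatten())
print(grid_shape, col_shape, single_is_axes, n_panels)
```

This is why a 2x2 grid is addressed with two indices, while a one-column layout needs only one.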
When creating small multiples, it is often preferable to make sure that the different plots are displayed with the same scale on the y-axis. This can be configured by setting the sharey keyword to True.
In this exercise, you will create a Figure with two Axes objects that share their y-axis.
Note: the sharey keyword also accepts 'row' or 'col', which shares the y-axis only among subplots in the same row or column.
# Create a figure and an array of axes: 2 rows, 1 column with shared y axis
fig, ax = plt.subplots(2, 1, sharey=True);
# Plot Seattle precipitation data in the top axes
ax[0].plot(seattle_weather['MONTH'], seattle_weather['MLY-PRCP-NORMAL'], color = 'b');
ax[0].plot(seattle_weather['MONTH'], seattle_weather['MLY-PRCP-25PCTL'], color = 'b', linestyle = '--');
ax[0].plot(seattle_weather['MONTH'], seattle_weather['MLY-PRCP-75PCTL'], color = 'b', linestyle = '--');
# Plot Austin precipitation data in the bottom axes
ax[1].plot(austin_weather['MONTH'], austin_weather['MLY-PRCP-NORMAL'], color = 'r');
ax[1].plot(austin_weather['MONTH'], austin_weather['MLY-PRCP-25PCTL'], color = 'r', linestyle = '--');
ax[1].plot(austin_weather['MONTH'], austin_weather['MLY-PRCP-75PCTL'], color = 'r', linestyle = '--');
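To see the effect of sharey, the sketch below plots two made-up series with very different ranges, once with a shared y-axis and once without, and compares the resulting y-limits.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

# Made-up data with different ranges, for illustration only
x = [0, 1, 2]
wide = [0.0, 10.0, 5.0]
narrow = [1.0, 2.0, 1.5]

# Shared y-axis: both subplots end up with the same limits
fig_shared, ax_shared = plt.subplots(2, 1, sharey=True)
ax_shared[0].plot(x, wide)
ax_shared[1].plot(x, narrow)
shared_limits_match = ax_shared[0].get_ylim() == ax_shared[1].get_ylim()

# Independent y-axes (the default): each subplot scales to its own data
fig_free, ax_free = plt.subplots(2, 1)
ax_free[0].plot(x, wide)
ax_free[1].plot(x, narrow)
free_limits_differ = ax_free[0].get_ylim() != ax_free[1].get_ylim()

print(shared_limits_match, free_limits_differ)
```

With a shared axis, the difference in magnitude between the two series is visible at a glance, which is the point of comparing the two cities this way.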