At this point you should know the basics of making plots with matplotlib. Now we will expand on our basic plotting skills to learn how to create more advanced plots. In this section, we will show how to visualize data using pandas and matplotlib and create multi-panel plots such as the one below.
Figure 4.10. An example of seasonal temperatures for 2012-2013 using pandas and Matplotlib.
We will start again by reading in the data file.
import pandas as pd
import matplotlib.pyplot as plt
fp = "/home/jovyan/shared/L7/029740.txt"
data = pd.read_csv(
    fp,
    sep=r"\s+",
    na_values=["*", "**", "***", "****", "*****", "******"],
    usecols=["YR--MODAHRMN", "TEMP", "MAX", "MIN"],
    parse_dates=["YR--MODAHRMN"],
    index_col="YR--MODAHRMN",
)
After reading the file, we can now rename the TEMP column as TEMP_F, since we will later convert our temperatures from Fahrenheit to Celsius.
new_names = {"TEMP": "TEMP_F"}
data = data.rename(columns=new_names)
At this point we can quickly check the first rows of data to see whether the expected changes have occurred.
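A check of this kind might look like the sketch below. Because the weather data file is not bundled with this text, the example builds a tiny stand-in DataFrame called demo (a hypothetical name, with invented values) and applies the same renaming step. With the real data you would simply call data.head().

```python
import pandas as pd

# Hypothetical stand-in for the weather data (values are invented)
demo = pd.DataFrame(
    {"TEMP": [34.0, 35.0, None], "MAX": [None, 36.0, None], "MIN": [None, 30.0, None]},
    index=pd.to_datetime(["2012-01-01 00:50", "2012-01-01 01:50", "2012-01-01 02:50"]),
)
demo = demo.rename(columns={"TEMP": "TEMP_F"})

# Show the first rows; the TEMP column should now be called TEMP_F
print(demo.head())
```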
Next, we have to deal with no-data values. Let's start by checking how many no-data values we have.
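One common way to count no-data values is to chain .isna() and .sum(), which gives the number of missing values in each column. The sketch below uses a small invented DataFrame named demo (a hypothetical stand-in for our data), so the counts it prints are not those of the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the weather data (values are invented)
demo = pd.DataFrame(
    {
        "TEMP_F": [34.0, np.nan, 35.0, np.nan],
        "MAX": [np.nan, np.nan, 36.0, np.nan],
        "MIN": [np.nan, 30.0, 30.0, np.nan],
    }
)

# Count the missing values in each column
missing = demo.isna().sum()
print(missing)
```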
So, there are 3579 missing values in the TEMP_F column and we should remove those. We need not worry about the no-data values in the MAX and MIN columns since we will not use them for the plots produced below. We can remove the rows of our DataFrame where TEMP_F has missing values using the .dropna() method.
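As a minimal sketch of this step, the subset parameter of .dropna() restricts the check to the listed columns, so rows are dropped only when TEMP_F is missing. The DataFrame demo and its values are invented for illustration. With the real data the same call would be applied to data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the weather data (values are invented)
demo = pd.DataFrame(
    {
        "TEMP_F": [34.0, np.nan, 35.0, np.nan],
        "MAX": [np.nan, np.nan, 36.0, np.nan],
        "MIN": [np.nan, 30.0, 30.0, np.nan],
    }
)

# Drop only the rows where TEMP_F is missing; gaps in MAX and MIN are kept
demo_clean = demo.dropna(subset=["TEMP_F"])
print(len(demo), len(demo_clean))
```

Assigning the result to a new name leaves the original DataFrame untouched, which is convenient when experimenting.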
How many rows of data would remain if we removed all rows with any no-data values from our data (including no-data values in the MAX and MIN columns)? If you test this, be sure to save the modified DataFrame with another variable name or do not use the inplace parameter.
# Use this cell to enter your solution.
Now that we have loaded the data, we can convert the temperature values from Fahrenheit to Celsius, like we have in earlier chapters.
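The conversion follows the usual formula, Celsius = (Fahrenheit - 32) / 1.8, and can be applied to a whole column at once. The sketch below uses an invented DataFrame named demo. The new column name TEMP_C matches the one used later in this section.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned weather data (values are invented)
demo = pd.DataFrame({"TEMP_F": [32.0, 50.0, 212.0]})

# Convert Fahrenheit to Celsius and store the result in a new column
demo["TEMP_C"] = (demo["TEMP_F"] - 32.0) / 1.8
print(demo)
```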
We can now check the contents of our DataFrame once again.
Having processed and cleaned the data, we can now continue working with it and learn how to create figures that contain subplots. Subplots are used to display multiple plots in different panels of the same figure, as shown at the start of this section (Figure 4.10).
We can start with creating the subplots by dividing the data in the data file into different groups. In this case we can divide the temperature data into temperatures for the four different seasons of the year. We can make the following selections:
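# Season boundaries are assumed here from the 2012-2013 figure captions and label dates
winter = data.loc[(data.index >= "2012-12-01") & (data.index < "2013-03-01")]
winter_temps = winter["TEMP_C"]
spring = data.loc[(data.index >= "2013-03-01") & (data.index < "2013-06-01")]
spring_temps = spring["TEMP_C"]
summer = data.loc[(data.index >= "2013-06-01") & (data.index < "2013-09-01")]
summer_temps = summer["TEMP_C"]
autumn = data.loc[(data.index >= "2013-09-01") & (data.index < "2013-12-01")]
autumn_temps = autumn["TEMP_C"]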
Let's have a look at the data from two different seasons to see whether the preceding step appears to have worked as expected.
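A quick look of this kind could be sketched as follows. The two series below are randomly generated stand-ins (winter_demo and summer_demo are hypothetical names), so the resulting plots will not match the figures. With the real data you would plot winter_temps and summer_temps instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily temperatures standing in for the seasonal selections
rng = np.random.default_rng(0)
winter_idx = pd.date_range("2012-12-01", "2013-02-28", freq="D")
summer_idx = pd.date_range("2013-06-01", "2013-08-31", freq="D")
winter_demo = pd.Series(-5 + 5 * rng.standard_normal(len(winter_idx)), index=winter_idx)
summer_demo = pd.Series(17 + 4 * rng.standard_normal(len(summer_idx)), index=summer_idx)

# Plot each season in its own figure
fig_w, ax_w = plt.subplots()
winter_demo.plot(ax=ax_w)
fig_s, ax_s = plt.subplots()
summer_demo.plot(ax=ax_s)

# The y-axis limits are chosen automatically for each plot
print(ax_w.get_ylim(), ax_s.get_ylim())
```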
Figure 4.11. Winter temperatures for 2012-2013.
Figure 4.12. Summer temperatures for 2012-2013.
Based on the plots above, it looks like the correct seasons have been plotted and the temperatures in winter and summer are quite different, as we would expect. One thing to consider is that the y-axis range currently varies between the two plots. We may want to define axis limits that ensure the data are plotted with the same y-axis range in all subplots, which makes it easier to visually compare the temperatures between seasons.
To define y-axis limits that include the data from all of the seasons and are consistent between subplots, we first need to find the minimum and maximum temperatures across all of the seasons. It is also helpful to leave some extra space (padding) between the y-axis limits and those values, so that, for example, the upper y-axis limit is five degrees higher than the maximum temperature and the lower y-axis limit is five degrees lower than the minimum temperature. We can do that below by finding the minimum of the seasonal minimum temperatures and subtracting five degrees, then finding the maximum of the seasonal maximum temperatures and adding five degrees.
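# Collect the seasonal temperature series
seasons = [winter_temps, spring_temps, summer_temps, autumn_temps]
# Find lower limit for y-axis
min_temp = min(season.min() for season in seasons) - 5.0
# Find upper limit for y-axis
max_temp = max(season.max() for season in seasons) + 5.0
# Print y-axis min, max
print(f"Minimum temperature: {min_temp}")
print(f"Maximum temperature: {max_temp}")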
We can now use this temperature range to standardize the y-axis ranges of our plots.
With the data split into seasons and the y-axis range defined, we can now plot data from all four seasons in the same figure. We will start by creating a figure containing four subplots in a two by two panel using the .subplots() function from matplotlib. In the .subplots() function, the user can specify how many rows and columns of plots they want to have in their figure.
We can also specify the size of our figure with the figsize parameter, which takes the width and height values (in inches) as input.
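A minimal sketch of such a call is shown below, using the same figure size as the plotting code later in this section. The variable names fig and axs follow that later code.

```python
import matplotlib.pyplot as plt

# Create a figure with 2 rows and 2 columns of subplots,
# 12 inches wide and 8 inches tall
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# axs holds one Axes object per panel, arranged by row and column
print(axs.shape)
```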
Figure 4.13. Empty figure template with a 2x2 subplot panel.
We can see from the output of the code cell that we now have a two-dimensional array that is displayed as two nested lists, where the first nested list contains the axes for columns 1 and 2 of row 1 and the second contains the axes for columns 1 and 2 of row 2.
To make it easier to keep track of things, we can parse these axes into their own variables as follows.
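One way to do this is sketched below, where the first index selects the row and the second selects the column. The names ax11 to ax22 are the ones used in the plotting code later in this section. The figure is recreated here only so the example runs on its own.

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# First index is the row, second index is the column
ax11 = axs[0][0]  # top left
ax12 = axs[0][1]  # top right
ax21 = axs[1][0]  # bottom left
ax22 = axs[1][1]  # bottom right
```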
Now we have four different axis variables for the different panels in our figure. Next we can use these axes to plot the seasonal temperature data. We can start by plotting the data for the different seasons with different colors for each of the lines, and we can specify the y-axis limits to be the same for all of the subplots.
- The c parameter changes the color of the line. You can define colors using RGB color codes, but it is often easier to use one of the matplotlib named colors [^matplotlib_colors].
- The lw parameter changes the line width.
- The ylim parameter can be used to define the y-axis limits.

Putting all of this together in a single code cell we have the following:
# Create the figure and subplot axes
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
# Define variables to more easily refer to individual axes
ax11 = axs[0][0]
ax12 = axs[0][1]
ax21 = axs[1][0]
ax22 = axs[1][1]
# Set plot line width
line_width = 1.5
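# Plot data (the winter color is an assumed choice; the others match the later figure)
winter_temps.plot(ax=ax11, c="blue", lw=line_width, ylim=[min_temp, max_temp])
spring_temps.plot(ax=ax12, c="orange", lw=line_width, ylim=[min_temp, max_temp])
summer_temps.plot(ax=ax21, c="green", lw=line_width, ylim=[min_temp, max_temp])
autumn_temps.plot(ax=ax22, c="brown", lw=line_width, ylim=[min_temp, max_temp])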
# Display the plot
# Note: This is not required, but suppresses text from being printed
# in the output cell
plt.show()
Figure 4.14. Seasonal temperatures for 2012-2013 plotted in a 2x2 panel.
Great, now we have all the plots in the same figure! However, we can see that there are some problems with our x-axis labels and a few other missing plot items we should add.
Let's recreate the plot and make some improvements. In this version of the plot we will:
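- Add axis labels using the xlabel and ylabel parameters in the .plot() function.
- Enable grid lines using the grid=True parameter for the .plot() function.
- Add a figure title using the fig.suptitle() function.
- Rotate the x-axis labels using the plt.setp() function.
- Add a text label for each season using the .text() function.

# Create the figure and subplot axes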
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
# Define variables to more easily refer to individual axes
ax11 = axs[0][0]
ax12 = axs[0][1]
ax21 = axs[1][0]
ax22 = axs[1][1]
# Set plot line width
line_width = 1.5
# Plot data
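# Winter plot (color and y-axis label assumed by analogy with the other panels)
winter_temps.plot(
    ax=ax11,
    c="blue",
    lw=line_width,
    ylim=[min_temp, max_temp],
    ylabel="Temperature [°C]",
    grid=True,
)
# Spring plot
spring_temps.plot(
    ax=ax12, c="orange", lw=line_width, ylim=[min_temp, max_temp], grid=True
)
# Summer plot
summer_temps.plot(
    ax=ax21,
    c="green",
    lw=line_width,
    ylim=[min_temp, max_temp],
    xlabel="Date",
    ylabel="Temperature [°C]",
    grid=True,
)
# Autumn plot
autumn_temps.plot(
    ax=ax22,
    c="brown",
    lw=line_width,
    ylim=[min_temp, max_temp],
    xlabel="Date",
    grid=True,
)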
# Set figure title (title text assumed from the figure caption)
fig.suptitle("Seasonal temperatures for 2012-2013")
# Rotate the x-axis labels so they don't overlap
plt.setp(ax11.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax12.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax21.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax22.xaxis.get_majorticklabels(), rotation=20)
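# Season label text (winter label position assumed by analogy with the other panels)
ax11.text(pd.to_datetime("20130215"), -25, "Winter")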
ax12.text(pd.to_datetime("20130515"), -25, "Spring")
ax21.text(pd.to_datetime("20130815"), -25, "Summer")
ax22.text(pd.to_datetime("20131115"), -25, "Autumn")
# Display the figure
plt.show()
Figure 4.15. Seasonal temperatures for 2012-2013 plotted with season names and gridlines visible.
The new version of the figure essentially conveys the same information as the first version, but the additional plot items help to make it easier to see the plot values and immediately understand the data being presented. Not bad.
Visualize only the winter and summer temperatures in a one by two panel figure. Save the resulting figure as a .png file.
# Use this cell to enter your solution.