matplotlib: Easy as X-Y-Z
When I have continuous data in three dimensions, my first visualization inclination is to generate a contour plot. While 3-D surface plots might be useful in some special cases, in general I think they should be avoided since they add a great deal of complexity to a visualization without adding much (if any) information beyond a 2-D contour plot.
While I usually use R/ggplot2 to generate my data visualizations, I found the support for good-looking, out-of-the-box contour plots to be a bit lacking. Of course, you can make anything look great with enough effort, but you can also waste an excessive amount of time fiddling with customizable tools.
This isn't to say the Pythonic contour plot doesn't come with its own set of frustrations, but hopefully this post will make the task easier for any of you going down this road.
The most difficult part of using the Python/matplotlib implementation of contour plots is formatting your data. The main plotting function we'll use, ax.contour(X,Y,Z), requires that your three-dimensional input data be in an odd and unintuitive structure. In this post, I'll give you the code to get from a more traditional data structure to this Python-specific format.
To begin, I'll start with some dummy data that is in a standard "long" format, where each row corresponds to a single observation. In this case, my three dimensions are just x, y, and z, which map directly to the axes on which we wish to plot them.
import pandas as pd
import numpy as np

data_url = 'https://raw.githubusercontent.com/alexmill/website_notebooks/master/data/data_3d_contour.csv'
contour_data = pd.read_csv(data_url)
contour_data.head()
Nota bene: For best results, make sure that there is a row for every combination of x and y coordinates in the region of the plane you want to plot. (Said differently, if $X$ is the set of points you want to plot on the $x$-axis and $Y$ is the set of points you want to plot on the $y$-axis, then your dataframe should contain a $z$-value for every point in the Cartesian product $X \times Y$.) If you know you're going to be making a contour plot, you can plan ahead of time so your data-generating process results in this format. It's not detrimental if your data don't meet this requirement, but you may get unwanted blank spots in your plot if your data are missing any points in the plane.
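If you want to check whether a dataframe covers the full grid, something like the following minimal sketch may help. The toy dataframe and the names df, full_grid, completed, and missing are my own assumptions for illustration, not part of the dataset used in this post.

```python
import pandas as pd

# Hypothetical long-format data with one (x, y) combination missing: (1.0, 0.5)
df = pd.DataFrame({
    "x": [0.0, 0.0, 1.0],
    "y": [0.0, 0.5, 0.0],
    "z": [0.1, 0.2, 0.3],
})

# Every (x, y) pair in the Cartesian product of the unique coordinates
full_grid = pd.MultiIndex.from_product(
    [sorted(df["x"].unique()), sorted(df["y"].unique())],
    names=["x", "y"],
)

# Reindex onto the full grid; combinations absent from df get z = NaN
completed = df.set_index(["x", "y"]).reindex(full_grid).reset_index()

# Rows that would show up as blank spots in the contour plot
missing = completed[completed["z"].isna()]
print(missing[["x", "y"]])
```

Any rows listed in missing are the grid points you would need to generate (or interpolate) to avoid gaps.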
Assuming your data are in a similar format, you can quickly convert them to the requisite structure for matplotlib using the code below.
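import numpy as np

# Pivot to a matrix, then transpose so rows follow y and columns follow x
Z = contour_data.pivot_table(index='x', columns='y', values='z').T.values

# Sorted unique coordinates along each axis
X_unique = np.sort(contour_data.x.unique())
Y_unique = np.sort(contour_data.y.unique())

# Coordinate matrices with the same shape as Z
X, Y = np.meshgrid(X_unique, Y_unique)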
What's going on here? Looking at the Z data first, I've merely used the pivot_table method from pandas to cast my data into a matrix format, where each column corresponds to a point on the $x$-axis, each row to a point on the $y$-axis, and each entry holds the value of Z at that pair of coordinates. After the transpose, the resulting array has one row per unique $y$ value and one column per unique $x$ value.
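To make the orientation concrete, here is a small sketch of the same pivot-and-transpose step on made-up data. The dataframe toy and the name Z_toy are hypothetical; z is chosen as 10*y + x so each entry reveals which coordinates it came from.

```python
import pandas as pd

# Hypothetical toy data: 3 unique x values, 2 unique y values, z = 10*y + x
toy = pd.DataFrame({
    "x": [0, 1, 2, 0, 1, 2],
    "y": [0, 0, 0, 1, 1, 1],
    "z": [0, 1, 2, 10, 11, 12],
})

# Same step as in the post: pivot on x/y, then transpose
Z_toy = toy.pivot_table(index="x", columns="y", values="z").T.values

print(Z_toy.shape)  # (2, 3): one row per unique y, one column per unique x
print(Z_toy)
```

Note that pivot_table averages duplicate (x, y) rows by default, so this step assumes each combination appears at most once, or that averaging is what you want.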
This by itself is not terribly unintuitive, but the odd part about the contour method, at least in the form used here, is that it also needs your X and Y data to have the exact same shape as your Z data. This means that we need to duplicate our $x$ and $y$ values along different axes, so that each entry in Z has its corresponding $x$ and $y$ coordinates in the same entry of the X and Y matrices. Fortunately, the meshgrid function from numpy will do this automatically for us.
To help you visualize exactly what meshgrid is doing, first notice the unique values in each of my x/y axes:
(array([0. , 0.19897 , 0.349485, 0.5 , 0.69897 , 0.849485, 1. ]), array([0. , 0.26315789, 0.52631579, 0.63157895, 0.73684211, 0.84210526, 0.94736842, 1. ]))
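Now look at the matrices X and Y generated by meshgrid: every row of X is a copy of the unique x values, and every column of Y is a copy of the unique y values, so both have the same shape as Z.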
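A tiny illustration of that duplication, using made-up coordinates rather than the post's data (x_vals, y_vals, X_demo, and Y_demo are hypothetical names):

```python
import numpy as np

x_vals = np.array([0.0, 0.5, 1.0])  # hypothetical unique x values
y_vals = np.array([0.0, 1.0])       # hypothetical unique y values

X_demo, Y_demo = np.meshgrid(x_vals, y_vals)

print(X_demo)
# [[0.  0.5 1. ]
#  [0.  0.5 1. ]]
print(Y_demo)
# [[0. 0. 0.]
#  [1. 1. 1.]]
```

Both outputs have shape (2, 3): the number of unique y values by the number of unique x values.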
I'm not a huge fan of this formatting requirement since we have to duplicate a bunch of data, but hopefully I've helped you understand the basic process required to get here from a more standard "long" data format.
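For what it's worth, matplotlib's contour also accepts 1-D coordinate vectors, provided the length of the x vector matches the number of columns in Z and the length of the y vector matches the number of rows. The sketch below compares the two call styles on a made-up surface; all the names and the surface itself are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x_vals = np.linspace(0, 1, 7)  # hypothetical x coordinates (columns of Z)
y_vals = np.linspace(0, 1, 8)  # hypothetical y coordinates (rows of Z)
X_demo, Y_demo = np.meshgrid(x_vals, y_vals)
Z_demo = X_demo**2 - Y_demo    # hypothetical surface, shape (8, 7)

fig, (ax_2d, ax_1d) = plt.subplots(1, 2)
cs_2d = ax_2d.contour(X_demo, Y_demo, Z_demo)  # 2-D coordinate matrices
cs_1d = ax_1d.contour(x_vals, y_vals, Z_demo)  # 1-D coordinate vectors
```

The rest of this post sticks with the 2-D matrices from meshgrid, which also work with functions that do require matching shapes.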
matplotlib's default contour plot
Now that my data is in the correct format for matplotlib to understand, I can generate my first pass at a contour plot:
# Note: newer IPython versions deprecate this import in favor of
# matplotlib_inline.backend_inline.set_matplotlib_formats
from IPython.display import set_matplotlib_formats
%matplotlib inline
set_matplotlib_formats('svg')  # render inline figures as SVG
import matplotlib.pyplot as plt
from matplotlib import rcParams

# Initialize plot objects
rcParams['figure.figsize'] = 5, 5  # sets plot size
fig = plt.figure()
ax = fig.add_subplot(111)

# Generate a contour plot
cp = ax.contour(X, Y, Z)
As you can see, there's nothing too impressive about the default look of this plot. However, with just a few extra lines of code, we can significantly improve the aesthetics of this base visualization.
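My primary customizations, all visible in the code for this step, will be: setting the levels of $z$ at which lines appear, shading the plot with a filled color map, drawing every level line in black, and labeling each line with its value. Here, I'll use matplotlib's colormap module to generate a color palette (check out this handy reference for a full list of matplotlib's default color palettes).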
# Initialize plot objects
rcParams['figure.figsize'] = 5, 5  # sets plot size
fig = plt.figure()
ax = fig.add_subplot(111)

# Define levels in z-axis where we want lines to appear
levels = np.array([-0.4, -0.2, 0, 0.2, 0.4])

# Generate a color mapping of the levels we've specified
import matplotlib.cm as cm  # matplotlib's color map library
cpf = ax.contourf(X, Y, Z, len(levels), cmap=cm.Reds)

# Set all level lines to black
line_colors = ['black' for l in cpf.levels]

# Make plot and customize axes
cp = ax.contour(X, Y, Z, levels=levels, colors=line_colors)
ax.clabel(cp, fontsize=10, colors=line_colors)
plt.xticks([0, 0.5, 1])
plt.yticks([0, 0.5, 1])
ax.set_xlabel('X-axis')
_ = ax.set_ylabel('Y-axis')
#plt.savefig('figure.pdf')  # uncomment to save vector/high-res version
You can obviously experiment with a lot more here, but this is already a significant improvement. I especially like how easy it is to plot the value of each level directly on the line. This obviates the need for a separate legend for the $z$-axis; just make sure you have a good title so people know what the $z$-axis represents!
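One variation you might experiment with: passing the levels array itself to contourf, rather than its length, should make the color boundaries fall on the same values as the labeled lines. This is only a sketch on a made-up surface (X_demo, Y_demo, and Z_demo are hypothetical stand-ins for the post's data), and extend='both' is my own choice so that regions outside the outermost levels still get shaded.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in surface on the unit square
x_vals = np.linspace(0, 1, 50)
y_vals = np.linspace(0, 1, 50)
X_demo, Y_demo = np.meshgrid(x_vals, y_vals)
Z_demo = X_demo - Y_demo  # ranges from -1 to 1

levels = np.array([-0.4, -0.2, 0, 0.2, 0.4])

fig, ax = plt.subplots(figsize=(5, 5))

# Passing the array (not its length) puts fill boundaries on these levels;
# extend='both' also colors regions below -0.4 and above 0.4
cpf = ax.contourf(X_demo, Y_demo, Z_demo, levels=levels,
                  cmap="Reds", extend="both")

# Black lines at the same levels, labeled with their values
cp = ax.contour(X_demo, Y_demo, Z_demo, levels=levels, colors="black")
ax.clabel(cp, fontsize=10)
```

Whether this looks better than automatically chosen fill levels is a matter of taste; it mainly helps when readers should be able to match each color band to a labeled line.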
Just for good measure, I'll show you what I was able to come up with exerting a similar amount of effort in R using just ggplot2. (If you are determined to use R, I'd suggest checking out the metR package, which I found has better support for good-looking contour plots.)
%%R -i contour_data,levels -w 5 -h 4 -u in -r 200
library(ggplot2)
plt = ggplot(contour_data, aes(x=x, y=y, z=z)) +
    stat_contour(aes(color=..level..), breaks=levels, size=1) +
    scale_x_continuous(name='X-axis', limits=c(0,1.01), breaks=c(0,0.5,1)) +
    scale_y_continuous(name='Y-axis', limits=c(0,1), breaks=c(0,0.5,1)) +
    scale_colour_gradient(low='#2E38B5', high='#B54648') +
    theme_bw()
print(plt)