matplotlib and iPython notebook
UCSD Scientific Python User's Group, April 10th, 2013
Bad design leads to difficult interpretation, possible loss of information, and an inability to recognize trends. I will use concepts from The Visual Display of Quantitative Information, 2nd Ed., by Edward Tufte, Graphics Press (2001).
Do not imitate this bad example from the matplotlib gallery:
Why is this so bad? The 'rainbow' color scheme makes it difficult to compare values. Humans are terrible at using different hues to discriminate between different values, but alright at using saturation, such as one color running from very light to very dark.
Or this equally terrible example from the gallery:
Why is this so bad? The graphics of the box distract from the true information. It would be much more effective as a plain bar chart.
We will talk about how to
# For setting parameters, we will need to use matplotlib (mpl) directly
import matplotlib as mpl
# This is the usual invocation of pyplot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Set the random seed for consistency
np.random.seed(12)
# I happen to know that there are 7 default colors in matplotlib
for i in range(7):
    plt.plot(np.random.randn(1000).cumsum())
Ugh. It's an unfortunate mishmash of RGB+CMYK: red, green, and blue, plus cyan, magenta, yellow, and blac(k). But we already know that we can do better.
In 2003, Cynthia Brewer and colleagues released guidelines for coloring maps with sequential, divergent, and qualitative colors, and these guidelines are now available through http://colorbrewer2.org/. These colors are included in an existing package in R, but only recently did someone add them to Python through the package brewer2mpl, which is intended for use with matplotlib.
An example import (from the author's blog post):
import brewer2mpl
bmap = brewer2mpl.get_map('Set1', 'qualitative', 5)
colors = bmap.mpl_colors
So let's install this package.
! sudo easy_install brewer2mpl
(can't do interactive terminal stuff in iPython so I did this in my actual terminal)
The output:
! cat ~/.matplotlibrc | grep color_cycle
import brewer2mpl
# brewer2mpl.get_map args: set name set type number of colors
bmap = brewer2mpl.get_map('Set2', 'qualitative', 7)
colors = bmap.mpl_colors
print(colors)
We have a list of 3-tuples of RGB decimal values, from 0 to 1, as specified in the matplotlib colors API. You may be used to seeing RGB specifications in values between 0 and 255, and this is the same thing, except each value is expressed as a fraction of 255.
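To make the "fraction of 255" relationship concrete, here is a minimal sketch. The helper name to_unit_rgb and the sample triplets are my own illustration and are not part of brewer2mpl or matplotlib.

```python
# Illustrative 8-bit RGB triplets (hypothetical sample values)
rgb_255 = [(102, 194, 165), (252, 141, 98), (141, 160, 203)]


def to_unit_rgb(rgb):
    """Scale an 8-bit (0-255) RGB triplet to the 0-1 floats matplotlib expects."""
    return tuple(channel / 255.0 for channel in rgb)


rgb_unit = [to_unit_rgb(c) for c in rgb_255]
for original, scaled in zip(rgb_255, rgb_unit):
    print(original, "->", scaled)
```

Multiplying each scaled value by 255 recovers the original 8-bit value.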
Now let's use these colors to plot. To do so, we'll have to change the default color cycle of matplotlib via the command,
mpl.rcParams['axes.color_cycle'] = colors
Now the mpl we imported earlier is coming in handy!
# Set the random seed for consistency
np.random.seed(12)
# Change the default colors
mpl.rcParams['axes.color_cycle'] = colors
# I happen to know that there are 7 default colors in matplotlib
for i in range(7):
    plt.plot(np.random.randn(1000).cumsum())
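A note for readers on newer matplotlib: the 'axes.color_cycle' key was deprecated in matplotlib 1.5 in favor of 'axes.prop_cycle' and was later removed. The sketch below shows the newer spelling. The hex list is what I believe the 7-color 'Set2' palette to be, so treat it as a placeholder for whatever list brewer2mpl gives you.

```python
import matplotlib as mpl
from cycler import cycler  # installed as a dependency of matplotlib >= 1.5

# Assumed hex values for the 7-color 'Set2' palette (placeholder list)
set2_hex = ['#66c2a5', '#fc8d62', '#8da0cb', '#e78ac3',
            '#a6d854', '#ffd92f', '#e5c494']

# 'axes.prop_cycle' takes a Cycler object rather than a bare list of colors
mpl.rcParams['axes.prop_cycle'] = cycler(color=set2_hex)

# Read the colors back out of the active property cycle
cycle_colors = mpl.rcParams['axes.prop_cycle'].by_key()['color']
print(cycle_colors)
```

Any plt.plot calls made after this assignment draw their line colors from the new cycle.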
Now that looks much better! Here is a cheat sheet of the ColorBrewer colors (from the cbrewer page on the MathWorks website).
As for scatterplots, I prefer to show them with a very thin, grey line around the circle. So instead of no outlines like this:
# Set the random seed for consistency
np.random.seed(12)
# Change the default colors
#mpl.rcParams['axes.color_cycle'] =
colors = brewer2mpl.get_map('Set2', 'qualitative', 7).mpl_colors
#matplotlib.image.cmap = brewer2mpl.get_map('Set2', 'qualitative', 7).mpl_colormap
# I happen to know that there are 7 default colors in matplotlib
for i, color in enumerate(colors):
    plt.scatter(np.random.randn(1000), np.random.randn(1000),
                color=color)
Or an overpowering black outline that speaks louder than the plot itself,
# Set the random seed for consistency
np.random.seed(12)
# Change the default colors
#mpl.rcParams['axes.color_cycle'] =
colors = brewer2mpl.get_map('Set2', 'qualitative', 7).mpl_colors
#matplotlib.image.cmap = brewer2mpl.get_map('Set2', 'qualitative', 7).mpl_colormap
# I happen to know that there are 7 default colors in matplotlib
for i, color in enumerate(colors):
    plt.scatter(np.random.randn(1000), np.random.randn(1000),
                color=color, edgecolors='k')
A light grey, thin outline balances both visibility and aesthetics.
# Set the random seed for consistency
np.random.seed(12)
# Change the default colors
#mpl.rcParams['axes.color_cycle'] =
colors = brewer2mpl.get_map('Set2', 'qualitative', 7).mpl_colors
#matplotlib.image.cmap = brewer2mpl.get_map('Set2', 'qualitative', 7).mpl_colormap
# I happen to know that there are 7 default colors in matplotlib
for i, color in enumerate(colors):
    plt.scatter(np.random.randn(1000), np.random.randn(1000),
                color=color,
                edgecolors='grey', linewidths=0.1)
Now, to make 'Set2' our default colors, we must change our .matplotlibrc file.
Let's check where ours is.
# For some reason, this doesn't work with mpl
import matplotlib
matplotlib.matplotlib_fname()
According to the matplotlib customization information, the matplotlibrc files are looked for in this order:
1. matplotlibrc in the current working directory, usually used for specific customizations that you do not want to apply elsewhere.
2. .matplotlib/matplotlibrc, for the user's default customizations. See .matplotlib directory location.
3. INSTALL/matplotlib/mpl-data/matplotlibrc, where INSTALL is something like /usr/lib/python2.5/site-packages on Linux, and maybe C:\Python25\Lib\site-packages on Windows. Every time you install matplotlib, this file will be overwritten, so if you want your customizations to be saved, please move this file to your .matplotlib directory.
So that we can distinguish our custom matplotlibrc file, we'll make the ~/.matplotlib directory and the matplotlibrc file within it. If you haven't created this directory and the file already, you will need to create them.
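To check which of these locations applies on your own machine, something like the following sketch may help. It only uses matplotlib.get_configdir() and matplotlib.matplotlib_fname(). The variable names are mine, and the paths printed will differ from system to system.

```python
import os
import matplotlib as mpl

config_dir = mpl.get_configdir()      # per-user configuration directory
active_rc = mpl.matplotlib_fname()    # the matplotlibrc file actually in use
local_rc = os.path.join(os.getcwd(), 'matplotlibrc')  # would take priority if present

print('user config directory: ' + config_dir)
print('active matplotlibrc:   ' + active_rc)
print('local override exists: ' + str(os.path.exists(local_rc)))
```

If the active file sits under mpl-data, no user-level matplotlibrc has been picked up yet.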
We will use the sample matplotlibrc file that is available from the matplotlib website.
%%bash
mkdir ~/.matplotlib
cd ~/.matplotlib
wget http://matplotlib.org/_static/matplotlibrc
cat ~/.matplotlib/matplotlibrc
You'll need to edit the ~/.matplotlib/matplotlibrc file in a text editor on your own machine to change the colors. However, we can't just use the list we created earlier, because we must use hex colors. We can use the mpl.colors.rgb2hex function to convert the 3-tuples to hex strings.
for color in colors:
    print(mpl.colors.rgb2hex(color))
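As a sketch of how those hex strings might be assembled into a single matplotlibrc line: as far as I know, the matplotlibrc format treats '#' as the start of a comment, so matplotlibrc files of this era list hex colors without the leading '#'. The helper rgb_to_hex below is a hand-written stand-in for mpl.colors.rgb2hex (so the snippet runs without matplotlib), and the two colors are made up for illustration.

```python
def rgb_to_hex(rgb):
    """Convert a 0-1 RGB triplet to a six-digit hex string without the '#'."""
    return ''.join('{:02x}'.format(int(round(c * 255))) for c in rgb)


# Hypothetical colors; in the notebook this would be the `colors` list
example_colors = [(1.0, 0.0, 0.0), (0.0, 0.4, 1.0)]

# Build the line to paste into matplotlibrc
rc_line = 'axes.color_cycle : ' + ', '.join(rgb_to_hex(c) for c in example_colors)
print(rc_line)
```

Check the comments in your own matplotlibrc template before pasting, since the accepted syntax may differ between matplotlib versions.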
Before I edit the file, let's see what it looks like on the line we're going to edit, where it says axes.color_cycle:
! cat ~/.matplotlib/matplotlibrc | grep axes.color_cycle
I edited the ~/.matplotlib/matplotlibrc file separately in a text editor.
! cat ~/.matplotlib/matplotlibrc | grep axes.color_cycle
Now, in future sessions (after we restart python and reload matplotlib), when we reset the color cycle to the defaults, we should get the correct 'Set2' colorbrewer colors. For now, we'll rely on the change we made to mpl.rcParams to keep the colors the way they are.
Let's use the same principles as before to improve this heatmap:
from matplotlib.colors import LogNorm
from pylab import *
#normal distribution center at x=0 and y=5
x = randn(100000)
y = randn(100000)+5
hist2d(x, y, bins=40, norm=LogNorm())
colorbar()
show()
What's so bad about this? Well, it's using a rainbow of colors to indicate a single scale, increasing from zero. Let's use one of the sequential colorbrewer palettes to improve this. I like green, so let's use that. We will tell brewer2mpl to give us a matplotlib-compatible colormap with the attribute .mpl_colormap, with the full call being,
brewer2mpl.get_map('Greens', 'sequential', 8).mpl_colormap
from matplotlib.colors import LogNorm
from pylab import *
#normal distribution center at x=0 and y=5
x = randn(100000)
y = randn(100000)+5
hist2d(x, y, bins=40, norm=LogNorm(),
cmap=brewer2mpl.get_map('Greens', 'sequential', 8).mpl_colormap)
colorbar()
show()
This is much easier to interpret, since we only have to distinguish an increase in saturation of the hue green, rather than be forced to think about multiple different hues and how their colors represent an increase in value.
Though if you just have increases from 0 to larger numbers, it may be even simpler (and better) to just use grey. Maybe not as pretty, but very easy to interpret.
from matplotlib.colors import LogNorm
from pylab import *
#normal distribution center at x=0 and y=5
x = randn(100000)
y = randn(100000)+5
# norm=LogNorm() tells the function to use a logscale for the z-values
hist2d(x, y, bins=40, norm=LogNorm(),
cmap=brewer2mpl.get_map('Greys', 'sequential', 8).mpl_colormap)
colorbar()
show()
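As an aside, matplotlib also ships several ColorBrewer palettes as built-in colormaps that can be requested by name, including 'Greens' and 'Greys'. The sketch below repeats the histogram using the built-in name instead of brewer2mpl. The output file name and the use of RandomState are my own choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

# Normal distribution centered at x=0 and y=5, seeded for repeatability
rng = np.random.RandomState(12)
x = rng.randn(100000)
y = rng.randn(100000) + 5

fig, ax = plt.subplots()
# cmap='Greens' is looked up among matplotlib's built-in colormaps
counts, xedges, yedges, image = ax.hist2d(x, y, bins=40, norm=LogNorm(),
                                          cmap='Greens')
fig.colorbar(image, ax=ax)
fig.savefig('hist2d_greens.png')
```

Swapping 'Greens' for 'Greys' gives the grey version. brewer2mpl remains useful when you want a specific number of discrete colors.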
But what if your data has positive and negative values? Then you want to use a diverging color map. I like blue-red (RdBu in reverse with these colormaps) because it has the natural interpretation of blue=cold, negative, and red=hot, positive.
The example below is from griddata_demo.py in the matplotlib gallery.
from numpy.random import uniform, seed
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
import numpy as np
# make up data.
#npts = int(raw_input('enter # of random points to plot:'))
seed(0)
npts = 200
x = uniform(-2,2,npts)
y = uniform(-2,2,npts)
z = x*np.exp(-x**2-y**2)
# define grid.
xi = np.linspace(-2.1,2.1,100)
yi = np.linspace(-2.1,2.1,200)
# grid the data.
zi = griddata(x,y,z,xi,yi,interp='linear')
# contour the gridded data, plotting dots at the nonuniform data points.
CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
CS = plt.contourf(xi,yi,zi,15,cmap=plt.cm.rainbow,
vmax=abs(zi).max(), vmin=-abs(zi).max())
plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=5,zorder=10)
plt.xlim(-2,2)
plt.ylim(-2,2)
plt.title('griddata test (%d points)' % npts)
We'll improve on this example with a more natural, divergent colormap.
from numpy.random import uniform, seed
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
import numpy as np
# make up data.
#npts = int(raw_input('enter # of random points to plot:'))
seed(0)
npts = 200
x = uniform(-2,2,npts)
y = uniform(-2,2,npts)
z = x*np.exp(-x**2-y**2)
# define grid.
xi = np.linspace(-2.1,2.1,100)
yi = np.linspace(-2.1,2.1,200)
# grid the data.
zi = griddata(x,y,z,xi,yi,interp='linear')
# contour the gridded data, plotting dots at the nonuniform data points.
CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
# ---- This is the line we changed ---- #
CS = plt.contourf(xi,yi,zi,15,
cmap=brewer2mpl.get_map('RdBu', 'diverging', 8, reverse=True).mpl_colormap,
vmax=abs(zi).max(), vmin=-abs(zi).max())
plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=5,zorder=10)
plt.xlim(-2,2)
plt.ylim(-2,2)
plt.title('griddata test (%d points)' % npts)
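The vmin and vmax arguments matter as much as the colormap: a diverging map is only honest if zero lands on its neutral midpoint. The following stripped-down sketch isolates that idea on a regular grid. It uses imshow and the built-in 'RdBu_r' colormap rather than griddata and brewer2mpl, which is my simplification and not what the gallery example does.

```python
import matplotlib.pyplot as plt
import numpy as np

# The same test function as above, evaluated on a regular grid
xi = np.linspace(-2, 2, 100)
yi = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(xi, yi)
Z = X * np.exp(-X**2 - Y**2)

# Symmetric limits put zero exactly at the middle of the colormap
limit = np.abs(Z).max()

fig, ax = plt.subplots()
# 'RdBu_r' is the reversed red-blue map: blue for low values, red for high
image = ax.imshow(Z, extent=[-2, 2, -2, 2], origin='lower',
                  cmap='RdBu_r', vmin=-limit, vmax=limit)
fig.colorbar(image, ax=ax)
fig.savefig('diverging_symmetric.png')
```

Without the symmetric limits, the neutral color would sit at the midpoint of the data range, which is not necessarily zero.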
We can do other colormaps just for fun, too. What do purple and green look like?
from numpy.random import uniform, seed
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
import numpy as np
# make up data.
#npts = int(raw_input('enter # of random points to plot:'))
seed(0)
npts = 200
x = uniform(-2,2,npts)
y = uniform(-2,2,npts)
z = x*np.exp(-x**2-y**2)
# define grid.
xi = np.linspace(-2.1,2.1,100)
yi = np.linspace(-2.1,2.1,200)
# grid the data.
zi = griddata(x,y,z,xi,yi,interp='linear')
# contour the gridded data, plotting dots at the nonuniform data points.
CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
# ---- This is the line we changed ---- #
CS = plt.contourf(xi,yi,zi,15,
cmap=brewer2mpl.get_map('PRGn', 'diverging', 8, reverse=True).mpl_colormap,
vmax=abs(zi).max(), vmin=-abs(zi).max())
plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=5,zorder=10)
plt.xlim(-2,2)
plt.ylim(-2,2)
plt.title('griddata test (%d points)' % npts)
The default font shipped with matplotlib is Bitstream Vera Sans, and it's not that pretty. I much prefer Helvetica, and I wrote a tutorial on how to set Helvetica as the default sans-serif font in matplotlib. It was originally written for Mac OS X users, but the concepts can be used on any system. The basic idea is that you need to either obtain a set of Helvetica*.ttf files, or extract them from Mac OS X's Helvetica.dfont file. Unfortunately, it's fairly involved, and I will leave the reader to follow the link and use the tutorial.
Here are the before and after plots. Before:
After:
Much nicer! Unfortunately, I performed this change on my old computer and didn't have time to change the defaults on this one, so we will have to suffer through Bitstream Vera Sans together.
'Chartjunk' is a term coined by Edward Tufte to describe any uninformative aspects of a graph. You can also think about the 'data-ink ratio' with the question, How is this patch of ink contributing to the interpretation of these data?
For example, this bar graph has an extraordinarily low 'data-ink ratio', and this unfortunate example is also from the matplotlib gallery.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage
from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data
class RibbonBox(object):
    original_image = read_png(get_sample_data("Minduka_Present_Blue_Pack.png",
                                              asfileobj=False))
    cut_location = 70
    b_and_h = original_image[:,:,2]
    color = original_image[:,:,2] - original_image[:,:,0]
    alpha = original_image[:,:,3]
    nx = original_image.shape[1]

    def __init__(self, color):
        rgb = matplotlib.colors.colorConverter.to_rgb(color)
        im = np.empty(self.original_image.shape,
                      self.original_image.dtype)
        im[:,:,:3] = self.b_and_h[:,:,np.newaxis]
        im[:,:,:3] -= self.color[:,:,np.newaxis]*(1.-np.array(rgb))
        im[:,:,3] = self.alpha
        self.im = im

    def get_stretched_image(self, stretch_factor):
        stretch_factor = max(stretch_factor, 1)
        ny, nx, nch = self.im.shape
        ny2 = int(ny*stretch_factor)
        stretched_image = np.empty((ny2, nx, nch),
                                   self.im.dtype)
        cut = self.im[self.cut_location,:,:]
        stretched_image[:,:,:] = cut
        stretched_image[:self.cut_location,:,:] = \
            self.im[:self.cut_location,:,:]
        stretched_image[-(ny-self.cut_location):,:,:] = \
            self.im[-(ny-self.cut_location):,:,:]
        self._cached_im = stretched_image
        return stretched_image

class RibbonBoxImage(BboxImage):
    zorder = 1

    def __init__(self, bbox, color,
                 cmap = None,
                 norm = None,
                 interpolation=None,
                 origin=None,
                 filternorm=1,
                 filterrad=4.0,
                 resample = False,
                 **kwargs
                 ):
        BboxImage.__init__(self, bbox,
                           cmap = cmap,
                           norm = norm,
                           interpolation=interpolation,
                           origin=origin,
                           filternorm=filternorm,
                           filterrad=filterrad,
                           resample = resample,
                           **kwargs
                           )
        self._ribbonbox = RibbonBox(color)
        self._cached_ny = None

    def draw(self, renderer, *args, **kwargs):
        bbox = self.get_window_extent(renderer)
        stretch_factor = bbox.height / bbox.width
        ny = int(stretch_factor*self._ribbonbox.nx)
        if self._cached_ny != ny:
            arr = self._ribbonbox.get_stretched_image(stretch_factor)
            self.set_array(arr)
            self._cached_ny = ny
        BboxImage.draw(self, renderer, *args, **kwargs)

if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter
    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)
    years = np.arange(2004, 2009)
    box_colors = [(0.8, 0.2, 0.2),
                  (0.2, 0.8, 0.2),
                  (0.2, 0.2, 0.8),
                  (0.7, 0.5, 0.8),
                  (0.3, 0.8, 0.7),
                  ]
    heights = np.random.random(years.shape) * 7000 + 3000
    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)
    for year, h, bc in zip(years, heights, box_colors):
        bbox0 = Bbox.from_extents(year-0.4, 0., year+0.4, h)
        bbox = TransformedBbox(bbox0, ax.transData)
        rb_patch = RibbonBoxImage(bbox, bc, interpolation="bicubic")
        ax.add_artist(rb_patch)
        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h), va="bottom", ha="center")
    patch_gradient = BboxImage(ax.bbox,
                               interpolation="bicubic",
                               zorder=0.1,
                               )
    gradient = np.zeros((2, 2, 4), dtype=np.float)
    gradient[:,:,:3] = [1, 1, 0.]
    gradient[:,:,3] = [[0.1, 0.3],[0.3, 0.5]] # alpha channel
    patch_gradient.set_array(gradient)
    ax.add_artist(patch_gradient)
    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    fig.savefig('ribbon_box.png')
    plt.show()
Why is this so bad? We have these superfluous present boxes to represent five numbers. However, one thing that this figure does correctly is put the value the bar graph represents just above the bar. First, let's get rid of this silly and uninformative gradient by commenting it out.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage
from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data
class RibbonBox(object):
    original_image = read_png(get_sample_data("Minduka_Present_Blue_Pack.png",
                                              asfileobj=False))
    cut_location = 70
    b_and_h = original_image[:,:,2]
    color = original_image[:,:,2] - original_image[:,:,0]
    alpha = original_image[:,:,3]
    nx = original_image.shape[1]

    def __init__(self, color):
        rgb = matplotlib.colors.colorConverter.to_rgb(color)
        im = np.empty(self.original_image.shape,
                      self.original_image.dtype)
        im[:,:,:3] = self.b_and_h[:,:,np.newaxis]
        im[:,:,:3] -= self.color[:,:,np.newaxis]*(1.-np.array(rgb))
        im[:,:,3] = self.alpha
        self.im = im

    def get_stretched_image(self, stretch_factor):
        stretch_factor = max(stretch_factor, 1)
        ny, nx, nch = self.im.shape
        ny2 = int(ny*stretch_factor)
        stretched_image = np.empty((ny2, nx, nch),
                                   self.im.dtype)
        cut = self.im[self.cut_location,:,:]
        stretched_image[:,:,:] = cut
        stretched_image[:self.cut_location,:,:] = \
            self.im[:self.cut_location,:,:]
        stretched_image[-(ny-self.cut_location):,:,:] = \
            self.im[-(ny-self.cut_location):,:,:]
        self._cached_im = stretched_image
        return stretched_image

class RibbonBoxImage(BboxImage):
    zorder = 1

    def __init__(self, bbox, color,
                 cmap = None,
                 norm = None,
                 interpolation=None,
                 origin=None,
                 filternorm=1,
                 filterrad=4.0,
                 resample = False,
                 **kwargs
                 ):
        BboxImage.__init__(self, bbox,
                           cmap = cmap,
                           norm = norm,
                           interpolation=interpolation,
                           origin=origin,
                           filternorm=filternorm,
                           filterrad=filterrad,
                           resample = resample,
                           **kwargs
                           )
        self._ribbonbox = RibbonBox(color)
        self._cached_ny = None

    def draw(self, renderer, *args, **kwargs):
        bbox = self.get_window_extent(renderer)
        stretch_factor = bbox.height / bbox.width
        ny = int(stretch_factor*self._ribbonbox.nx)
        if self._cached_ny != ny:
            arr = self._ribbonbox.get_stretched_image(stretch_factor)
            self.set_array(arr)
            self._cached_ny = ny
        BboxImage.draw(self, renderer, *args, **kwargs)

if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter
    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)
    years = np.arange(2004, 2009)
    box_colors = [(0.8, 0.2, 0.2),
                  (0.2, 0.8, 0.2),
                  (0.2, 0.2, 0.8),
                  (0.7, 0.5, 0.8),
                  (0.3, 0.8, 0.7),
                  ]
    heights = np.random.random(years.shape) * 7000 + 3000
    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)
    for year, h, bc in zip(years, heights, box_colors):
        bbox0 = Bbox.from_extents(year-0.4, 0., year+0.4, h)
        bbox = TransformedBbox(bbox0, ax.transData)
        rb_patch = RibbonBoxImage(bbox, bc, interpolation="bicubic")
        ax.add_artist(rb_patch)
        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h), va="bottom", ha="center")
    # patch_gradient = BboxImage(ax.bbox,
    #                            interpolation="bicubic",
    #                            zorder=0.1,
    #                            )
    # gradient = np.zeros((2, 2, 4), dtype=np.float)
    # gradient[:,:,:3] = [1, 1, 0.]
    # gradient[:,:,3] = [[0.1, 0.3],[0.3, 0.5]] # alpha channel
    # patch_gradient.set_array(gradient)
    # ax.add_artist(patch_gradient)
    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    fig.savefig('ribbon_box.png')
    plt.show()
That was easy: we just removed the call to the gradient. Next, let's get rid of these boxes and replace them with simple bars. I'm going to cut out the gradient and the box code, and add the line,
ax.bar(year, h, color=bc)
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage
from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data
if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter
    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)
    years = np.arange(2004, 2009)
    box_colors = [(0.8, 0.2, 0.2),
                  (0.2, 0.8, 0.2),
                  (0.2, 0.2, 0.8),
                  (0.7, 0.5, 0.8),
                  (0.3, 0.8, 0.7),
                  ]
    heights = np.random.random(years.shape) * 7000 + 3000
    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)
    for year, h, bc in zip(years, heights, box_colors):
        # bbox0 = Bbox.from_extents(year-0.4, 0., year+0.4, h)
        # bbox = TransformedBbox(bbox0, ax.transData)
        # rb_patch = BboxImage(bbox, interpolation='bicubic')
        # rb_ptch = RibbonBoxImage(bbox, bc, interpolation="bicubic")
        # ax.add_artist(rb_patch)
        # ax.add_artist(bbox)
        # --- this is the line we changed --- #
        ax.bar(year, h, color=bc)
        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h), va="bottom", ha="center")
    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()
But this is offset to the right. Let's move it to the left using year-0.4, as in the previous graph. Also, let's change from these hideous colors to 'Set1', another qualitative colorbrewer scheme.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage
from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data
if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter
    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)
    years = np.arange(2004, 2009)
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    # box_colors = [(0.8, 0.2, 0.2),
    #               (0.2, 0.8, 0.2),
    #               (0.2, 0.2, 0.8),
    #               (0.7, 0.5, 0.8),
    #               (0.3, 0.8, 0.7),
    #               ]
    heights = np.random.random(years.shape) * 7000 + 3000
    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)
    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color=bc)
        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h), va="bottom", ha="center")
    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()
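An alternative to shifting each bar by hand is to ask for centered bars explicitly with align='center', which is also the default from matplotlib 2.0 onward. The sketch below uses made-up heights, default colors, and an output file name of my own choosing, purely to show the alignment.

```python
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2004, 2009)
rng = np.random.RandomState(12)
heights = rng.random_sample(years.shape) * 7000 + 3000  # made-up values

fig, ax = plt.subplots()
# align='center' centers each 0.8-wide bar on its year, so no manual offset is needed
bars = ax.bar(years, heights, width=0.8, align='center')
ax.set_xlim(years[0] - 0.5, years[-1] + 0.5)
ax.set_ylim(0, 10000)
fig.savefig('bars_centered.png')
```

With explicit centering, the x position passed to ax.annotate and the bar position stay the same number, which is easier to reason about.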
Let's move the number up a little.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage
from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data
if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter
    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)
    years = np.arange(2004, 2009)
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    # box_colors = [(0.8, 0.2, 0.2),
    #               (0.2, 0.8, 0.2),
    #               (0.2, 0.2, 0.8),
    #               (0.7, 0.5, 0.8),
    #               (0.3, 0.8, 0.7),
    #               ]
    heights = np.random.random(years.shape) * 7000 + 3000
    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)
    for year, h, bc in zip(years, heights, box_colors):
        ax.bar(year-0.4, h, color=bc)
        # --- this is the line we changed --- #
        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")
    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()
Let's think some more about this data-ink ratio. What do the right and top axes really tell us? They just make a box around the plot. It looks much cleaner without them. We'll remove them with,
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage
from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data
if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter
    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)
    years = np.arange(2004, 2009)
    # --- changed this line --- #
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    heights = np.random.random(years.shape) * 7000 + 3000
    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)
    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color=bc)
        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")
    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    # --- Added this line --- #
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()
Well, that removed the top and right spines, but their ticks remain. We'll remove them with
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage
from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data
if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter
    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)
    years = np.arange(2004, 2009)
    # --- changed this line --- #
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    heights = np.random.random(years.shape) * 7000 + 3000
    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)
    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color=bc)
        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")
    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    # --- Added this line --- #
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    # --- Added this line --- #
    ax.yaxis.set_ticks_position('left')
    ax.xaxis.set_ticks_position('bottom')
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()
Even better, let's remove the left axis and replace it with a white overlapping grid. This way, the reader doesn't have to move their eye back and forth to the left axis to see what value corresponds to what height. We will also remove the ticks on the x-axis, since the year name labels the position, and we don't need a tick there.
ax.spines['left'].set_visible(False)
...
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')
...
ax.grid(axis = 'y', color ='white', linestyle='-')
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage
from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data
if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter
    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)
    years = np.arange(2004, 2009)
    # --- changed this line --- #
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    heights = np.random.random(years.shape) * 7000 + 3000
    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)
    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color=bc)
        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")
    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    # --- Added this line --- #
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    # --- Added this line --- #
    ax.yaxis.set_ticks_position('none')
    ax.xaxis.set_ticks_position('none')
    ax.grid(axis='y', color='white', linestyle='-')
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()
It would look even nicer without the black lines around the bars. We will adjust the ax.bar line to set linewidth=0,
ax.bar(year-0.4, h, color=bc, linewidth=0)
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage
from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data
if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter
    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)
    years = np.arange(2004, 2009)
    # --- changed this line --- #
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    heights = np.random.random(years.shape) * 7000 + 3000
    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)
    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color=bc, linewidth=0)
        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")
    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    # --- Added this line --- #
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    # --- Added this line --- #
    ax.yaxis.set_ticks_position('none')
    ax.xaxis.set_ticks_position('none')
    ax.grid(axis='y', color='white', linestyle='-')
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()
So now we have a very nice looking bar graph! All we did was keep 'erasing' chart items that weren't informative. You can use these concepts in your own graphs.
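If you find yourself repeating these 'erasing' steps, they can be bundled into a small helper. The function name strip_chartjunk, its grid_color parameter, and the three data points below are hypothetical choices of mine. The calls inside it are the same ones used in the walkthrough above.

```python
import matplotlib.pyplot as plt


def strip_chartjunk(ax, grid_color='white'):
    """Hide the top, right, and left spines and all tick marks, then add a y-grid."""
    for side in ('top', 'right', 'left'):
        ax.spines[side].set_visible(False)
    ax.xaxis.set_ticks_position('none')
    ax.yaxis.set_ticks_position('none')
    ax.grid(axis='y', color=grid_color, linestyle='-')
    return ax


fig, ax = plt.subplots()
ax.bar([2004, 2005, 2006], [4000, 7500, 6000], linewidth=0)  # made-up data
strip_chartjunk(ax)
fig.savefig('bars_clean.png')
```

The bottom spine is left alone so the bars still have a baseline to stand on.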
So far we've talked about things you can do with the existing matplotlib package. Now we'll talk about packages that implement other design principles.
Recently, Tufte introduced the idea of 'sparklines': a sparkline, or 'data-word', is an intense, word-sized graphic. The following examples use sparkplot and its introductory blog post. For example, if you visualize the wins (red, up) and losses (blue, down) of the Lakers' 2002 season, when they won the NBA championship, it is easy to see streaks of wins and losses. It is also easy to compare to their 2005 performance, when they did not win the championship. This is a very nice way to visualize binary data.
Additionally, sparklines can be used to visualize a series of information. For example, a sparkline of the number of messages sent on the comp.lang.py message list in 1994 shows that the minimum is zero and the maximum is 518. Compare this to the messages sent in 2004.
But you may be interested not just in the min and max, but also in deviations from the norm. The southern oscillation is a good indicator of El Nino, and values less than -1 usually define an El Nino weather pattern [data: Tahiti, 1955-1992].
If you have some series data or binary data you'd like to incorporate into a sentence, Sparklines are great.
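For readers who want to try the idea without installing sparkplot, a rough approximation can be drawn with plain matplotlib: a very small figure with the axes switched off. This is only a sketch of the concept, not sparkplot's implementation. The sparkline function, its parameters, and the random-walk data are all my own.

```python
import matplotlib.pyplot as plt
import numpy as np


def sparkline(values, filename, width=2.0, height=0.3):
    """Draw a word-sized line plot with no axes, marking the min and max points."""
    fig, ax = plt.subplots(figsize=(width, height))
    ax.plot(values, color='grey', linewidth=0.8)
    lo, hi = int(np.argmin(values)), int(np.argmax(values))
    # Red dots on the extremes, in the spirit of Tufte's examples
    ax.plot([lo, hi], [values[lo], values[hi]], 'o', color='red', markersize=2)
    ax.axis('off')
    fig.savefig(filename, bbox_inches='tight', dpi=150)
    return fig


rng = np.random.RandomState(12)
series = rng.randn(200).cumsum()  # made-up series for illustration
fig = sparkline(series, 'sparkline.png')
```

The saved image is small enough to sit inline with text, which is the whole point of a 'data-word'.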
To change your default fonts in iPython notebook, you will need to create a custom profile and a custom CSS file, as described thoroughly in this tutorial. If you like what you see in my iPython notebook, which includes Consolas as the default code font, an approximately 80-character column width, and centered cells, you may use my custom.css file:
# Find where my iPython directory is
! ipython locate
# Show the contents of my custom.css file, which I created using the above tutorial
! cat /Users/olga/.ipython/profile_customcss/static/css/custom.css
Bokeh (a photography term for the aesthetic quality of a blurred background, which focuses attention on the foreground; definition from the Bokeh GitHub readme) is a new package (started in March 2012, compared to matplotlib, which started in 2002) that aims to provide beautiful, interactive visualizations within the iPython framework. It uses the powerful Data Driven Documents (d3) JavaScript library to render lovely vector-based graphics using the HTML5 canvas in the browser.
I downloaded the package but couldn't get the examples to work, so I will show you the example notebook they provided. It will definitely be a package to watch! The underlying data structures in Bokeh are pandas DataFrames, so you can expect further integration with it and iPython in the future.
from bokeh.mpl import PlotClient
p = PlotClient(username='defaultuser', serverloc="http://portcon:5006",userapikey="nokey")
p.use_doc('example')
p.notebooksources()