Matplotlib Tutorial

This post covers the basic usage of matplotlib (especially pyplot) in various ways. It is a summary of the supplementary lecture notes for "Probability and Statistics in Data Science using Python", offered by UCSD (DSE210x).

  • toc: true
  • badges: true
  • comments: true
  • author: Chanseok Kang
  • categories: [Python, edX, Visualization]
  • image: images/multiple_axis_ex.png
In [3]:
import matplotlib.pyplot as plt
import numpy as np

Tutorial

This notebook shows how to draw basic and advanced plots with matplotlib. Please refer to the official documentation for details.

Line Plot

In this section, we will plot $y = \cos(x)$.

In [7]:
# Data
X = np.arange(0, 4 * np.pi, 0.1)
y = np.cos(X)

plt.figure(figsize=(10, 8));
plt.plot(X, y);

# Text also accepts LaTeX (mathtext) syntax
plt.xlabel('$x$');
plt.ylabel('$y$');

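Note: Jupyter displays the return value of the last expression in a cell. As a result, a pyplot call can print an object representation (such as a Line2D or Text object) above the figure. To suppress this output, add a semicolon at the end of the pyplot call.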
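
This minimal sketch shows where that output comes from. It assumes the headless Agg backend so it also runs outside Jupyter. plt.plot returns a list of Line2D objects, one per line, and this list is what a notebook echoes when the call is the last expression in a cell.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

X = np.arange(0, 4 * np.pi, 0.1)

# Without a trailing ';', a notebook would echo this list after the call
lines = plt.plot(X, np.cos(X))

print(type(lines).__name__, len(lines))    # a list holding one artist
print(isinstance(lines[0], Line2D))        # the artist is a Line2D
```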

Legends, Linestyles, Colors and Markers

In this section, we will plot an exponential function and a square function ($y = 2^x$ and $y = x^2$).

In [9]:
# Data
X = np.arange(0, 10, 1)
y_1 = 2 ** X
y_2 = X ** 2

plt.figure(figsize=(10, 8));
# Specify color, linestyle and marker using keyword arguments
plt.plot(X, y_1, label='$2^x$', color='g', linestyle='--', marker='s');
plt.plot(X, y_2, label='$x^2$', color='r', linestyle='-', marker='o');
plt.xlabel('$x$');
plt.ylabel('$y$');
plt.legend(loc='best');

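We can also specify the same style more compactly with a format string passed as a positional argument ('g--s' means green, dashed line, square markers).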

In [10]:
plt.figure(figsize=(10, 8));
# Specify color, linestyle and marker using positional arguments
plt.plot(X, y_1, 'g--s', label='$2^x$');
plt.plot(X, y_2, 'r-o', label='$x^2$');
plt.xlabel('$x$');
plt.ylabel('$y$');
plt.legend(loc='best');

Note: In Python, positional arguments must come before keyword arguments, so the format string 'g--s' has to be placed before label.
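
Here is a small check of that equivalence, sketched with the headless Agg backend. The format string 'g--s' produces the same color, linestyle and marker as the keyword arguments. Placing a keyword argument before a positional one is rejected when the code is parsed, before matplotlib is ever called.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np

X = np.arange(0, 10, 1)

# Format string: color 'g', linestyle '--', marker 's'
fmt_line, = plt.plot(X, 2 ** X, 'g--s', label='$2^x$')
# The same style spelled out with keyword arguments
kw_line, = plt.plot(X, 2 ** X, color='g', linestyle='--', marker='s', label='$2^x$')

for line in (fmt_line, kw_line):
    print(mcolors.to_hex(line.get_color()), line.get_linestyle(), line.get_marker())

# A keyword argument followed by a positional one is a syntax error
try:
    compile("plt.plot(X, y, label='a', 'g--s')", "<example>", "exec")
    order_error = None
except SyntaxError as err:
    order_error = err.msg
print(order_error)
```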

Title and Font Size

In this section, we will set a title and change font sizes, either globally with plt.rc or per call with the fontsize argument.

In [13]:
plt.rc('font', size=10)          # controls default text sizes
plt.rc('axes', titlesize=10)     # fontsize of the axes title
plt.rc('axes', labelsize=12)     # fontsize of the x and y labels
plt.rc('xtick', labelsize=10)    # fontsize of the tick labels
plt.rc('ytick', labelsize=10)    # fontsize of the tick labels
plt.rc('legend', fontsize=15)    # legend fontsize

x = np.arange(-10, 10, 0.1)
y = x ** 3

plt.figure(figsize=(10, 8));
plt.plot(x, y, label = '$x^3$');
plt.xlabel('$x$', fontsize = 12); # The fontsize can be set here as well
plt.ylabel('$y$', fontsize = 12);
plt.title('$y = x^3$', fontsize = 16); # Set title and its fontsize
plt.legend(loc = 'upper left');

# Add grid
plt.grid();
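
Note that plt.rc changes the global defaults (matplotlib.rcParams), so the settings above persist for every later figure in the session. To apply settings to a single figure only, use plt.rc_context, which applies them temporarily. A minimal sketch, assuming the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import matplotlib.pyplot as plt

# Global change: every figure created afterwards uses this default title size
plt.rc('axes', titlesize=10)

# Temporary change: only figures built inside the with-block see titlesize=20
with plt.rc_context({'axes.titlesize': 20}):
    fig, ax = plt.subplots()
    title = ax.set_title('$y = x^3$')
    inside_size = title.get_fontsize()

# Leaving the block restores the previous global value
outside_size = plt.rcParams['axes.titlesize']
print(inside_size, outside_size)
```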

Subplots

When you want to compare two or more plots side by side, subplots are a good choice.

In [15]:
# Data
x = np.arange(0, 6 * np.pi, 0.2)
y_1 = np.cos(x)
y_2 = np.sin(2 * x)

# Plot y = cos(x) 
plt.figure(figsize=(10, 8))
plt.subplot(2, 1, 1)
plt.plot(x, y_1, label = r'$\cos(x)$')  # raw string: '\c' is not a valid Python escape
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.legend(loc = 'best')

# Plot y = sin(2x)
plt.subplot(2, 1, 2)
plt.plot(x, y_2, label = r'$\sin(2x)$')
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.legend(loc = 'best')
plt.show()
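
In plt.subplot(nrows, ncols, index), the index starts at 1 and counts along each row before moving down. For example, plt.subplot(2, 1, 2) above is the lower panel. The short sketch below records where each index lands. It assumes the headless Agg backend and uses SubplotSpec.rowspan/colspan, which are available in recent Matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import matplotlib.pyplot as plt

fig = plt.figure()
positions = {}
for index in range(1, 7):             # plt.subplot's index starts at 1
    ax = plt.subplot(2, 3, index)     # grid of 2 rows x 3 columns
    spec = ax.get_subplotspec()
    # rowspan/colspan are 0-based ranges of grid cells covered by this subplot
    positions[index] = (spec.rowspan.start, spec.colspan.start)

for index, (row, col) in positions.items():
    print(f"plt.subplot(2, 3, {index}) -> row {row}, column {col}")
```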

Alternatively, plt.subplots creates the figure and all subplots at once and returns an array of Axes objects, so each subplot can be accessed by index.

In [18]:
# Data
x = np.arange(0, 6 * np.pi, 0.2)
y_1 = np.cos(x)
y_2 = np.sin(2 * x)

# Plot y = cos(x) 
fig, ax = plt.subplots(2, 1, figsize=(10, 8))
ax[0].plot(x, y_1, label = r'$\cos(x)$')
ax[0].set_xlabel('$x$')
ax[0].set_ylabel('$y$')
ax[0].legend(loc = 'best')

# Plot y = sin(2x)
ax[1].plot(x, y_2, label = r'$\sin(2x)$')
ax[1].set_xlabel('$x$')
ax[1].set_ylabel('$y$')
ax[1].legend(loc = 'best')
plt.show()
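
The object returned by plt.subplots changes shape with the grid:

  • a single row or column gives a 1-D array, as with ax[0] and ax[1] above;
  • a full grid gives a 2-D array indexed as ax[row, col];
  • a single subplot gives a bare Axes.

Passing squeeze=False forces a 2-D array in every case. A quick sketch, assuming the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

fig1, ax_col = plt.subplots(2, 1)                    # one column: 1-D array
fig2, ax_grid = plt.subplots(2, 2)                   # grid: 2-D array, ax_grid[row, col]
fig3, ax_single = plt.subplots()                     # one subplot: a bare Axes object
fig4, ax_forced = plt.subplots(2, 1, squeeze=False)  # squeeze=False: always 2-D

print(ax_col.shape, ax_grid.shape, ax_forced.shape)
print(isinstance(ax_single, np.ndarray))

ax_grid[1, 0].set_title('bottom-left')  # row 1, column 0
```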

If several subplots use the same scale, they can share an axis (the x-axis via sharex or the y-axis via sharey).

In [20]:
# Data
x = np.arange(0, 6 * np.pi, 0.2)
y_1 = np.cos(x)
y_2 = np.sin(2 * x)
y_3 = y_1 + y_2

fig, axs = plt.subplots(3, 1, sharex = True, figsize=(10, 8))
axs[0].plot(x, y_1)
axs[1].plot(x, y_2)
axs[2].plot(x, y_3)
axs[0].set_ylabel('$y$')
axs[1].set_ylabel('$y$')
axs[2].set_ylabel('$y$')
axs[2].set_xlabel('$x$')
plt.show()
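
Sharing goes beyond matching tick labels: the shared subplots use a single set of x-limits. Zooming or calling set_xlim on one of them therefore updates the others. A small sketch, assuming the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 6 * np.pi, 0.2)
fig, axs = plt.subplots(3, 1, sharex=True)
axs[0].plot(x, np.cos(x))
axs[1].plot(x, np.sin(2 * x))
axs[2].plot(x, np.cos(x) + np.sin(2 * x))

# Changing the x-limits of one subplot changes every subplot sharing the x-axis
axs[2].set_xlim(0, 2 * np.pi)
print([ax.get_xlim() for ax in axs])
```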

Barplots

In statistics, a bar plot is a convenient way to compare the amount (for example, the count) of each class.

In [22]:
# Data
x = np.arange(0, 7, 1) # x in [0, 7)
y = x

plt.figure(figsize=(10, 8))
plt.bar(x, y, label = '$x$')
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.legend(loc = 'upper left')
plt.show()
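
Often the data are raw class labels rather than precomputed heights. In that case, compute the counts first and then pass them to plt.bar. The sketch below uses made-up labels (the observations array is hypothetical), np.unique with return_counts=True, and the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical categorical observations, made up for illustration
observations = np.array(['A', 'B', 'A', 'C', 'B', 'A', 'C', 'A'])

# np.unique returns the sorted classes and how often each one occurs
classes, counts = np.unique(observations, return_counts=True)

plt.figure(figsize=(10, 8))
bars = plt.bar(classes.tolist(), counts)   # one bar per class, height = count
plt.xlabel('class')
plt.ylabel('count')

heights = [bar.get_height() for bar in bars]
print(dict(zip(classes.tolist(), counts.tolist())))
```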

Multiple axis

Sometimes it is useful to have multiple y-axes in the same plot.

For example, we may want to investigate the correlation between a country's obesity rate and the number of calories people in that country eat per meal.

In [37]:
# Made-up data
calories = [380.70, 420.98, 454.91, 406.45, 446.16, 498.08, 504.54, 459.05, 459.55, 484.79]
countries = ['India', 'Japan', 'Korea', 'China', 'Thai', 'Italy', 'France', 'Greece', 'Mexico', 'US']
obesity_rates = [3.9, 4.3, 4.7, 6.2, 10, 19.9, 21.6, 24.9, 28.9, 36.2]

fig, ax1 = plt.subplots(figsize = (10, 8))
ax1.bar(countries, obesity_rates, color='C8')
ax1.set_ylabel('obesity rate(%)', color='C8')
ax1.tick_params(axis='y', labelcolor='C8')

# Enable multiple axis
ax2 = ax1.twinx()
ax2.plot(countries, calories, color='C0')
ax2.set_ylabel('calories', color='C0')
ax2.tick_params(axis='y', labelcolor='C0')
plt.show()
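
The twin-axis plot only suggests a relationship visually. To put a number on it, one option (an addition, not part of the original notes) is the Pearson correlation coefficient computed with np.corrcoef. Because the data are made up, the value says nothing about real countries.

```python
import numpy as np

# Made-up data from the cell above
calories = [380.70, 420.98, 454.91, 406.45, 446.16, 498.08, 504.54, 459.05, 459.55, 484.79]
obesity_rates = [3.9, 4.3, 4.7, 6.2, 10, 19.9, 21.6, 24.9, 28.9, 36.2]

# Pearson r is the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(calories, obesity_rates)[0, 1]

# Same value from the definition: sum of products of deviations over the norms
c = np.asarray(calories) - np.mean(calories)
o = np.asarray(obesity_rates) - np.mean(obesity_rates)
r_manual = np.sum(c * o) / np.sqrt(np.sum(c ** 2) * np.sum(o ** 2))

print(f"Pearson r = {r:.3f}")
```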

Scatter plot

A scatter plot displays the values of (typically) two variables for a set of data, with each observation drawn as a point.

Plot $y = 2x + 3 + \epsilon$, where $\epsilon \sim \mathcal{N}(0, 1)$ (standard Gaussian noise). The following code makes a scatter plot of all $(x, y)$ pairs and draws the noise-free line $y = 2x + 3$ in red.

Note that scatter plots are widely used to compare observed data points with a fitted linear regression model.

In [25]:
# Data
x = np.arange(0, 10, 0.5) # x in [0, 10) 
noise = np.random.randn(len(x)) # Generate standard normal random variables
y = 2 * x + 3 + noise

plt.figure()
plt.scatter(x, y)
plt.plot(x, 2 * x + 3, color = 'r')
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.show()
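
The red line above is the true model $y = 2x + 3$. In practice, the model is unknown and has to be estimated from the scattered points. One simple option (an addition to the original notes) is a least-squares line from np.polyfit. This sketch uses a seeded random generator so the fit is reproducible, plus the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)                 # seeded so the fit is reproducible
x = np.arange(0, 10, 0.5)
y = 2 * x + 3 + rng.standard_normal(len(x))    # y = 2x + 3 + N(0, 1) noise

# Degree-1 least-squares fit; coefficients are returned highest power first
slope, intercept = np.polyfit(x, y, 1)

plt.figure()
plt.scatter(x, y, label='data')
plt.plot(x, 2 * x + 3, color='r', label='true line')
plt.plot(x, slope * x + intercept, color='k', linestyle='--', label='fitted line')
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.legend(loc='best')
print(f"fitted: y = {slope:.2f}x + {intercept:.2f}")
```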

Contour Plot

A contour plot represents a 3-dimensional surface by plotting constant-$z$ slices (contours). Given a value of $z$, a line is drawn connecting the $(x, y)$ coordinates where that $z$ value occurs.

Here, we plot the contours of $J(\mathbf{w})$:

$$ J(\mathbf{w}) = (\mathbf{w} - \mathbf{w}_{o})^{T}\mathbf{A}(\mathbf{w} - \mathbf{w}_{o}) $$

where $\mathbf{w} = \begin{bmatrix} x \\ y \end{bmatrix}$, $\mathbf{w}_{o} = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$, and $\mathbf{A} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$.

In [26]:
xmin, xmax, xstep = -4, 0, .1
ymin, ymax, ystep = 0, 4, .1

A = np.array([[2, 0], [0, 1]])
w0 = np.array([-2., 2.]).reshape(2, 1)

J = lambda x, y: A[0, 0] * (x - w0[0]) ** 2 + (A[0, 1] + A[1, 0]) * (x - w0[0]) * (y - w0[1]) + A[1, 1] * (y - w0[1]) ** 2
gradient_u = lambda x, y: (A[0, 0] * (x - w0[0]) + A[0, 1] * (y - w0[1])) + (A[0, 0] * (x - w0[0]) + A[1, 0] * (y - w0[1]))
gradient_v = lambda x, y: (A[1, 0] * (x - w0[0]) + A[1, 1] * (y - w0[1])) + (A[0, 1] * (x - w0[0]) + A[1, 1] * (y - w0[1]))

x, y = np.meshgrid(np.arange(xmin, xmax + xstep, xstep),
                   np.arange(ymin, ymax + ystep, ystep))

z = J(x, y)

fig, ax = plt.subplots(figsize=(7,7))
ax.contour(x, y, z, levels=np.logspace(0, 5, 35), cmap='jet')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))
plt.show()
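
The lambda J in the cell above writes out the quadratic form term by term. An alternative sketch (not from the original notes) evaluates the matrix form $(\mathbf{w} - \mathbf{w}_{o})^{T}\mathbf{A}(\mathbf{w} - \mathbf{w}_{o})$ directly with np.einsum, which works for any 2×2 $\mathbf{A}$. It then checks the result against the expansion $2(x+2)^2 + (y-2)^2$ that holds for this $\mathbf{A}$ and $\mathbf{w}_{o}$. The function name J_matrix is hypothetical.

```python
import numpy as np

A = np.array([[2, 0], [0, 1]])
w0 = np.array([-2., 2.])

def J_matrix(x, y):
    """Evaluate (w - w0)^T A (w - w0) at every point w = (x, y) of a grid."""
    d = np.stack([x - w0[0], y - w0[1]], axis=-1)    # shape (..., 2)
    return np.einsum('...i,ij,...j->...', d, A, d)   # quadratic form per point

xmin, xmax, xstep = -4, 0, .1
ymin, ymax, ystep = 0, 4, .1
x, y = np.meshgrid(np.arange(xmin, xmax + xstep, xstep),
                   np.arange(ymin, ymax + ystep, ystep))
z = J_matrix(x, y)

# For this A and w0 the quadratic form expands to 2(x + 2)^2 + (y - 2)^2
z_expanded = 2 * (x + 2) ** 2 + (y - 2) ** 2
print(np.allclose(z, z_expanded))
```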

Quiver Plot

A quiver plot displays velocity vectors as arrows with components $(u,v)$ at the points $(x,y)$.

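Plot the gradient $\nabla J(\mathbf{w}) = (\mathbf{A} + \mathbf{A}^{T})(\mathbf{w} - \mathbf{w}_{o})$, which equals $2\mathbf{A}\mathbf{w} - 2\mathbf{A}\mathbf{w}_{o}$ here because $\mathbf{A}$ is symmetric.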

In [27]:
x1, y1 = np.meshgrid(np.arange(xmin, xmax, 0.2),
                   np.arange(ymin, ymax, 0.2))
u1 = gradient_u(x1, y1)
v1 = gradient_v(x1, y1)

fig, ax = plt.subplots(figsize=(7, 7))
ax.quiver(x1, y1, u1, v1)
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))
plt.show()
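
The gradient_u and gradient_v lambdas encode the two components of $(\mathbf{A} + \mathbf{A}^{T})(\mathbf{w} - \mathbf{w}_{o})$. A quick way to sanity-check such a formula (not part of the original notes) is to compare it with central finite differences at a test point. The helper names below are hypothetical.

```python
import numpy as np

A = np.array([[2., 0.], [0., 1.]])
w0 = np.array([-2., 2.])

def J(w):
    d = w - w0
    return d @ A @ d

def grad_J(w):
    # (A + A^T)(w - w0); equals 2A(w - w0) when A is symmetric
    return (A + A.T) @ (w - w0)

def numerical_grad(f, w, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

w = np.array([-3.0, 1.0])
print(grad_J(w), numerical_grad(J, w))
```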

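We can also overlay the contour plot on the quiver plot. Since the gradient is perpendicular to the level sets of $J$, the arrows cross the contour lines at right angles and point away from the minimum at $\mathbf{w}_{o}$.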

In [28]:
fig, ax = plt.subplots(figsize=(7, 7))
ax.contour(x, y, z, levels=np.logspace(0, 5, 35), cmap='jet')
ax.quiver(x1, y1, u1, v1)
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))
plt.show()

3D Faces

Besides contour plots, matplotlib can also draw 3D surfaces directly.

Here we plot the probability density surface of a 2D joint Gaussian distribution with the following parameters:

$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $, $\boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} $.

Note: $\boldsymbol{\mu}$ is the mean (the location of the center), and $\boldsymbol{\Sigma}$ is the covariance matrix.

In [29]:
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

Mu = np.array([0, 0])
Cov = np.array([[1, 0], [0, 1]])
rv = multivariate_normal(Mu, Cov)

fig = plt.figure()
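# Axes3D(fig) is deprecated; recent Matplotlib versions no longer attach it to the figure
ax = fig.add_subplot(projection='3d')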
X = np.arange(-10, 10, 0.25)
Y = np.arange(-10, 10, 0.25)
X, Y = np.meshgrid(X, Y)
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X; pos[:, :, 1] = Y
Z = rv.pdf(pos)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet)
plt.show()
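
rv.pdf evaluates the bivariate Gaussian density $p(\mathbf{x}) = \frac{1}{2\pi\sqrt{\det \boldsymbol{\Sigma}}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right)$. As a cross-check, the sketch below implements this formula directly with NumPy and compares it with multivariate_normal on the same grid. Here gaussian_pdf is a hypothetical helper, not a SciPy function, and np.dstack builds the same pos array as the cell above. With the identity covariance, the peak height at the mean is $1/(2\pi)$.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(pos, mu, cov):
    """Hypothetical helper: bivariate Gaussian density at points pos[..., (x, y)]."""
    d = pos - mu
    cov_inv = np.linalg.inv(cov)
    quad = np.einsum('...i,ij,...j->...', d, cov_inv, d)   # (x - mu)^T Sigma^-1 (x - mu)
    norm = 2 * np.pi * np.sqrt(np.linalg.det(cov))         # 2*pi*sqrt(det Sigma) in 2D
    return np.exp(-0.5 * quad) / norm

Mu = np.array([0, 0])
Cov = np.array([[1, 0], [0, 1]])
X, Y = np.meshgrid(np.arange(-10, 10, 0.25), np.arange(-10, 10, 0.25))
pos = np.dstack((X, Y))    # same array as pos[:, :, 0] = X; pos[:, :, 1] = Y

Z_manual = gaussian_pdf(pos, Mu, Cov)
Z_scipy = multivariate_normal(Mu, Cov).pdf(pos)
print(np.allclose(Z_manual, Z_scipy))
print(Z_manual.max(), 1 / (2 * np.pi))   # peak height vs. 1/(2*pi)
```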

We can change the location and covariance matrix.

$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 4 \end{bmatrix} $, $\boldsymbol{\Sigma} = \begin{bmatrix} 5 & 0 \\ 0 & 1 \end{bmatrix} $.

In [30]:
Mu = np.array([0, 4])
Cov = np.array([[5, 0], [0, 1]])
rv = multivariate_normal(Mu, Cov)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
X = np.arange(-10, 10, 0.25)
Y = np.arange(-10, 10, 0.25)
X, Y = np.meshgrid(X, Y)
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X; pos[:, :, 1] = Y
Z = rv.pdf(pos)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet)
plt.show()

$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $, $\boldsymbol{\Sigma} = \begin{bmatrix} 10.5 & -9.5 \\ -9.5 & 10.5 \end{bmatrix} $.

In [31]:
Mu = np.array([0, 0])
Cov = np.array([[10.5, -9.5], [-9.5, 10.5]])
rv = multivariate_normal(Mu, Cov)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
X = np.arange(-10, 10, 0.25)
Y = np.arange(-10, 10, 0.25)
X, Y = np.meshgrid(X, Y)
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X; pos[:, :, 1] = Y
Z = rv.pdf(pos)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet)
plt.show()
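
The last covariance matrix has a large negative off-diagonal term, so the density forms a narrow ridge along the line $y = -x$. An eigen-decomposition makes this precise. The variance is $10.5 + 9.5 = 20$ along $(1, -1)/\sqrt{2}$ and $10.5 - 9.5 = 1$ along $(1, 1)/\sqrt{2}$, and the correlation between the coordinates is $-9.5/10.5 \approx -0.905$. A short check with np.linalg.eigh:

```python
import numpy as np

Cov = np.array([[10.5, -9.5], [-9.5, 10.5]])

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(Cov)
print(eigenvalues)       # variances along the principal axes
print(eigenvectors)      # columns are the corresponding principal directions

# Correlation coefficient between the two coordinates
rho = Cov[0, 1] / np.sqrt(Cov[0, 0] * Cov[1, 1])
print(f"rho = {rho:.3f}")
```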