Bokeh Tutorial

10. High Level Charts

This section covers the bokeh.charts interface, which is a high-level API that is especially useful for exploratory data analysis (for instance, in a Jupyter notebook). It provides functions for quickly producing many standard chart types, often with a single line of code. We will look at the following types in this notebook:

Scatter Plot
Bar Chart
Histogram
Box Plot

In [1]:

from bokeh.io import output_notebook, show
output_notebook()

Scatter Plot

A high-level scatter plot is provided by bokeh.charts.Scatter.

For this section we will use the "iris" data set. First, let's import it and take a look at a few rows:

In [2]:

from bokeh.sampledata.iris import flowers
flowers.head()

Out[2]:

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa
2	4.7	3.2	1.3	0.2	setosa
3	4.6	3.1	1.5	0.2	setosa
4	5.0	3.6	1.4	0.2	setosa

In [3]:

from bokeh.charts import Scatter

A basic scatter chart takes the data (in this case a pandas DataFrame) as the first argument, and specifies the x and y coordinates for the scatter as the names of columns in the data.

In [4]:

p = Scatter(flowers, x='petal_length', y='petal_width')
show(p)

By passing a column name for the color parameter, you can make Scatter automatically color the markers according to the groups in that column. Let's also add a legend by specifying its location as the value of the legend parameter (in this case "top_left").

In [5]:

p = Scatter(flowers, x='petal_length', y='petal_width', color='species', legend='top_left')
show(p)
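To make the grouping idea concrete, here is a minimal plain-Python sketch of how distinct values in a column could be mapped to colors. It is not Bokeh's implementation: the helper name colors_by_group and the palette are hypothetical, and the colors Scatter actually chooses may differ.

```python
from itertools import cycle

# Hypothetical palette for illustration only; not necessarily Bokeh's default.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c"]

def colors_by_group(labels, palette=PALETTE):
    """Assign each distinct label a palette color, in order of first appearance."""
    mapping = {}
    colors = cycle(palette)  # reuse colors if there are more groups than colors
    for label in labels:
        if label not in mapping:
            mapping[label] = next(colors)
    # One color per data point, plus the label-to-color mapping a legend would show.
    return [mapping[label] for label in labels], mapping

species = ["setosa", "setosa", "versicolor", "virginica", "versicolor"]
marker_colors, legend_entries = colors_by_group(species)
print(legend_entries)
```

Each point receives the color of its group, and the mapping itself is the information a legend needs to display.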

By passing a column name for the marker parameter, you can make Scatter automatically vary the marker shapes according to the groups in that column. Let's try that as an exercise.

In [6]:

# EXERCISE: vary the marker shape by passing a column name as the `marker` keyword argument

Bar Chart

A high-level bar chart is provided by bokeh.charts.Bar.

For this section, we will use the "autompg" data set. Let's import it and take a quick look:

In [7]:

from bokeh.sampledata.autompg import autompg
autompg.head()

Out[7]:

	mpg	cyl	displ	hp	weight	accel	yr	origin	name
0	18.0	8	307.0	130	3504	12.0	70	1	chevrolet chevelle malibu
1	15.0	8	350.0	165	3693	11.5	70	1	buick skylark 320
2	18.0	8	318.0	150	3436	11.0	70	1	plymouth satellite
3	16.0	8	304.0	150	3433	12.0	70	1	amc rebel sst
4	17.0	8	302.0	140	3449	10.5	70	1	ford torino

In [8]:

from bokeh.charts import Bar

A basic bar chart takes the data (again a DataFrame) as the first argument, along with the following parameters:

label - the column whose groups label the x-axis
values - the column whose values are aggregated within each group to give the bar heights
agg - the name of the aggregation to apply to the values (e.g., "mean", "max")

A simple example that also specifies some other properties such as title and legend is shown below:

In [9]:

p = Bar(autompg, label='cyl', values='mpg', agg='max', 
        title="Max MPG by CYL", legend=None, tools='crosshair')
show(p)
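The label / values / agg combination corresponds to a group-by aggregation. As a rough sketch of the numbers behind such a chart, the pandas snippet below computes one value per group. The data here is made up for illustration (only the column names match autompg), and this is an equivalent computation rather than a description of how Bar works internally.

```python
import pandas as pd

# Toy data with made-up values; only the column names mirror autompg.
toy = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8, 8],
    "mpg": [30.0, 26.0, 21.0, 19.0, 15.0, 18.0],
})

# One bar per distinct `cyl` value; the bar height is the max `mpg` in that group.
bar_heights = toy.groupby("cyl")["mpg"].agg("max")
print(bar_heights)
```

Swapping "max" for "mean" or "median" changes the aggregation in the same way the agg parameter does.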

By passing another column name as the group parameter, the aggregations can be further subdivided by the groups in that column, and the bars grouped visually. The example below demonstrates this, as well as adding a legend by specifying its location:

In [10]:

p = Bar(autompg, label='yr', values='mpg', agg='median', group='origin', 
        title="Median MPG by YR, grouped by ORIGIN", legend='top_left', tools='crosshair')
show(p)
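Subdividing by a second column amounts to aggregating over pairs of group values. The sketch below shows that computation in pandas on made-up data (the column names follow autompg, the values do not). It illustrates the idea only and is not taken from the Bar implementation.

```python
import pandas as pd

# Toy data with made-up values; column names follow autompg.
toy = pd.DataFrame({
    "yr":     [70, 70, 70, 70, 71, 71, 71, 71],
    "origin": [1, 1, 2, 2, 1, 1, 2, 2],
    "mpg":    [15.0, 17.0, 24.0, 26.0, 14.0, 18.0, 27.0, 31.0],
})

# Median mpg for every (yr, origin) pair, reshaped so that each row is a
# label on the x-axis and each column is one bar within that label's group.
medians = toy.groupby(["yr", "origin"])["mpg"].median().unstack("origin")
print(medians)
```

Each row of the result holds the bar heights that would sit side by side for one year.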

Similarly, bars for subgroups can be stacked visually, by providing a column name for the stack parameter. Let's try that as an exercise.

In [11]:

# EXERCISE: change the chart above to stack the bars with title "Median MPG by YR, stacked by ORIGIN"

Histogram

A high-level histogram is provided by bokeh.charts.Histogram.

For this section, we will construct our own synthetic data set that has values generated from two different probability distributions.

In [12]:

import pandas as pd
import numpy as np

# build some distributions
mu, sigma = 0, 0.5
normal = pd.DataFrame({'value': np.random.normal(mu, sigma, 1000), 'type': 'normal'})
lognormal = pd.DataFrame({'value': np.random.lognormal(mu, sigma, 1000), 'type': 'lognormal'})

# combine both into a single pandas DataFrame
df = pd.concat([normal, lognormal])
df[995:1005]

Out[12]:

	type	value
995	normal	-0.301098
996	normal	-0.740360
997	normal	0.030623
998	normal	0.320627
999	normal	0.049325
0	lognormal	0.350363
1	lognormal	0.508560
2	lognormal	2.078477
3	lognormal	1.247154
4	lognormal	0.941148
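Note that the index labels in the output above run from 995 to 999 and then restart at 0. That is because pd.concat keeps each frame's original index by default, while a slice such as df[995:1005] selects rows by position. The small sketch below, using two made-up frames, shows the default behavior next to the ignore_index=True option, which renumbers the rows.

```python
import pandas as pd

# Two tiny made-up frames, shaped like `normal` and `lognormal` above.
a = pd.DataFrame({"value": [1.0, 2.0], "type": "normal"})
b = pd.DataFrame({"value": [3.0, 4.0], "type": "lognormal"})

kept = pd.concat([a, b])                      # index labels repeat: 0, 1, 0, 1
fresh = pd.concat([a, b], ignore_index=True)  # index is renumbered: 0, 1, 2, 3

print(list(kept.index))
print(list(fresh.index))
```

The duplicated labels do not matter for the charts in this section, since they only use the column values.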

In [13]:

from bokeh.charts import Histogram

A basic histogram takes the data as the first parameter, and a column name as the values parameter. Optionally, you can also specify the number of bins to use by giving a value for the bins parameter. The example below shows the distribution of *all* the values (both the "normal" and "lognormal" values).

In [14]:

hist = Histogram(df, values='value', bins=30)
show(hist)
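To see what the bins parameter controls, it can help to compute a binning by hand. The sketch below uses numpy.histogram with equal-width bins. Whether Histogram bins its data in exactly this way is an assumption, and the seeded generator is used here only to make the example repeatable.

```python
import numpy as np

# Seeded generator so the example is repeatable; the tutorial data is unseeded.
rng = np.random.default_rng(0)
values = rng.normal(0, 0.5, 1000)

# 30 equal-width bins spanning the range of the data.
counts, edges = np.histogram(values, bins=30)

print(len(counts))   # one count per bin
print(len(edges))    # one more edge than there are bins
print(counts.sum())  # every value falls into exactly one bin
```

Raising bins gives narrower bars and a noisier outline, while lowering it smooths the shape at the cost of detail.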

It's also possible to generate multiple histograms at once by grouping the data. The column to group by is specified by the color parameter (and the histogram for each group is colored differently automatically). Let's try that as an exercise.

In [15]:

# EXERCISE: generate histograms for each "type" of distribution, and add a legend to the top left.

Box Plot

A high-level box plot is provided by bokeh.charts.BoxPlot.

For this section we will use the "iris" data set again.

In [16]:

from bokeh.charts import BoxPlot

A basic box plot takes the data as the first argument, as well as column names for:

label - the column whose groups label the x-axis
values - the column whose values are summarized for each group

A simple example that also specifies some other properties, such as the title and axis labels, is shown below:

In [17]:

p = BoxPlot(flowers, label='species', values='petal_width', tools='crosshair', color='#aa4444',
            xlabel='', ylabel='petal width, cm', title='Distributions of petal widths')
show(p)
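Each box summarizes the distribution of one group through its quartiles. As a sketch of those per-group statistics, the pandas snippet below computes the quartiles and the interquartile range on made-up petal widths. The whisker and outlier rules that BoxPlot applies are not covered here, and pandas' default linear interpolation for quantiles is an assumption that may not match Bokeh's.

```python
import pandas as pd

# Toy data with made-up values; column names follow the iris data set.
toy = pd.DataFrame({
    "species": ["setosa"] * 5 + ["virginica"] * 5,
    "petal_width": [0.1, 0.2, 0.2, 0.3, 0.4, 1.8, 2.0, 2.1, 2.3, 2.5],
})

# Quartiles per group: the box spans q1..q3 and the line inside is the median.
summary = (
    toy.groupby("species")["petal_width"]
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)
summary.columns = ["q1", "median", "q3"]
summary["iqr"] = summary["q3"] - summary["q1"]  # height of the box
print(summary)
```

Comparing the rows shows at a glance what the chart shows visually: the two groups have similar spread but very different locations.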

Instead of a single color, the boxes and whiskers can be colored per group. This is done by passing a column name, rather than a color value, as the color parameter. Let's try that as an exercise.

In [18]:

# EXERCISE: color the boxes by "species" and add a legend to the top left