First, we load the libraries used in this notebook:
charty: for data visualization
datasets-pandas: for loading open datasets provided by red-datasets and using them as pandas data frames
numo/narray: for some numerical array operations
You can execute the following code cell by selecting it and pressing Shift+Enter.
require "charty"
require "datasets-pandas" # This loads "datasets" and "pandas"
require "numo/narray"
false
{
charty: Charty::VERSION,
datasets_pandas: DatasetsPandas::VERSION,
numo_narray: Numo::NArray::VERSION
}
{:charty=>"0.2.10", :datasets_pandas=>"0.0.1", :numo_narray=>"0.9.2.0"}
In this notebook, we use the plotly backend to create plots.
Charty::Backends.use(:plotly)
:plotly
Datasets::Penguins is a Ruby port of the palmerpenguins dataset. For each penguin, it records the species, the island in the Palmer Archipelago, size measurements (flipper length, body mass, bill dimensions), sex, and year.
We will use this dataset throughout this notebook.
penguins = Datasets::Penguins.new.to_pandas
 | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|---|
0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | male | 2007 |
1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | female | 2007 |
2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | female | 2007 |
3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | None | 2007 |
4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | female | 2007 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
339 | Gentoo | Biscoe | NaN | NaN | NaN | NaN | None | 2009 |
340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | female | 2009 |
341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | male | 2009 |
342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | female | 2009 |
343 | Gentoo | Biscoe | 49.9 | 16.1 | 213.0 | 5400.0 | male | 2009 |
344 rows × 8 columns
We will also use the fmri dataset from seaborn for the line plot examples. red-datasets provides this dataset as well.
fmri = Datasets::SeabornData.new("fmri").to_pandas
 | subject | timepoint | event | region | signal |
---|---|---|---|---|---|
0 | s13 | 18 | stim | parietal | -0.017552 |
1 | s5 | 14 | stim | parietal | -0.080883 |
2 | s12 | 18 | stim | parietal | -0.081033 |
3 | s11 | 18 | stim | parietal | -0.046134 |
4 | s10 | 18 | stim | parietal | -0.037970 |
... | ... | ... | ... | ... | ... |
1059 | s0 | 8 | cue | frontal | 0.018165 |
1060 | s13 | 7 | cue | frontal | -0.029130 |
1061 | s12 | 7 | cue | frontal | -0.004939 |
1062 | s11 | 7 | cue | frontal | -0.025367 |
1063 | s0 | 0 | cue | parietal | -0.006899 |
1064 rows × 5 columns
A simple scatter plot shows the relationship between bill_length_mm and bill_depth_mm.
Charty.scatter_plot(
data: penguins, # input table data
x: :bill_length_mm, # the column name for x-axis
y: :bill_depth_mm # the column name for y-axis
)
Specifying the species column as the color axis lets us compare the distributions of the individual species.
Charty.scatter_plot(
data: penguins, # input table data
x: :bill_length_mm, # the column name for x-axis
y: :bill_depth_mm, # the column name for y-axis
color: :species # the column name for color-axis
)
Charty.scatter_plot(
data: penguins,
x: :bill_length_mm,
y: :bill_depth_mm,
color: :sex
)
scatter_plot also supports the size and style axes. Specifying the size axis changes the marker size, and specifying the style axis changes the marker shape.
Let's try them.
Charty.scatter_plot(
data: penguins,
x: :bill_length_mm,
y: :bill_depth_mm,
color: :species,
size: :body_mass_g,
style: :sex
)
Charty makes the look and feel of plots as similar as possible across the different visualization backends.
Let's check the previous plot in the pyplot backend.
Charty::Backends.use(:pyplot) # Select pyplot backend
# This method must be called to show plots inline
Charty::Backends::Pyplot.activate_iruby_integration
[:inline, "module://matplotlib_rb.backend_inline"]
The same plot method call generates a similar plot on a different backend.
Charty.scatter_plot(
data: penguins,
x: :bill_length_mm,
y: :bill_depth_mm,
color: :species,
size: :body_mass_g,
style: :sex
)
#<Charty::Plotters::ScatterPlotter:0x000000000000c3dc>
The pyplot backend puts the legend inside the plot area, and the legend format differs from that of the plotly backend.
This is because the pyplot backend emulates the output of Python's seaborn library.
Let's check seaborn's output.
sns = PyCall.import_module("seaborn") # Import seaborn by pycall
sns.scatterplot(
data: penguins,
x: :bill_length_mm,
y: :bill_depth_mm,
hue: :species,
size: :body_mass_g,
style: :sex
)
<AxesSubplot:xlabel='bill_length_mm', ylabel='bill_depth_mm'>
As you can see from the previous code, Charty's plotting methods are heavily influenced by seaborn's API.
We can easily switch back to the plotly backend.
Charty::Backends.use(:plotly) # Back to plotly backend
:plotly
Next is the line plot. A line plot is suitable for displaying how one variable changes as another variable changes continuously.
For example, the following code draws a sine curve.
steps = Numo::DFloat.linspace(0, 100)
xs = 2 * Math::PI * steps / 100
ys = Numo::NMath.sin(xs)
Charty.line_plot(x: xs, y: ys, x_label: "x", y_label: "sin(x)")
The next example shows a one-dimensional random walk.
walk1d = Numo::DFloat.new(100).rand_norm.inplace.cumsum
walk1d.inplace - walk1d[0] # Set start point to zero
Charty.line_plot(data: walk1d)
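The random walk above is a cumulative sum of random steps, shifted so that it starts at zero. The following is a minimal sketch of the same idea in plain Ruby, without Numo. It uses uniform steps instead of normally distributed ones, and the variable names are chosen only for this illustration.

```ruby
# Cumulative sum of random steps, shifted so the walk starts at zero.
# Uniform steps in [-1, 1] stand in for the normal steps used above.
rng = Random.new(1)
steps = Array.new(100) { rng.rand(-1.0..1.0) }

# Running total of the steps (the equivalent of cumsum).
position = 0.0
walk = steps.map { |step| position += step }

# Subtract the first value so the walk starts at zero.
start = walk.first
walk = walk.map { |value| value - start }

puts walk.first # the start point is now zero
puts walk.size  # one position per step
```

Each element of `walk` differs from the previous one by exactly one step, which is what makes the plotted line wander.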
By passing false to the sort keyword argument, a trajectory can be drawn as a line plot.
walk2d = Numo::DFloat.new(2000, 2).rand_norm().inplace.cumsum(axis: 0)
walk2d[true, 0].inplace - walk2d[0, 0]
walk2d[true, 1].inplace - walk2d[0, 1]
Charty.line_plot(x: walk2d[true, 0], y: walk2d[true, 1], sort: false)
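Why sort: false matters for a trajectory can be seen without plotting anything. The sketch below assumes that, as in seaborn, line_plot orders the points by x before connecting them unless sorting is disabled; this is an assumption about Charty's default, not something shown in this notebook. The path is made-up example data.

```ruby
# A small path visited in this order: right, up, left, up.
path = [[0, 0], [2, 0], [2, 1], [1, 1], [1, 2]]

# Ordering the points by x (ties broken by y) loses the visiting order,
# so connecting them would no longer trace the trajectory.
sorted_by_x = path.sort_by { |x, y| [x, y] }

p path
p sorted_by_x
```

The two arrays contain the same points, but only the first one, in its original order, describes the path that was actually walked.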
Charty.line_plot(
data: fmri,
x: :timepoint,
y: :signal
)
Charty.scatter_plot(
data: fmri,
x: :timepoint,
y: :signal
)
Charty.line_plot(
data: fmri,
x: :timepoint,
y: :signal,
error_bar: nil # Do not show the confidence interval
)
The color axis is also supported in line_plot.
Charty.line_plot(
data: fmri,
x: :timepoint,
y: :signal,
color: :event
)
Specifying err_style: :bars changes the confidence interval style from a shaded area to bars.
Charty.line_plot(
data: fmri,
x: :timepoint,
y: :signal,
color: :event,
err_style: :bars
)
The default error region is the 95% confidence interval, which is calculated by bootstrapping.
You can change the error region to, for example, the 60% confidence interval as shown below.
Charty.line_plot(
data: fmri,
x: :timepoint,
y: :signal,
color: :event,
error_bar: [:ci, 60]
)
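To illustrate what a bootstrapped confidence interval means, here is a minimal sketch of a percentile bootstrap for the mean using only the standard library. The helper name `bootstrap_ci`, its parameters, and the number of resamples are assumptions made for this example; Charty's internal implementation may differ.

```ruby
# Percentile-bootstrap confidence interval for the mean.
# `bootstrap_ci` is a hypothetical helper, not part of Charty's API.
def bootstrap_ci(values, level: 95, n_boot: 1000, rng: Random.new(42))
  means = Array.new(n_boot) do
    # Resample with replacement, keeping the original sample size.
    resample = Array.new(values.size) { values[rng.rand(values.size)] }
    resample.sum / resample.size.to_f
  end
  means.sort!

  # Cut off (100 - level) / 2 percent of the resampled means on each side.
  alpha = (100 - level) / 2.0
  lower = means[((alpha / 100.0) * (n_boot - 1)).round]
  upper = means[((1 - alpha / 100.0) * (n_boot - 1)).round]
  [lower, upper]
end

# Signal values of the first five fmri rows shown earlier (illustration only).
signal = [-0.017552, -0.080883, -0.081033, -0.046134, -0.037970]

ci95 = bootstrap_ci(signal, level: 95, rng: Random.new(42))
ci60 = bootstrap_ci(signal, level: 60, rng: Random.new(42))
puts "95% CI: #{ci95.inspect}"
puts "60% CI: #{ci60.inspect}"
```

With the same seed, the 60% interval always lies inside the 95% interval, which is why the error region becomes narrower when the level is lowered.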
Specifying :pi instead of :ci shows a percentile interval. The following example shows the 90% percentile interval.
Charty.line_plot(
data: fmri,
x: :timepoint,
y: :signal,
color: :event,
error_bar: [:pi, 90]
)
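A percentile interval describes the spread of the data itself rather than the uncertainty of the mean. The sketch below computes one in plain Ruby with linear interpolation between neighboring values; the helper names `percentile` and `percentile_interval` are hypothetical, and the interpolation rule is an assumption that may not match Charty's exactly.

```ruby
# Linear-interpolation percentile of a numeric array (q in 0..100).
def percentile(values, q)
  sorted = values.sort
  rank = q * (sorted.size - 1) / 100.0
  low = rank.floor
  high = rank.ceil
  sorted[low] + (sorted[high] - sorted[low]) * (rank - low)
end

# Interval that covers the central `width` percent of the values.
def percentile_interval(values, width)
  tail = (100 - width) / 2.0
  [percentile(values, tail), percentile(values, 100 - tail)]
end

values = (1..101).to_a
p percentile_interval(values, 90) # => [6.0, 96.0]
```

For the 90% interval, 5% of the values are cut off at each end, so the bounds are the 5th and 95th percentiles.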
The style axis controls the dash pattern.
Charty.line_plot(
data: fmri,
x: :timepoint,
y: :signal,
color: :event,
style: :region,
error_bar: nil
)
The size axis controls the line width.
Charty.line_plot(
data: fmri,
x: :timepoint,
y: :signal,
color: :event,
size: :region,
error_bar: nil
)
Two methods:
scatter_plot
line_plot
Five axes:
x: for the horizontal axis
y: for the vertical axis
color: for color grouping
size: for line width or marker size
style: for dash pattern or marker shape
In bar_plot, either x or y must be a categorical variable, and the other one must be a numeric variable.
bar = Charty.bar_plot(
data: penguins,
x: :species,
y: :flipper_length_mm
)
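To show how a categorical variable and a numeric variable combine into bars, the sketch below groups a few rows by category and averages the numeric values. It assumes that the bar height is the mean of each group, as in seaborn's default; that is an assumption about Charty's default estimator. The rows are taken from the penguins table shown earlier.

```ruby
# Rows from the penguins table shown earlier: [species, flipper_length_mm].
rows = [
  ["Adelie", 181.0], ["Adelie", 186.0], ["Adelie", 195.0], ["Adelie", 193.0],
  ["Gentoo", 215.0], ["Gentoo", 222.0], ["Gentoo", 212.0], ["Gentoo", 213.0]
]

# Group the numeric values by category, then average each group.
means = rows.group_by(&:first).transform_values do |group|
  group.sum { |_species, length| length } / group.size
end

means.each { |species, mean| puts "#{species}: #{mean}" }
# Prints:
#   Adelie: 188.75
#   Gentoo: 215.5
```

One bar is drawn per category, which is why one of the two axes has to be categorical and the other numeric.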
# Swapping x and y changes the orientation of the plot
Charty.bar_plot(
data: penguins,
x: :flipper_length_mm,
y: :species
)
Charty.bar_plot(
data: penguins,
x: :flipper_length_mm,
y: :species,
color: :sex
)
count_plot is suitable for plotting the frequency of each category.
Charty.count_plot(
data: penguins,
x: :species
)
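The frequencies that a count plot displays can be sketched in plain Ruby with Array#tally (Ruby 2.7 or later). The species list below is a small made-up sample, not the real penguins column.

```ruby
# A made-up sample of category values (not the real dataset).
species = %w[Adelie Gentoo Adelie Chinstrap Gentoo Adelie]

# Count how often each category occurs.
counts = species.tally

counts.each { |name, count| puts "#{name}: #{count}" }
```

Each key of `counts` corresponds to one bar, and its value is the bar height.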
Charty.box_plot(
data: penguins,
x: :species,
y: :flipper_length_mm
)
Charty.box_plot(
data: penguins,
x: :flipper_length_mm,
y: :species
)
Charty.box_plot(
data: penguins,
x: :flipper_length_mm,
y: :species,
color: :sex
)
Three methods:
bar_plot
count_plot
box_plot
Three axes:
x: for the horizontal axis
y: for the vertical axis
color: for color grouping
Charty.hist_plot(
data: penguins,
x: :flipper_length_mm
)
# The following usage also works.
#
# Charty.hist_plot(
# data: penguins[:flipper_length_mm]
# )
The default number of bins is calculated by Sturges' formula:
$$ N = \lceil 1 + \log_2 n \rceil $$
where $n$ is the length of the input array and $N$ is the number of bins.
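As a quick check of the formula, the following sketch evaluates it in plain Ruby. The helper name `sturges_bins` is hypothetical and is not part of Charty's API; whether Charty drops missing values before counting $n$ is not covered here.

```ruby
# Sturges' formula: N = ceil(1 + log2(n)).
# `sturges_bins` is a hypothetical helper name used only in this sketch.
def sturges_bins(n)
  raise ArgumentError, "n must be positive" unless n.positive?

  (1 + Math.log2(n)).ceil
end

puts sturges_bins(344) # 344 rows, as in the penguins table => 10
```

Because the bin count grows only logarithmically with $n$, large inputs can look coarse under the default, and passing bins explicitly gives finer control.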
Charty.hist_plot(
data: penguins,
x: :flipper_length_mm,
bins: 20
)
Charty.hist_plot(
data: penguins,
x: :flipper_length_mm,
color: :species
)
One method:
hist_plot
Two axes:
x: for the horizontal axis
color: for color grouping