Hexbin plot

Hexagonal binning is a method for visualizing bivariate distributions. It is recommended for identifying patterns in large 2D data sets.

The underlying idea is as follows: a rectangular region containing the data set is tessellated with regular hexagons. The number (or proportion) of points falling in each cell is counted and mapped to a color from a colormap. The resulting chart is called a hexbin plot.
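To make the counting step concrete, here is a minimal pure-Python sketch of hexagonal binning. It assumes regular hexagons in data coordinates, with two vertices on the vertical axis, and assigns each point to the nearest hexagon center. The centers are split into two staggered rectangular lattices. The function name `hex_bin_counts` and its `width` parameter are hypothetical, and this is not necessarily how matplotlib implements the binning internally.

```python
import math
from collections import Counter

def hex_bin_counts(points, width):
    """Count points per regular hexagon of the given horizontal width."""
    height = width * math.sqrt(3)  # vertical spacing inside one sub-lattice
    counts = Counter()
    for x, y in points:
        # nearest center in lattice A: (i*width, j*height)
        ax = round(x / width) * width
        ay = round(y / height) * height
        # nearest center in lattice B: ((i+0.5)*width, (j+0.5)*height)
        bx = (math.floor(x / width) + 0.5) * width
        by = (math.floor(y / height) + 0.5) * height
        # the point belongs to the hexagon whose center is closer
        dist_a = (x - ax) ** 2 + (y - ay) ** 2
        dist_b = (x - bx) ** 2 + (y - by) ** 2
        center = (ax, ay) if dist_a <= dist_b else (bx, by)
        counts[center] += 1
    return counts

# three illustrative points, hexagons of width 1
counts = hex_bin_counts([(0.1, 0.1), (-0.1, 0.05), (0.5, 0.8)], 1.0)
```

The keys of the returned counter are the hexagon centers, and the values are the bin counts that would be mapped to the colormap.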

Matplotlib provides the function pyplot.hexbin, which returns an instance of PolyCollection. We call a few methods of this instance to get the data in a form suitable for a Plotly plot.

In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import matplotlib.cm as cm
import cmocean  # http://matplotlib.org/cmocean/

import plotly.graph_objs as go

Our hexagonal tessellation consists of Plotly shapes bounded by regular hexagons. The color of each cell is the matplotlib facecolor of the corresponding hexagon in the PolyCollection, converted to a Plotly color by a function defined below (pl_cell_color).

Read data from a file:

In [ ]:
points = np.load('hexbin-data.npy')  # https://github.com/empet/Datasets/blob/master/hexbin-data.npy
x, y = points.T

Call the matplotlib hexbin function on our data set. Since we only need to create an instance of the PolyCollection class, not to display its plot, we set a very small figure size.

For all attributes of this instance to be initialized, it is important to have %matplotlib inline enabled, because some attributes are set at plot time.

In [ ]:
plt.figure(figsize=(0.05,0.05))
plt.axis('off')
# cmocean.cm.algae is a cmocean colormap
HB = plt.hexbin(x, y, gridsize=25, cmap=cmocean.cm.algae, mincnt=1)

gridsize is the number of hexagons in the x direction. By default it is 100.

mincnt sets the minimum number of points a hexagon must contain in order to be plotted: only cells containing at least mincnt data points are drawn. By default (mincnt=None) all cells are drawn, including empty ones. Hence, to avoid plotting hexagons with no points, we set it to 1.
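The effect of the two parameters can be sketched in a few lines of plain Python. The helpers `hexagon_width` and `visible_cells` are hypothetical names introduced only for illustration. The width formula assumes that the x-range of the data is divided evenly among gridsize hexagons; matplotlib may pad the range slightly.

```python
def hexagon_width(xs, gridsize):
    """Width of one hexagon if gridsize hexagons span the x-range of the data."""
    return (max(xs) - min(xs)) / gridsize

def visible_cells(cell_counts, mincnt=0):
    """Keep only the cells that hold at least mincnt points."""
    return {cell: n for cell, n in cell_counts.items() if n >= mincnt}

# illustrative values, not taken from the data set
width = hexagon_width([0.0, 50.0], 25)
kept = visible_cells({'cell_a': 0, 'cell_b': 3}, mincnt=1)
```

With mincnt=1 the empty cell is dropped, so it gets no shape and no entry in the list of counts.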

We define below the function get_hexbin_attributes, which returns the attributes of a hexbin-type PolyCollection object, namely:

  • a numpy.array of shape (7, 2) that contains the coordinates of the vertices $V_0, V_1, V_2, V_3, V_4, V_5, V_0$ of a prototypical hexagon of the tessellation. This hexagon is symmetric with respect to the origin, $O(0,0)$, has two vertices on $Oy$, and is scaled such that gridsize hexagons fill a row of the tessellation. It is then translated to the corresponding positions in the rectangular region of the data, in order to get a hexagonal lattice;
  • the offsets of the translation transformations, as a numpy.array of shape (no_hexagons, 2);
  • the matplotlib color codes (facecolors) of each hexagon;
  • the list of hexagonal bin counts.

The offsets, facecolors and the list of counts have the same length, equal to the number of hexagons containing at least mincnt points.
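The geometry of the prototypical hexagon and its translations can be illustrated with a small standalone sketch. It assumes a regular hexagon. The hexagon returned by matplotlib may be stretched, because the x and y directions are scaled separately. The names `prototypical_hexagon` and `translate` are hypothetical.

```python
import math

def prototypical_hexagon(width):
    """Closed list of 7 vertices of a regular hexagon centered at the origin.

    The hexagon has two vertices on the y-axis and the given horizontal width.
    """
    radius = width / math.sqrt(3)  # distance from the center to each vertex
    angles = [math.radians(-30 + 60 * k) for k in range(6)]
    vertices = [(radius * math.cos(a), radius * math.sin(a)) for a in angles]
    return vertices + [vertices[0]]  # repeat V_0 to close the polygon

def translate(vertices, offset):
    """Shift every vertex by offset = (dx, dy)."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in vertices]

hexagon = prototypical_hexagon(2.0)
moved = translate(hexagon, (3.0, 1.0))  # one cell of the lattice
```

Applying `translate` once per offset yields all the hexagons of the lattice.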

In [ ]:
def get_hexbin_attributes(hexbin):
    paths = hexbin.get_paths()
    # paths[0].iter_segments() is a generator of (vertex, code) pairs
    points_codes = list(paths[0].iter_segments())
    prototypical_hexagon = [item[0] for item in points_codes]
    return prototypical_hexagon, hexbin.get_offsets(), hexbin.get_facecolors(), hexbin.get_array()

The following function converts matplotlib facecolors to Plotly color codes:

In [ ]:
def pl_cell_color(mpl_facecolors):
    # convert each matplotlib RGBA tuple (floats in [0, 1]) to a Plotly 'rgb(r, g, b)' string
    return [f'rgb({int(R*255)}, {int(G*255)}, {int(B*255)})' for (R, G, B, A) in mpl_facecolors]
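A quick check of this conversion on two hand-picked RGBA tuples shows what the output looks like. The function is repeated here only so that the snippet runs on its own, and the sample values are illustrative.

```python
def pl_cell_color(mpl_facecolors):
    # same conversion as above, repeated so the snippet is self-contained
    return [f'rgb({int(R*255)}, {int(G*255)}, {int(B*255)})' for (R, G, B, A) in mpl_facecolors]

# two illustrative RGBA tuples, not taken from the data set
sample = [(1.0, 0.5, 0.0, 1.0), (0.0, 0.0, 0.0, 1.0)]
converted = pl_cell_color(sample)
print(converted)  # ['rgb(255, 127, 0)', 'rgb(0, 0, 0)']
```

Note that int() truncates, so 0.5 maps to 127 rather than 128, and that the alpha channel is dropped.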

Define a function that, given the prototypical hexagon and an offset, builds a closed hexagonal path filled with the corresponding Plotly color. It also computes the hexagon center:

In [ ]:
def make_hexagon(prototypical_hex, offset, fillcolor, linecolor=None):
    # translate the prototypical hexagon by the given offset
    new_hex_vertices = [vertex + offset for vertex in prototypical_hex]
    # hexagon center: mean of the six distinct vertices (the last one repeats the first)
    vertices = np.asarray(new_hex_vertices[:-1])
    center = np.mean(vertices, axis=0)
    if linecolor is None:
        linecolor = fillcolor
    # define the SVG-type path:
    path = 'M '
    for vert in new_hex_vertices:
        path += f'{vert[0]}, {vert[1]} L'
    return dict(type='path',
                line=dict(color=linecolor,
                          width=0.5),
                path=path[:-2],  # drop the trailing ' L'
                fillcolor=fillcolor,
                ), center
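As an aside, the path string can also be assembled with str.join. The sketch below is an alternative formulation, not the code used in this notebook. The helper name `svg_path` is hypothetical. It ends the path with the Z command, which closes the polygon, so the first vertex does not need to be repeated.

```python
def svg_path(vertices):
    """Build a closed SVG-type path string from a sequence of (x, y) pairs."""
    points = [f'{x},{y}' for x, y in vertices]
    return 'M ' + ' L '.join(points) + ' Z'

# illustrative triangle
triangle_path = svg_path([(0, 0), (1, 0), (1, 1)])
print(triangle_path)  # M 0,0 L 1,0 L 1,1 Z
```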

Now we can transform the hexbin, HB, to a Plotly 2D hexagonal histogram:

In [ ]:
hexagon_vertices, offsets, mpl_facecolors, counts = get_hexbin_attributes(HB)

The prototypical hexagon has the vertices:

In [ ]:
hexagon_vertices[:-1]  # the last vertex coincides with the first one
In [ ]:
cell_color = pl_cell_color(mpl_facecolors)
In [ ]:
shapes = []
centers = []
for k in range(len(offsets)):
    shape, center = make_hexagon(hexagon_vertices, offsets[k], cell_color[k])
    shapes.append(shape)
    centers.append(center)

In order to associate a colorbar with the hexbin plot, we define a dummy Scatter trace whose points are the hexagon centers. The marker color attribute is the list of counts, and the colorscale is the Plotly colorscale corresponding to the matplotlib colormap passed in the call to plt.hexbin() above.

A matplotlib colormap is converted into a Plotly colorscale with N entries by the following function:

In [ ]:
def mpl_to_plotly(cmap, N):
    h = 1.0/(N-1)
    pl_colorscale = []
    for k in range(N):
        C = list(map(np.uint8, np.array(cmap(k*h)[:3])*255))
        pl_colorscale.append([round(k*h,2), f'rgb({C[0]}, {C[1]}, {C[2]})'])
    return pl_colorscale
   
In [ ]:
pl_algae = mpl_to_plotly(cmocean.cm.algae, 11)
pl_algae
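The structure of such a colorscale is easier to see on a toy colormap. The sketch below replaces the matplotlib colormap with a hypothetical gray ramp, `gray_cmap`, so that it runs without matplotlib. The name `to_colorscale` is also hypothetical. The result is a list of [position, color] pairs whose positions run from 0 to 1.

```python
def gray_cmap(t):
    """Hypothetical stand-in for a matplotlib colormap: maps t in [0, 1] to RGBA."""
    return (t, t, t, 1.0)

def to_colorscale(cmap, n):
    """Sample cmap at n evenly spaced positions and return [position, 'rgb(...)'] pairs."""
    scale = []
    for k in range(n):
        t = k / (n - 1)
        r, g, b = (int(round(255 * c)) for c in cmap(t)[:3])
        scale.append([round(t, 2), f'rgb({r}, {g}, {b})'])
    return scale

scale = to_colorscale(gray_cmap, 5)
```

Because the positions are rounded to two decimals, a large n can produce repeated positions. A small n, such as the 11 entries used above, avoids this.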

Get data for the Plotly Scatter trace of hexagon centers:

In [ ]:
X, Y = zip(*centers)

# define the text to be displayed when hovering over the cells
text = [f'x: {round(X[k],2)}<br>y: {round(Y[k],2)}<br>counts: {int(counts[k])}' for k in range(len(X))]
In [ ]:
trace = go.Scatter(
    x=list(X),
    y=list(Y),
    mode='markers',
    marker=dict(size=0.5,
                color=counts,
                colorscale=pl_algae,
                showscale=True,
                colorbar=dict(thickness=20,
                              ticklen=4)),
    text=text,
    hoverinfo='text'
)
In [ ]:
axis = dict(showgrid=False,
           showline=False,
           zeroline=False,
           ticklen=4 
           )

layout = go.Layout(title='Hexbin plot',
                   width=530, height=550,
                   xaxis=axis,
                   yaxis=axis,
                   hovermode='closest',
                   shapes=shapes,
                   plot_bgcolor='black')
In [ ]:
fig = go.FigureWidget(data=[trace], layout=layout)
fig