Here you will see how to draw interactive charts easily and with a minimum of code. Plotly is an extremely powerful tool, and it is impossible to cover all of its features at once, so I'll show you how to build the most relevant and interesting charts.

First, if you don't already have Plotly installed, run:

In [ ]:
#!pip install plotly
In [ ]:
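import warnings

import numpy as np
import pandas as pd
import plotly.graph_objs as go
import plotly.offline as py
import pycountry
import seaborn as sns
from matplotlib import pyplot as plt

# Load plotly.js into the notebook so that py.iplot can render figures inline.
py.init_notebook_mode(connected=True)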
In [ ]:
PATH = "120-years-of-olympic-history-athletes-and-results/athlete_events.csv"
data = pd.read_csv(PATH)

The data comes from the athlete_events.csv file, which can be downloaded from the dataset's Kaggle page. The dataset has the following features:

  • ID - Unique number for each athlete
  • Name - Athlete's name
  • Sex - M or F
  • Age - Integer
  • Height - In centimeters
  • Weight - In kilograms
  • Team - Team name
  • NOC - National Olympic Committee 3-letter code
  • Games - Year and season
  • Year - Integer
  • Season - Summer or Winter
  • City - Host city
  • Sport - Sport
  • Event - Event
  • Medal - Gold, Silver, Bronze, or NA
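
The Medal column deserves a quick look before plotting. Assuming pandas parses the NA entries of the CSV as missing values (its default treatment of the string NA), value_counts skips them unless dropna=False is passed. A minimal sketch on a toy frame with made-up rows, not the real data:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Medal column; the rows are made up for illustration.
toy = pd.DataFrame({"Medal": ["Gold", np.nan, "Silver", np.nan, "Gold", "Bronze"]})

with_medal = toy.Medal.value_counts()            # missing values are skipped
all_rows = toy.Medal.value_counts(dropna=False)  # missing values are counted too

print(with_medal.sum())  # number of rows with a medal
print(all_rows.sum())    # total number of rows
```

This is why a chart built from value_counts only shows the three medal types and not the athletes who won nothing.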

Let's draw the simplest chart possible: the share of each of the three medal types in the total number of medals.

In [ ]:
medal_counts = data.Medal.value_counts(sort=True)
labels = medal_counts.index
values = medal_counts.values
# Pick each color by medal name so it does not depend on the order of the counts.
medal_colors = {"Gold": "#f4cb42", "Silver": "#a1a8b5", "Bronze": "#cd7f32"}
colors = [medal_colors[medal] for medal in labels]

pie = go.Pie(labels=labels, values=values, marker=dict(colors=colors))
layout = go.Layout(title="Medal distribution")
fig = go.Figure(data=[pie], layout=layout)
py.iplot(fig)

All the main classes for drawing graphs are located in plotly.graph_objs, imported here as go:

  • go.Pie is the trace object for a pie chart. It takes the labels, the values, and styling options such as marker.
  • go.Layout allows you to customize axis labels, titles, fonts, sizes, margins, colors, and more to define the appearance of the chart.
  • go.Figure creates the final object to be plotted: a dictionary-like object that contains both the data and the layout.

Okay, that was too easy. Let's complicate the chart a bit and put two pies on one chart.

We will display the top 10 countries by number of medals won, separately for men and women.

In [ ]:
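topn = 10
male = data[data.Sex == "M"]
female = data[data.Sex == "F"]
# Keep medal winners only; a bare dropna() would also drop rows with a missing Age, Height, or Weight.
count_male = male.dropna(subset=["Medal"]).NOC.value_counts()[:topn]
count_female = female.dropna(subset=["Medal"]).NOC.value_counts()[:topn]

pie_men = go.Pie(
    labels=count_male.index,
    values=count_male.values,
    name="Men",
    hole=0.4,  # illustrative hole size
    domain={"x": [0, 0.46]},
)
pie_women = go.Pie(
    labels=count_female.index,
    values=count_female.values,
    name="Women",
    hole=0.4,
    domain={"x": [0.5, 1]},
)

layout = dict(
    title="Top-10 countries with medals by sex",
    annotations=[
        dict(x=0.2, y=0.5, text="Men", showarrow=False, font=dict(size=20)),
        dict(x=0.8, y=0.5, text="Women", showarrow=False, font=dict(size=20)),
    ],
)

fig = dict(data=[pie_men, pie_women], layout=layout)
py.iplot(fig)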

  • The hole parameter sets the size of the hole in the center of the pie, as a fraction of the radius.
  • The domain parameter sets the region of the figure that the pie occupies, in fractions of the plot area. The x array sets the horizontal extent, and the y array sets the vertical extent. For example, x: [0, 0.5], y: [0, 0.5] places the pie in the bottom-left quarter of the plot.
  • The annotations list places the text labels shown in the center of each pie.
  • To learn more, read the go.Pie documentation
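
The domain values can also be computed rather than typed by hand. The sketch below uses a hypothetical helper, split_domains, which is not part of Plotly, to lay out n pies side by side with a fixed gap and to find the midpoint of each one, a natural x position for a centered annotation.

```python
def split_domains(n, gap=0.04):
    """Return n side-by-side [start, end] x-domains separated by `gap`.

    Hypothetical helper for illustration; it is not part of Plotly.
    """
    width = (1 - gap * (n - 1)) / n
    domains = []
    for i in range(n):
        start = i * (width + gap)
        domains.append([round(start, 4), round(start + width, 4)])
    return domains


domains = split_domains(2)
# Midpoint of each domain: where a label such as "Men" or "Women" would be centered.
centers = [round((lo + hi) / 2, 4) for lo, hi in domains]
print(domains)
print(centers)
```

Each pair can be passed as domain={"x": pair} to a pie trace, and each center as the x value of the matching annotation.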

Of course, we cannot do without a bar chart.

Let's draw a bar chart of the number of sports at each Summer Olympics.

In [ ]:
# Sort the Games labels so that the bars appear in chronological order.
games = sorted(data[data.Season == "Summer"].Games.unique())
sport_counts = np.array(
    [data[data.Games == game].groupby("Sport").size().shape[0] for game in games]
)
bar = go.Bar(
    x=games,
    y=sport_counts,
    marker=dict(color=sport_counts, colorscale="Reds", showscale=True),
)
layout = go.Layout(title="Number of sports in the summer Olympics by year")
fig = go.Figure(data=[bar], layout=layout)
py.iplot(fig)

The rendering scheme is the same as before; only the trace class changes to go.Bar.

  • The marker dictionary sets the drawing style of the bars. Here it colors each bar by its value and displays the color scale.
  • To learn more, read the go.Bar documentation
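
Counting the distinct sports per Games can also be done in a single pandas pass. A minimal sketch on made-up rows that reuse the dataset's column names; it is an alternative way to get the same kind of counts, not the approach used in the cell above:

```python
import pandas as pd

# Toy data: made-up rows with the same column names as the dataset.
toy = pd.DataFrame(
    {
        "Games": ["1896 Summer", "1896 Summer", "1896 Summer",
                  "1900 Summer", "1900 Summer", "1900 Summer"],
        "Sport": ["Athletics", "Athletics", "Fencing",
                  "Athletics", "Golf", "Rowing"],
    }
)

# Number of distinct sports per Games, ordered by the Games label.
sports_per_games = toy.groupby("Games")["Sport"].nunique().sort_index()
print(sports_per_games)
```

The index of the result can serve as x and its values as y for the bar trace.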

Again, let's complicate the chart and display the number of gold, silver, and bronze medals for the top 10 countries.

In [ ]:
topn = 10
# Rank countries by rows with a medal; a bare dropna() would also drop rows with a missing Age, Height, or Weight.
top10 = data.dropna(subset=["Medal"]).NOC.value_counts()[:topn]

gold = data[data.Medal == "Gold"].NOC.value_counts()
gold = gold[top10.index]
silver = data[data.Medal == "Silver"].NOC.value_counts()
silver = silver[top10.index]
bronze = data[data.Medal == "Bronze"].NOC.value_counts()
bronze = bronze[top10.index]

bar_gold = go.Bar(x=gold.index, y=gold, name="Gold", marker=dict(color="#f4cb42"))
bar_silver = go.Bar(
    x=silver.index, y=silver, name="Silver", marker=dict(color="#a1a8b5")
)
bar_bronze = go.Bar(
    x=bronze.index, y=bronze, name="Bronze", marker=dict(color="#cd7f32")
)

layout = go.Layout(
    title="Top-10 countries with medals", yaxis=dict(title="Count of medals")
)

fig = go.Figure(data=[bar_gold, bar_silver, bar_bronze], layout=layout)
py.iplot(fig)
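
The per-medal counts can also be built in one step with pd.crosstab, which fills in zeros for countries that lack a medal type. The sketch below uses made-up rows and plain-dict bar traces. Setting barmode="stack" in the layout stacks the bars instead of grouping them; that is a styling choice of this sketch, not something the chart above does.

```python
import pandas as pd

# Toy medal rows (made up); the real data has one row per athlete and event.
toy = pd.DataFrame(
    {
        "NOC": ["USA", "USA", "USA", "GBR", "GBR", "FRA"],
        "Medal": ["Gold", "Gold", "Silver", "Bronze", "Gold", "Silver"],
    }
)

# Rows: countries, columns: medal types, cells: counts (0 where there are none).
table = pd.crosstab(toy.NOC, toy.Medal).reindex(
    columns=["Gold", "Silver", "Bronze"], fill_value=0
)
# Order the countries by their total number of medals, largest first.
table = table.loc[table.sum(axis=1).sort_values(ascending=False).index]

# One plain-dict bar trace per medal type.
traces = [
    dict(type="bar", x=list(table.index), y=list(table[medal]), name=medal)
    for medal in table.columns
]
layout = dict(barmode="stack", title="Medals by country (toy data)")
fig = dict(data=traces, layout=layout)
print(table)
```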

Let's draw a beautiful scatter plot showing average height and weight for athletes from different sports.

The size of each circle reflects the number of athlete records for that sport, which is a rough proxy for the sport's popularity.

In [ ]:
# Select the two columns with a list; recent pandas versions reject the tuple form ["Height", "Weight"].
tmp = data.groupby(["Sport"])[["Height", "Weight"]].agg("mean").dropna()
df1 = pd.DataFrame(tmp).reset_index()
tmp = data.groupby(["Sport"])["ID"].count()
df2 = pd.DataFrame(tmp).reset_index()
dataset = df1.merge(df2)  # DataFrame with columns 'Sport', 'Height', 'Weight', 'ID'

# One trace per sport, so that every sport gets its own legend entry.
scatterplots = list()
for sport in dataset["Sport"]:
    df = dataset[dataset["Sport"] == sport]
    trace = go.Scatter(
        x=df["Height"],
        y=df["Weight"],
        name=sport,
        mode="markers",
        marker=dict(symbol="circle", sizemode="area", sizeref=10, size=df["ID"]),
    )
    scatterplots.append(trace)

layout = go.Layout(
    title="Mean height and weight by sport",
    xaxis=dict(title="Height, cm"),
    yaxis=dict(title="Weight, kg"),
)

fig = dict(data=scatterplots, layout=layout)
py.iplot(fig)

It was beautiful, wasn't it? We can hide a sport by clicking its legend entry, isolate it by double-clicking, zoom in, and explore the chart in every possible way.

  • The marker dictionary again defines how the points are drawn: it sets the symbol (try, for example, square), the size, and more. The possibilities are almost endless.
  • To learn more, read the go.Scatter documentation
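
The sizeref value controls how marker sizes are scaled. A rule of thumb from Plotly's bubble chart documentation for sizemode="area" is sizeref = 2 * max(size) / (largest marker size in pixels) ** 2. The sketch below wraps that formula in a hypothetical helper; the helper name and the 40 px default are illustrative choices.

```python
def area_sizeref(sizes, max_marker_px=40):
    """Return a sizeref that makes the largest marker about max_marker_px wide.

    Uses the rule of thumb for sizemode="area":
    sizeref = 2 * max(size) / max_marker_px ** 2.
    The helper name and the 40 px default are illustrative, not Plotly API.
    """
    return 2.0 * max(sizes) / (max_marker_px ** 2)


counts = [100, 400, 1600]  # made-up sample sizes
sizeref = area_sizeref(counts)
marker = dict(symbol="circle", sizemode="area", sizeref=sizeref, size=counts)
print(sizeref)
```

With a computed sizeref the bubbles keep a sensible size even if the counts change by orders of magnitude.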

Next, we will use a box plot to display age statistics for the men and women who participated in the Olympics.

In [ ]:
men = data[data.Sex == "M"].Age
women = data[data.Sex == "F"].Age

# Passing the values as x draws the boxes horizontally.
box_m = go.Box(x=men, name="Male", fillcolor="navy")
box_w = go.Box(x=women, name="Female", fillcolor="lime")
layout = go.Layout(title="Age by sex")
fig = go.Figure(data=[box_m, box_w], layout=layout)
py.iplot(fig)

  • This graph describes the distribution of the data. The vertical line inside each box corresponds to the median, and the edges of the box correspond to the first and third quartiles. The points show the outliers. In addition, you can see the minimum and maximum values.
  • Who do you think are the youngest (10 y.o.) and the oldest (97 y.o.) participants of the Olympics? Find them :)
  • To learn more, read the go.Box documentation
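
The quantities a box plot shows can be reproduced by hand. The sketch below uses made-up ages and the common 1.5 × IQR rule for outliers. NumPy's default percentile interpolation may differ slightly from the quartile method Plotly uses, so treat the numbers as an approximation of what the chart displays.

```python
import numpy as np

ages = np.array([10, 20, 22, 23, 24, 25, 26, 28, 30, 97])  # made-up ages

q1, median, q3 = np.percentile(ages, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are drawn individually as outliers.
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]
# Whiskers end at the most extreme values that are still inside the fences.
whisker_low = ages[ages >= lower_fence].min()
whisker_high = ages[ages <= upper_fence].max()

print(median, q1, q3)
print(outliers)
```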

Let's determine how many participants were sent by different countries during the whole period of the Olympics.

In [ ]:
#!pip install pycountry
In [ ]:
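def get_name(code):
    """Translate a three-letter code to the name of the country."""
    try:
        name = pycountry.countries.get(alpha_3=code).name
    except (AttributeError, LookupError):
        # Unknown code: pycountry returns None or raises, so keep the code itself.
        name = code
    return name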

# One row per NOC code with the number of athlete records for that code.
country_number = (
    data.NOC.value_counts().rename_axis("country").reset_index(name="number")
)
country_number["country"] = country_number["country"].apply(get_name)
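
One caveat: NOC codes are not always identical to ISO 3166-1 alpha-3 codes, so a lookup by alpha_3 leaves some countries untranslated. A small override table can bridge the gap. The mapping and the helper name below are illustrative and deliberately partial.

```python
# A few NOC codes that differ from the ISO 3166-1 alpha-3 code of the same country.
# The mapping is deliberately partial and only illustrative.
NOC_TO_ISO3 = {"GER": "DEU", "NED": "NLD", "SUI": "CHE"}


def to_iso3(noc):
    """Return the ISO alpha-3 code for a NOC code, or the code itself if not listed."""
    return NOC_TO_ISO3.get(noc, noc)


codes = [to_iso3(c) for c in ["GER", "FRA", "NED"]]
print(codes)
```

Applying such a conversion before the country-name lookup would let more rows resolve to a name that the map recognizes.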
In [ ]:
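worldmap = [
    dict(
        type="choropleth",
        locations=country_number["country"],
        locationmode="country names",
        z=country_number["number"],
        marker=dict(line=dict(color="rgb(180,180,180)", width=0.5)),
        # autotick is not a valid colorbar property in current Plotly versions, so only the title is set.
        colorbar=dict(title="Number of athletes"),
    )
]

layout = dict(
    title="The Nationality of Athletes",
    geo=dict(showframe=False, showcoastlines=True, projection=dict(type="mercator")),
)

fig = dict(data=worldmap, layout=layout)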
py.iplot(fig, validate=False)

That's it! You've learned how Plotly works and mastered simple but beautiful interactive charts.