I have been tweeting these pass sonars recently and had enough people ask me about code/methodology and hence this notebook.
Pass sonars are these neat visualizations which I first saw from Eliot McKinley. I'm not sure if he's the one who first came up with this and he doesn't take credit but for me he is, so well, I call them Eliot's sonars. Anyway, the sonars themselves are just bar plots on a polar axes. As with bar charts, the height of the bars shows some value $a$. While colors are usually meaningless in bar charts on the conventional cartesian axes, they come together to look really neat on a polar axes (along with representing some value, say, $b$).
I like $a$ to be the frequency and $b$ to be the average length of the passes in certain direction (again, this is the ASA way, not my original idea). So north is the opponent's goal, Ssouth is your own goal and the directions and counts in the sonars should be interpreted that way.
import json
import pandas as pd
from pandas import json_normalize
import numpy as np
import requests
from tqdm import tqdm
from sklearn.preprocessing import minmax_scale
from matplotlib import cm
import matplotlib.pyplot as plt
def get_match_ids(competition_id=72, season_id=30):
"""Get the match ids for a certain season or certain tournament. Default competition and season values are for the FIFA WWC 2019.
Returns:
list of all match_ids
"""
match_ids_dict = requests.get(f"https://raw.githubusercontent.com/statsbomb/open-data/master/data/matches/{competition_id}/{season_id}.json").json()
return [match["match_id"] for match in match_ids_dict]
get_match_ids
will give us our list of match_ids. The next bit is to loop over the match_ids and pull all the data into a single dataframe
. Why do we need to do that? So that we can loop over all the teams and then plot their sonars later on.
*Note*:
def get_all_passes(match_ids_list):
"""Gives all passes of a list of matches concatenated in a dataframe
Args:
match_ids_list: list of match_ids
Returns:
dataframe of all passes
"""
dflist = []
for match_id in tqdm(match_ids_list, desc = "Loading matches"):
data = requests.get(f"https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/{match_id}.json").json()
dflist.append(json_normalize(data, sep="_"))
_df = pd.concat(dflist, ignore_index=True)
return _df.loc[(_df.type_name == 'Pass')].dropna(how="all", axis=1)
get_all_passes
gets us all passes. Now we just need to define a function to plot polar bar plots and then loop over the teams and plot them.
def pass_sonar(team_name, ax):
"""Plots the sonar for a team on a matplotlib axes
"""
##some data wrangling
pdf = df.loc[df.team_name == team_name] ##filter only passes for the team in question
df1 = pdf.loc[:, ["pass_length", "pass_angle"]] ##just take in lengths and angles of passes
bins = np.linspace(-np.pi, np.pi, 24) ##change 24 to something else if you want more/less bins
df1['bins'] = pd.cut(df1['pass_angle'], bins, include_lowest=True, right = True)
df1["bin_mids"] = df1["bins"].apply(lambda x: x.mid)
A = df1.groupby(["bin_mids"]).agg(frequency=("pass_length", "size"), alength = ("pass_length", "mean")).reset_index()
A["freq"] = minmax_scale(A.frequency) ##scale to [0,1]
##plot sonar
bars = ax.bar(A["bin_mids"], A["freq"], width=0.27, bottom=0.01)
cm_ = cmap(norm(np.array(A["alength"]))) ##get our color values from the lengths for the bars
##set color
[bar.set(facecolor=r, edgecolor="white", linewidth=.5) for r, bar in zip(cm_, bars)]
##Some axes cosmetics
ax.set(rlim = [0,1], theta_zero_location = "N") ##orient upwards
ax.title.set(text=team_name[:-8], color="k", fontweight="bold", size=7)
ax.set_axis_off()
Everything's set. All we gotta do is run everything. Then we get our figure
and axes
instances. There were a total of 24 teams and hence, we'll create 24 axes (4x6). Lastly we just need to loop over the axes and team_names and plot them sequentially.
##Call our functions
match_ids = get_match_ids()
df = get_all_passes(match_ids)
Loading matches: 100%|█████████████████████████████████████████████████████████████████| 52/52 [03:11<00:00, 3.69s/it]
fig, axes = plt.subplots(nrows=4, ncols=6, figsize=(13,8),subplot_kw={"projection":"polar"})
axes = axes.flatten()
norm = plt.Normalize() ##default of (0,1) is perfect for us
cmap = cm.get_cmap("viridis")
teams = sorted(df.team_name.unique())
for team_name, ax in zip(teams, axes):
pass_sonar(team_name, ax)
Looks acceptable so far. With knock-out tournaments there's obviously the issue of imbalanced data. Teams which keep qualifing play more matches and hence more passes and we only need to look at the sonars above to see that there's some direct relation between number of passes and sideways passing (and more fanned-out sonars). Also, I've made a pretty stupid mistake by not filtering out non-open-play passes. But, I'll leave it up to the reader(coder?) to fix.