Notebook first written: 29/10/2021
Notebook last updated: 05/12/2021
Click here to jump straight to the Exploratory Data Analysis section and skip the Task Brief, Data Sources, Data Engineering, Data Aggregation, and Subsetted DataFrames sections.
This notebook parses publicly available StatsBomb Event data, using pandas for data manipulation through DataFrames.
For more information about this notebook and the author, I'm available through all the following channels:
The accompanying GitHub repository for this notebook can be found here and a static version of this notebook can be found here.
This notebook was written using Python 3 and requires the following libraries:
Jupyter notebooks for this notebook environment with which this project is presented;
NumPy for multidimensional array computing; and
pandas for data analysis and manipulation.
All packages used for this notebook except for BeautifulSoup can be obtained by downloading and installing the Conda distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows here and Mac here, as well as in the Anaconda documentation itself here.
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv
# Import Dependencies
%matplotlib inline
# Math Operations
import numpy as np
from math import pi
# Datetime
import datetime
from datetime import date
import time
# Data Preprocessing
import pandas as pd
import pandas_profiling as pp
import os
import re
import chardet
import random
from io import BytesIO
from pathlib import Path
# Reading Directories
import glob
import os
# Working with JSON
import json
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
# Progress Bar
from tqdm import tqdm
# Display in Jupyter
from IPython.display import Image, YouTubeVideo
from IPython.core.display import HTML
# Ignore Warnings
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")
print("Setup Complete")
Setup Complete
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))
Python: 3.7.6 NumPy: 1.20.3 pandas: 1.3.2 matplotlib: 3.4.2
# Define today's date
today = datetime.datetime.now().strftime('%d/%m/%Y').replace('/', '')
# Set up initial paths to subfolders
base_dir = os.path.join('..', '..')
data_dir = os.path.join(base_dir, 'data')
data_dir_sb = os.path.join(base_dir, 'data', 'sb')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')
# Make the directory structure (parents are created too; existing folders are left alone)
for folder in ['combined', 'competitions', 'events', 'matches']:
    path = os.path.join(data_dir_sb, 'raw', folder)
    os.makedirs(path, exist_ok=True)
# Define custom functions used in the notebook
## Function to read JSON files that also handles the encoding of special characters e.g. accents in names of players and teams
def read_json_file(filename):
    with open(filename, 'rb') as json_file:
        return BytesIO(json_file.read()).getvalue().decode('unicode_escape')
## Function to flatten pandas DataFrames with nested JSON columns. Source: https://stackoverflow.com/questions/39899005/how-to-flatten-a-pandas-dataframe-with-some-columns-as-json
def flatten_nested_json_df(df):
    df = df.reset_index()
    print(f"original shape: {df.shape}")
    print(f"original columns: {df.columns}")

    # search for columns to explode/flatten
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()

    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    print(f"lists: {list_columns}, dicts: {dict_columns}")
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            print(f"flattening: {col}")
            # explode dictionaries horizontally, adding new columns
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)  # inplace

        for col in list_columns:
            print(f"exploding: {col}")
            # explode lists vertically, adding new rows
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        # check if there are still dict or list fields to flatten
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()

        print(f"lists: {list_columns}, dicts: {dict_columns}")

    print(f"final shape: {df.shape}")
    print(f"final columns: {df.columns}")
    return df
# Display all columns of displayed pandas DataFrames
pd.set_option('display.max_columns', None)
pd.options.mode.chained_assignment=None
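The two operations that flatten_nested_json_df combines can be seen in isolation on a tiny example. The sketch below is illustrative only: the records are made-up toy data shaped loosely like nested event JSON, and the names records, df_toy and df_toy_exploded are hypothetical, not part of this notebook.

```python
import pandas as pd

# Toy records (made-up values) with one nested dict and one list per record
records = [
    {"id": "a", "type": {"id": 30, "name": "Pass"}, "related_events": ["b", "c"]},
    {"id": "b", "type": {"id": 42, "name": "Ball Receipt"}, "related_events": ["a"]},
]

# json_normalize expands nested dicts horizontally into dotted column names
df_toy = pd.json_normalize(records)

# explode expands lists vertically: one row per list element,
# with the values of the other columns repeated
df_toy_exploded = df_toy.explode("related_events").reset_index(drop=True)

print(df_toy.columns.tolist())
print(df_toy_exploded)
```

Dictionaries add columns and lists add rows, which is why the shape printed by the function can change in both dimensions.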
This Jupyter notebook is part of a series of notebooks to parse and engineer StatsBomb Event data.
This particular notebook is the StatsBomb Data Parsing notebook for 360 data. It takes raw JSON data downloaded from the StatsBomb Open Data GitHub Repository and converts it to event-level data that is saved as a CSV file.
Links to these notebooks in the football_analytics GitHub repository can be found at the following:
Notebook Conventions:
A DataFrame object is prefixed with df_.
A collection of DataFrame objects (e.g., a list, a set or a dict) is prefixed with dfs_.
StatsBomb are a football analytics and data company.
Before conducting our EDA, the data needs to be imported as a DataFrame in the Data Sources section (Section 3) and cleaned in the Data Engineering section (Section 4).
We'll be using the pandas library to import our data to this workbook as a DataFrame.
The complete data set contains:
The datasets we will be using are:
# ADD CODE HERE
The following cells read the JSON files into a pandas DataFrame object, with some basic Data Engineering to flatten the data and select only the columns of interest, ensuring that the Jupyter notebook does not crash on a standard laptop.
# ADD MARKDOWN TABLE OF DATA HERE
# Show files in directory
print(glob.glob(os.path.join(data_dir_sb, 'raw', 'competitions/*')))
['../../data/sb/raw/competitions/competitions_wc2018.csv', '../../data/sb/raw/competitions/competitions.csv', '../../data/sb/raw/competitions/competitions_male.csv']
# Read in exported CSV file if exists, if not, read in JSON file
## If the exported CSV file does not exist, read in the raw JSON file
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'competitions', 'competitions.csv')):
    json_competitions = read_json_file(os.path.join(data_dir_sb, 'open-data', 'data', 'competitions.json'))
    df_competitions_flat = pd.read_json(json_competitions)
## Otherwise, read in the exported CSV file
else:
    df_competitions_flat = pd.read_csv(os.path.join(data_dir_sb, 'raw', 'competitions', 'competitions.csv'))
# Display DataFrame
df_competitions_flat
| | competition_id | season_id | country_name | competition_name | competition_gender | competition_youth | competition_international | season_name | match_updated | match_updated_360 | match_available_360 | match_available |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 16 | 4 | Europe | Champions League | male | False | False | 2018/2019 | 2021-08-27T11:26:39.802832 | 2021-06-13T16:17:31.694 | None | 2021-07-09T14:06:05.802 |
1 | 16 | 1 | Europe | Champions League | male | False | False | 2017/2018 | 2021-08-27T11:26:39.802832 | 2021-06-13T16:17:31.694 | None | 2021-01-23T21:55:30.425330 |
2 | 16 | 2 | Europe | Champions League | male | False | False | 2016/2017 | 2021-08-27T11:26:39.802832 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
3 | 16 | 27 | Europe | Champions League | male | False | False | 2015/2016 | 2021-08-27T11:26:39.802832 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
4 | 16 | 26 | Europe | Champions League | male | False | False | 2014/2015 | 2021-08-27T11:26:39.802832 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
5 | 16 | 25 | Europe | Champions League | male | False | False | 2013/2014 | 2021-08-27T11:26:39.802832 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
6 | 16 | 24 | Europe | Champions League | male | False | False | 2012/2013 | 2021-08-27T11:26:39.802832 | 2021-06-13T16:17:31.694 | None | 2021-07-10T13:41:45.751 |
7 | 16 | 23 | Europe | Champions League | male | False | False | 2011/2012 | 2021-08-27T11:26:39.802832 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
8 | 16 | 22 | Europe | Champions League | male | False | False | 2010/2011 | 2021-06-22T21:17:46.381 | 2021-06-13T16:17:31.694 | None | 2021-06-22T21:17:46.381 |
9 | 16 | 21 | Europe | Champions League | male | False | False | 2009/2010 | 2021-06-22T21:24:20.506 | 2021-06-13T16:17:31.694 | None | 2021-06-22T21:24:20.506 |
10 | 16 | 41 | Europe | Champions League | male | False | False | 2008/2009 | 2021-11-07T14:20:01.699993 | 2021-06-13T16:17:31.694 | None | 2021-11-07T14:20:01.699993 |
11 | 16 | 39 | Europe | Champions League | male | False | False | 2006/2007 | 2021-03-31T04:18:30.437060 | 2021-06-13T16:17:31.694 | None | 2021-03-31T04:18:30.437060 |
12 | 16 | 37 | Europe | Champions League | male | False | False | 2004/2005 | 2021-04-01T06:18:57.459032 | 2021-06-13T16:17:31.694 | None | 2021-04-01T06:18:57.459032 |
13 | 16 | 44 | Europe | Champions League | male | False | False | 2003/2004 | 2021-04-01T00:34:59.472485 | 2021-06-13T16:17:31.694 | None | 2021-04-01T00:34:59.472485 |
14 | 16 | 76 | Europe | Champions League | male | False | False | 1999/2000 | 2020-07-29T05:00 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
15 | 37 | 90 | England | FA Women's Super League | female | False | False | 2020/2021 | 2021-07-01T18:14:40.756 | 2021-06-13T16:17:31.694 | None | 2021-07-01T18:14:40.756 |
16 | 37 | 42 | England | FA Women's Super League | female | False | False | 2019/2020 | 2021-06-01T13:01:18.188 | 2021-06-13T16:17:31.694 | None | 2021-06-01T13:01:18.188 |
17 | 37 | 4 | England | FA Women's Super League | female | False | False | 2018/2019 | 2021-12-02T12:09:35.585046 | 2021-06-13T16:17:31.694 | None | 2021-12-02T12:09:35.585046 |
18 | 43 | 3 | International | FIFA World Cup | male | False | True | 2018 | 2021-08-05T16:04:30.081 | 2021-06-13T16:17:31.694 | None | 2021-08-05T16:04:30.081 |
19 | 11 | 90 | Spain | La Liga | male | False | False | 2020/2021 | 2021-11-28T09:47:02.505122 | 2021-10-30T04:19:36.116600 | 2021-09-17T15:18:33.787790 | 2021-11-28T09:47:02.505122 |
20 | 11 | 42 | Spain | La Liga | male | False | False | 2019/2020 | 2021-06-15T15:35:02.673 | 2021-06-13T16:17:31.694 | None | 2021-06-15T15:35:02.673 |
21 | 11 | 4 | Spain | La Liga | male | False | False | 2018/2019 | 2021-11-02T17:53:14.529952 | 2021-07-09T14:53:22.103024 | None | 2021-11-02T17:53:14.529952 |
22 | 11 | 1 | Spain | La Liga | male | False | False | 2017/2018 | 2021-08-27T11:26:39.802832 | 2021-06-13T16:17:31.694 | None | 2021-05-19T08:38:06.507959 |
23 | 11 | 2 | Spain | La Liga | male | False | False | 2016/2017 | 2021-08-07T22:30:18.242 | 2021-06-13T16:17:31.694 | None | 2021-08-07T22:30:18.242 |
24 | 11 | 27 | Spain | La Liga | male | False | False | 2015/2016 | 2020-07-29T05:00 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
25 | 11 | 26 | Spain | La Liga | male | False | False | 2014/2015 | 2020-07-29T05:00 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
26 | 11 | 25 | Spain | La Liga | male | False | False | 2013/2014 | 2020-07-29T05:00 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
27 | 11 | 24 | Spain | La Liga | male | False | False | 2012/2013 | 2021-10-27T15:44:43.940862 | 2021-06-13T16:17:31.694 | None | 2021-10-27T15:44:43.940862 |
28 | 11 | 23 | Spain | La Liga | male | False | False | 2011/2012 | 2020-07-29T05:00 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
29 | 11 | 22 | Spain | La Liga | male | False | False | 2010/2011 | 2021-11-11T22:57:42.361902 | 2021-06-13T16:17:31.694 | None | 2021-11-11T22:57:42.361902 |
30 | 11 | 21 | Spain | La Liga | male | False | False | 2009/2010 | 2021-10-26T13:56:40.989214 | 2021-06-13T16:17:31.694 | None | 2021-10-26T13:56:40.989214 |
31 | 11 | 41 | Spain | La Liga | male | False | False | 2008/2009 | 2020-07-29T05:00 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
32 | 11 | 40 | Spain | La Liga | male | False | False | 2007/2008 | 2021-10-26T13:13:56.180589 | 2021-06-13T16:17:31.694 | None | 2021-10-26T13:13:56.180589 |
33 | 11 | 39 | Spain | La Liga | male | False | False | 2006/2007 | 2020-07-29T05:00 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
34 | 11 | 38 | Spain | La Liga | male | False | False | 2005/2006 | 2021-11-28T23:00:27.747396 | 2021-06-13T16:17:31.694 | None | 2021-11-28T23:00:27.747396 |
35 | 11 | 37 | Spain | La Liga | male | False | False | 2004/2005 | 2020-07-29T05:00 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
36 | 49 | 3 | United States of America | NWSL | female | False | False | 2018 | 2021-11-06T05:53:29.435016 | 2021-06-13T16:17:31.694 | None | 2021-11-06T05:53:29.435016 |
37 | 2 | 44 | England | Premier League | male | False | False | 2003/2004 | 2021-11-14T22:29:00.646120 | 2021-06-13T16:17:31.694 | None | 2021-11-14T22:29:00.646120 |
38 | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
39 | 72 | 30 | International | Women's World Cup | female | False | True | 2019 | 2020-07-29T05:00 | 2021-06-13T16:17:31.694 | None | 2020-07-29T05:00 |
df_competitions_flat.shape
(40, 12)
For our analysis, we only want the UEFA Euro 2020 competition (competition_id 55, season_id 43).
# Filter DataFrame for rows where 'competition_id' is equal to 55 (UEFA Euro) and 'season_id' is equal to 43 (2020)
df_competitions_flat = df_competitions_flat.loc[(df_competitions_flat['competition_id'] == 55) &
                                                (df_competitions_flat['season_id'] == 43)
                                               ]
df_competitions_flat
| | competition_id | season_id | country_name | competition_name | competition_gender | competition_youth | competition_international | season_name | match_updated | match_updated_360 | match_available_360 | match_available |
---|---|---|---|---|---|---|---|---|---|---|---|---|
38 | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
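As a minimal sketch of the filter above, the same selection can be written with a named boolean mask or with DataFrame.query. The toy DataFrame below copies only a handful of ID and name values from the competitions table; the names df_toy_competitions, mask, df_toy_selected and df_toy_query are hypothetical.

```python
import pandas as pd

# Small stand-in for df_competitions_flat (subset of columns and rows)
df_toy_competitions = pd.DataFrame({
    "competition_id": [16, 11, 55, 43],
    "season_id": [4, 90, 43, 3],
    "competition_name": ["Champions League", "La Liga", "UEFA Euro", "FIFA World Cup"],
})

# Both conditions must hold, so the masks are combined element-wise with &
mask = (df_toy_competitions["competition_id"] == 55) & (df_toy_competitions["season_id"] == 43)
df_toy_selected = df_toy_competitions.loc[mask]

# Equivalent selection expressed as a query string
df_toy_query = df_toy_competitions.query("competition_id == 55 and season_id == 43")

print(df_toy_selected)
```

Filtering on both IDs matters because season_id values are reused across competitions: season_id 43 alone would not identify a competition, and competition_id 43 is the FIFA World Cup.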
# Export DataFrame as a CSV file
## Only write the file if it does not already exist
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'competitions', 'competitions_sb_360.csv')):
    df_competitions_flat.to_csv(os.path.join(data_dir_sb, 'raw', 'competitions', 'competitions_sb_360.csv'), index=False, header=True)
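The "read the CSV if it exists, otherwise build it and export it" pattern recurs throughout this notebook. One possible way to express it once is sketched below; load_or_build_csv and build_fn are hypothetical names introduced for illustration, and the example writes to a temporary directory rather than the notebook's data folders.

```python
import os
import tempfile

import pandas as pd


def load_or_build_csv(csv_path, build_fn):
    """Hypothetical helper: read csv_path if it exists, otherwise build and save it."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    df = build_fn()
    df.to_csv(csv_path, index=False)
    return df


calls = []  # records how many times the expensive build step actually runs


def build_competitions():
    calls.append(1)
    return pd.DataFrame({"competition_id": [55], "season_id": [43]})


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "competitions_example.csv")
    df_first = load_or_build_csv(csv_path, build_competitions)   # builds and saves
    df_second = load_or_build_csv(csv_path, build_competitions)  # reads the cached CSV

print(len(calls))
```

One caveat of this pattern, here and in the cells above, is that a cached CSV is never refreshed: if the raw JSON or the filter changes, the old file has to be deleted by hand.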
# ADD MARKDOWN TABLE OF DATA HERE
The following cell lists the competitions to be included in the dataset. With the filter applied above, this is a single competition, UEFA Euro 2020; the commented-out list shows how other competitions could be selected by ID instead.
# Define a list to select only the competitions of interest.
# Flatmap all Competition IDs to use all available competitions
lst_competitions = df_competitions_flat['competition_id'].unique().tolist()
"""
# Define list of competitions
lst_competitions = [2, # Premier League
11, # La Liga
16, # Champions League
#37, # FA Women's Super League
43, # FIFA World Cup
#49, # NWSL
#55, # UEFA Euro
#72, # Women's World Cup
]
"""
# Display list of competitions
lst_competitions
[55]
# Display the number of competitions
len(lst_competitions)
1
# Show files in directory
print(glob.glob(os.path.join(data_dir_sb, 'raw', 'matches/*')))
['../../data/sb/raw/matches/matches.csv', '../../data/sb/raw/matches/matches_male.csv', '../../data/sb/raw/matches/matches_wc2018.csv']
Steps:
# Read in selected matches
## Read in exported CSV file if exists, if not, read in JSON files
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'matches', 'matches_sb_360.csv')):

    ### Create empty list for DataFrames
    dfs_matches_all = []

    ### Loop through the selected competitions
    for competition in lst_competitions:

        #### List the season files for this competition
        lst_filepaths = glob.glob(os.path.join(data_dir_sb, 'open-data', 'data', 'matches', str(competition), '*'))

        for filepath in lst_filepaths:

            ##### Skip files that cannot be opened or parsed
            try:
                ###### Import the StatsBomb JSON Match data
                with open(filepath, encoding='utf-8') as f:
                    json_sb_match_data = json.load(f)
            except (OSError, ValueError):
                continue

            ###### Flatten the JSON Match data and append it to the list
            dfs_matches_all.append(pd.json_normalize(json_sb_match_data))

    ## Concatenate DataFrames to one DataFrame
    df_matches_flat = pd.concat(dfs_matches_all)

## Otherwise, read in the exported CSV file
else:
    df_matches_flat = pd.read_csv(os.path.join(data_dir_sb, 'raw', 'matches', 'matches_sb_360.csv'))

## Display DataFrame
df_matches_flat.head()
| | match_id | match_date | kick_off | home_score | away_score | match_status | match_status_360 | last_updated | last_updated_360 | match_week | competition.competition_id | competition.country_name | competition.competition_name | season.season_id | season.season_name | home_team.home_team_id | home_team.home_team_name | home_team.home_team_gender | home_team.home_team_group | home_team.country.id | home_team.country.name | home_team.managers | away_team.away_team_id | away_team.away_team_name | away_team.away_team_gender | away_team.away_team_group | away_team.country.id | away_team.country.name | away_team.managers | metadata.data_version | metadata.shot_fidelity_version | metadata.xy_fidelity_version | competition_stage.id | competition_stage.name | stadium.id | stadium.name | stadium.country.id | stadium.country.name | referee.id | referee.name | referee.country.id | referee.country.name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3788753 | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands |
1 | 3788765 | 2021-06-20 | 18:00:00.000 | 3 | 1 | available | available | 2021-08-02T14:58:49.057 | 2021-11-11T13:54:37.507376 | 3 | 55 | Europe | UEFA Euro | 43 | 2020 | 773 | Switzerland | male | Group A | 221 | Switzerland | [{'id': 492, 'name': 'Vladimir Petković', 'nic... | 909 | Turkey | male | Group A | 233 | Turkey | [{'id': 701, 'name': 'Şenol Güneş', 'nickname'... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4549 | Bakı Olimpiya Stadionu | 16 | Azerbaijan | 943 | Slavko Vinčić | 208 | Slovenia |
2 | 3795107 | 2021-07-02 | 21:00:00.000 | 1 | 2 | available | available | 2021-07-19T12:41:55.898 | 2021-09-23T00:02:51.495862 | 5 | 55 | Europe | UEFA Euro | 43 | 2020 | 782 | Belgium | male | None | 22 | Belgium | [{'id': 263, 'name': 'Roberto Martínez Montoli... | 914 | Italy | male | None | 112 | Italy | [{'id': 2997, 'name': 'Roberto Mancini', 'nick... | 1.1.0 | 2 | 2 | 11 | Quarter-finals | 4867 | Allianz Arena (München) | 85 | Germany | 943 | Slavko Vinčić | 208 | Slovenia |
3 | 3795221 | 2021-07-07 | 21:00:00.000 | 2 | 1 | available | available | 2021-07-09T12:38:23.437 | 2021-09-22T22:33:37.494366 | 6 | 55 | Europe | UEFA Euro | 43 | 2020 | 768 | England | male | None | 68 | England | [{'id': 277, 'name': 'Gareth Southgate', 'nick... | 776 | Denmark | male | None | 61 | Denmark | [{'id': 255, 'name': 'Kasper Hjulmand', 'nickn... | 1.1.0 | 2 | 2 | 15 | Semi-finals | 4666 | Wembley Stadium (London) | 68 | England | 293 | Danny Desmond Makkelie | 160 | Netherlands |
4 | 3795506 | 2021-07-11 | 21:00:00.000 | 1 | 1 | available | available | 2021-07-12T12:27:50.647 | 2021-09-22T22:40:31.690550 | 7 | 55 | Europe | UEFA Euro | 43 | 2020 | 914 | Italy | male | None | 112 | Italy | [{'id': 2997, 'name': 'Roberto Mancini', 'nick... | 768 | England | male | None | 68 | England | [{'id': 277, 'name': 'Gareth Southgate', 'nick... | 1.1.0 | 2 | 2 | 26 | Final | 4666 | Wembley Stadium (London) | 68 | England | 287 | Björn Kuipers | 160 | Netherlands |
df_matches_flat.shape
(51, 42)
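The match-reading loop above can also be sketched as a small function that flattens every JSON file in a folder into one DataFrame. This is an illustration under assumptions, not the notebook's own code: read_match_files is a hypothetical helper, the two toy records carry only a few of the real fields, and the file is written to a temporary directory.

```python
import glob
import json
import os
import tempfile

import pandas as pd


def read_match_files(match_dir):
    """Hypothetical helper: flatten every JSON file in match_dir into one DataFrame."""
    frames = []
    for filepath in sorted(glob.glob(os.path.join(match_dir, "*.json"))):
        with open(filepath, encoding="utf-8") as f:
            frames.append(pd.json_normalize(json.load(f)))
    # pd.concat raises on an empty list, so return an empty DataFrame instead
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# Toy season file with two matches (only a few fields kept)
toy_season = [
    {"match_id": 3795506, "home_team": {"home_team_name": "Italy"}, "away_team": {"away_team_name": "England"}},
    {"match_id": 3795221, "home_team": {"home_team_name": "England"}, "away_team": {"away_team_name": "Denmark"}},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "43.json"), "w", encoding="utf-8") as f:
        json.dump(toy_season, f)
    df_toy_matches = read_match_files(tmp_dir)
    df_no_matches = read_match_files(os.path.join(tmp_dir, "missing"))

print(df_toy_matches)
```

Nested dictionaries such as home_team become dotted columns (home_team.home_team_name), which is the naming seen in the matches table above.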
# Number of matches per competition and season
df_matches_flat.groupby(['competition.competition_name', 'season.season_name']).match_id.count()
competition.competition_name  season.season_name
UEFA Euro                     2020                  51
Name: match_id, dtype: int64
There are 51 games in the UEFA Euro 2020 that can be used as part of the Expected Goals model.
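The match count above comes from a groupby on two columns. A minimal sketch of the same aggregation on toy data is shown below; the match IDs and the extra La Liga row are made up so that more than one group appears, and df_toy_matches and match_counts are hypothetical names.

```python
import pandas as pd

# Toy stand-in for df_matches_flat with made-up match IDs
df_toy_matches = pd.DataFrame({
    "match_id": [1, 2, 3],
    "competition.competition_name": ["UEFA Euro", "UEFA Euro", "La Liga"],
    "season.season_name": ["2020", "2020", "2020/2021"],
})

# Count matches per (competition, season) pair; the result is a Series
# indexed by a two-level MultiIndex
match_counts = (
    df_toy_matches
    .groupby(["competition.competition_name", "season.season_name"])["match_id"]
    .count()
)

print(match_counts)
```

Bracket selection of the column (["match_id"]) behaves the same as the attribute access used above and also works for column names that contain dots or spaces.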
match_id column to list
This list is the reference of matches to parse for Events, Lineups, and Tactics data, and is iterated over when reading the files.
# Flatmap all Match IDs to use all available matches
lst_matches = df_matches_flat['match_id'].tolist()
# Display the number of matches
len(lst_matches)
51
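As a sketch of how the list of match IDs is used, each ID maps to one events file named after it. The snippet below builds those paths with a list comprehension; it uses two IDs from the matches table and the same relative folder layout as the notebook's code, and the names lst_toy_matches, events_dir and event_paths are hypothetical.

```python
import os

# Two match IDs taken from the matches table above
lst_toy_matches = [3788753, 3788765]

# Same relative layout as the notebook: data/sb/open-data/data/events/<match_id>.json
events_dir = os.path.join("..", "..", "data", "sb", "open-data", "data", "events")

# One file path per match ID
event_paths = [os.path.join(events_dir, f"{match_id}.json") for match_id in lst_toy_matches]

for path in event_paths:
    print(path)
```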
# Export DataFrame as a CSV file
## Only write the file if it does not already exist
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'matches', 'matches_sb_360.csv')):
    df_matches_flat.to_csv(os.path.join(data_dir_sb, 'raw', 'matches', 'matches_sb_360.csv'), index=False, header=True)
The StatsBomb dataset has one hundred and fourteen features (columns) with the following definitions and data types:
Feature | Data type |
---|---|
id | object |
index | object |
period | object |
timestamp | object |
minute | object |
second | object |
possession | object |
duration | object |
type.id | object |
type.name | object |
possession_team.id | object |
possession_team.name | object |
play_pattern.id | object |
play_pattern.name | object |
team.id | object |
team.name | object |
tactics.formation | object |
tactics.lineup | object |
related_events | object |
location | object |
player.id | object |
player.name | object |
position.id | object |
position.name | object |
pass.recipient.id | object |
pass.recipient.name | object |
pass.length | object |
pass.angle | object |
pass.height.id | object |
pass.height.name | object |
pass.end_location | object |
pass.type.id | object |
pass.type.name | object |
pass.body_part.id | object |
pass.body_part.name | object |
carry.end_location | object |
under_pressure | object |
duel.type.id | object |
duel.type.name | object |
out | object |
miscontrol.aerial_won | object |
pass.outcome.id | object |
pass.outcome.name | object |
ball_receipt.outcome.id | object |
ball_receipt.outcome.name | object |
pass.aerial_won | object |
counterpress | object |
off_camera | object |
dribble.outcome.id | object |
dribble.outcome.name | object |
dribble.overrun | object |
ball_recovery.offensive | object |
shot.statsbomb_xg | object |
shot.end_location | object |
shot.outcome.id | object |
shot.outcome.name | object |
shot.type.id | object |
shot.type.name | object |
shot.body_part.id | object |
shot.body_part.name | object |
shot.technique.id | object |
shot.technique.name | object |
shot.freeze_frame | object |
goalkeeper.end_location | object |
goalkeeper.type.id | object |
goalkeeper.type.name | object |
goalkeeper.position.id | object |
goalkeeper.position.name | object |
pass.straight | object |
pass.technique.id | object |
pass.technique.name | object |
clearance.head | object |
clearance.body_part.id | object |
clearance.body_part.name | object |
pass.switch | object |
duel.outcome.id | object |
duel.outcome.name | object |
foul_committed.advantage | object |
foul_won.advantage | object |
pass.cross | object |
pass.assisted_shot_id | object |
pass.shot_assist | object |
shot.one_on_one | object |
shot.key_pass_id | object |
goalkeeper.body_part.id | object |
goalkeeper.body_part.name | object |
goalkeeper.technique.id | object |
goalkeeper.technique.name | object |
goalkeeper.outcome.id | object |
goalkeeper.outcome.name | object |
clearance.aerial_won | object |
foul_committed.card.id | object |
foul_committed.card.name | object |
foul_won.defensive | object |
clearance.right_foot | object |
shot.first_time | object |
pass.through_ball | object |
interception.outcome.id | object |
interception.outcome.name | object |
clearance.left_foot | object |
ball_recovery.recovery_failure | object |
shot.aerial_won | object |
pass.goal_assist | object |
pass.cut_back | object |
pass.deflected | object |
clearance.other | object |
pass.outswinging | object |
substitution.outcome.id | object |
substitution.outcome.name | object |
substitution.replacement.id | object |
substitution.replacement.name | object |
block.deflection | object |
block.offensive | object |
injury_stoppage.in_chain | object |
For a full list of definitions, see the official documentation [link].
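Every feature in the table above is typed as object, which means numeric columns would need converting before any arithmetic. A minimal sketch of one way to do this with pd.to_numeric is given below. It is not a step this notebook performs: the toy values are made up, df_toy_events is a hypothetical name, and which columns should be numeric is an assumption based on their names.

```python
import pandas as pd

# Toy events with every column stored as object, as in the table above
df_toy_events = pd.DataFrame(
    {
        "minute": ["0", "12", "45"],
        "duration": ["0.0", "1.373215", None],
        "type.name": ["Starting XI", "Pass", "Shot"],
    },
    dtype="object",
)

# Whole numbers: convert, then use the nullable integer dtype
df_toy_events["minute"] = pd.to_numeric(df_toy_events["minute"], errors="coerce").astype("Int64")

# Decimals: missing or unparseable values become NaN instead of raising
df_toy_events["duration"] = pd.to_numeric(df_toy_events["duration"], errors="coerce")

print(df_toy_events.dtypes)
```

With errors="coerce", bad values silently become missing, so it is worth comparing the number of missing values before and after the conversion.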
# Show files in directory
print(glob.glob(os.path.join(data_dir_sb, 'raw', 'events/*')))
['../../data/sb/raw/events/events_male.csv', '../../data/sb/raw/events/events_wc2018.csv']
Steps:
# Read in exported CSV file if exists, if not, read in JSON files
## If the exported CSV file does not exist, parse the raw JSON event files
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'events', 'events_sb_360.csv')):

    ### Create empty list for DataFrames
    dfs_events = []

    ### Loop through event files for the selected matches and append DataFrame to dfs_events list
    for match_id in lst_matches:

        #### Load and flatten the events of one match, tagging each row with its match ID
        with open(os.path.join(data_dir_sb, 'open-data', 'data', 'events', str(match_id) + '.json'), encoding='utf-8') as f:
            event = json.load(f)
        df_event_flat = pd.json_normalize(event)
        df_event_flat['match_id'] = match_id
        dfs_events.append(df_event_flat)

    ### Concatenate DataFrames to one DataFrame
    df_events = pd.concat(dfs_events)

    ### Flatten the nested columns
    df_events_flat = flatten_nested_json_df(df_events)

## Otherwise, read in the exported CSV file
else:
    df_events_flat = pd.read_csv(os.path.join(data_dir_sb, 'raw', 'events', 'events_sb_360.csv'))

## Display DataFrame
df_events_flat.head()
original shape: (192686, 144)
original columns: Index(['level_0', 'id', 'index', 'period', 'timestamp', 'minute', 'second', 'possession', 'duration', 'type.id', ... 'goalkeeper.shot_saved_to_post', 'shot.open_goal', 'goalkeeper.penalty_saved_to_post', 'dribble.no_touch', 'block.offensive', 'shot.follows_dribble', 'ball_recovery.offensive', 'shot.redirect', 'goalkeeper.lost_in_play', 'goalkeeper.success_in_play'], dtype='object', length=144)
lists: [], dicts: []
final shape: (192686, 144)
final columns: Index(['level_0', 'id', 'index', 'period', 'timestamp', 'minute', 'second', 'possession', 'duration', 'type.id', ... 'goalkeeper.shot_saved_to_post', 'shot.open_goal', 'goalkeeper.penalty_saved_to_post', 'dribble.no_touch', 'block.offensive', 'shot.follows_dribble', 'ball_recovery.offensive', 'shot.redirect', 'goalkeeper.lost_in_play', 'goalkeeper.success_in_play'], dtype='object', length=144)
| | level_0 | id | index | period | timestamp | minute | second | possession | duration | type.id | type.name | possession_team.id | possession_team.name | play_pattern.id | play_pattern.name | team.id | team.name | tactics.formation | tactics.lineup | related_events | location | player.id | player.name | position.id | position.name | pass.recipient.id | pass.recipient.name | pass.length | pass.angle | pass.height.id | pass.height.name | pass.end_location | pass.body_part.id | pass.body_part.name | pass.type.id | pass.type.name | carry.end_location | under_pressure | duel.type.id | duel.type.name | pass.aerial_won | counterpress | duel.outcome.id | duel.outcome.name | dribble.outcome.id | dribble.outcome.name | pass.outcome.id | pass.outcome.name | ball_receipt.outcome.id | ball_receipt.outcome.name | interception.outcome.id | interception.outcome.name | shot.statsbomb_xg | shot.end_location | shot.outcome.id | shot.outcome.name | shot.type.id | shot.type.name | shot.body_part.id | shot.body_part.name | shot.technique.id | shot.technique.name | shot.freeze_frame | goalkeeper.end_location | goalkeeper.type.id | goalkeeper.type.name | goalkeeper.position.id | goalkeeper.position.name | out | pass.outswinging | pass.technique.id | pass.technique.name | clearance.head | clearance.body_part.id | clearance.body_part.name | pass.switch | off_camera | pass.cross | clearance.left_foot | dribble.overrun | dribble.nutmeg | clearance.right_foot | pass.no_touch | foul_committed.advantage | foul_won.advantage | pass.assisted_shot_id | pass.shot_assist | shot.key_pass_id | shot.first_time | clearance.other | pass.miscommunication | clearance.aerial_won | pass.through_ball | ball_recovery.recovery_failure | goalkeeper.outcome.id | goalkeeper.outcome.name | goalkeeper.body_part.id | goalkeeper.body_part.name | shot.aerial_won | foul_committed.card.id | foul_committed.card.name | foul_committed.offensive | foul_won.defensive | substitution.outcome.id | substitution.outcome.name | substitution.replacement.id | substitution.replacement.name | 50_50.outcome.id | 50_50.outcome.name | pass.goal_assist | goalkeeper.technique.id | goalkeeper.technique.name | pass.cut_back | miscontrol.aerial_won | pass.straight | foul_committed.type.id | foul_committed.type.name | match_id | pass.inswinging | pass.deflected | injury_stoppage.in_chain | shot.one_on_one | bad_behaviour.card.id | bad_behaviour.card.name | shot.deflected | block.deflection | foul_committed.penalty | foul_won.penalty | block.save_block | goalkeeper.punched_out | player_off.permanent | shot.saved_off_target | goalkeeper.shot_saved_off_target | shot.saved_to_post | goalkeeper.shot_saved_to_post | shot.open_goal | goalkeeper.penalty_saved_to_post | dribble.no_touch | block.offensive | shot.follows_dribble | ball_recovery.offensive | shot.redirect | goalkeeper.lost_in_play | goalkeeper.success_in_play |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 9427b18a-6b10-411f-90da-3d6240b80c71 | 1 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 35 | Starting XI | 1835 | Finland | 1 | Regular Play | 1835 | Finland | 352.0 | [{'player': {'id': 8667, 'name': 'Lukáš Hrádec... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 1 | 542c58bf-5c6c-43ca-9d8d-e086c7f08aaf | 2 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 35 | Starting XI | 1835 | Finland | 1 | Regular Play | 796 | Russia | 3421.0 | [{'player': {'id': 21298, 'name': 'Matvey Safo... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 2 | a0dfe8a0-a0b9-443e-89e3-a8ba6596fa33 | 3 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 18 | Half Start | 1835 | Finland | 1 | Regular Play | 1835 | Finland | NaN | NaN | [c7156352-f4b7-4140-aa51-6e26fd019a11] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 3 | c7156352-f4b7-4140-aa51-6e26fd019a11 | 4 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 18 | Half Start | 1835 | Finland | 1 | Regular Play | 796 | Russia | NaN | NaN | [a0dfe8a0-a0b9-443e-89e3-a8ba6596fa33] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 4 | 94dbc5c3-ef37-445e-9154-3d9f9ea9245d | 5 | 1 | 00:00:00.490 | 0 | 0 | 2 | 1.373215 | 30 | Pass | 796 | Russia | 9 | From Kick Off | 796 | Russia | NaN | NaN | [c0935bbe-3eb4-4a21-9eee-45f380d1f26d] | [60.0, 40.0] | 6299.0 | Aleksey Miranchuk | 18.0 | Right Attacking Midfield | 31917.0 | Igor Diveev | 22.357325 | 3.069967 | 1.0 | Ground Pass | [37.7, 41.6] | 38.0 | Left Foot | 65.0 | Kick Off | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
df_events_flat.shape
(192686, 144)
# Export DataFrame as a CSV file
##
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'events', 'events_sb_360.csv')):
    df_events_flat.to_csv(os.path.join(data_dir_sb, 'raw', 'events', 'events_sb_360.csv'), index=None, header=True)
##
else:
    pass
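The export-if-missing pattern used in the cell above could be wrapped in a small helper. The following is only a minimal sketch: the function name `export_csv_if_missing`, the toy DataFrame and the temporary directory are hypothetical and not part of the original notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_csv_if_missing(df, csv_path):
    """Write df to csv_path unless the file already exists.

    Returns True if a file was written, False if the export was skipped.
    """
    csv_path = Path(csv_path)
    if csv_path.exists():
        return False
    # Create the parent folders so that to_csv does not fail on a fresh checkout
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(csv_path, index=False, header=True)
    return True


# Toy stand-in for df_events_flat, written to a throwaway directory
df_demo = pd.DataFrame({'match_id': [3788753, 3788753],
                        'type.name': ['Starting XI', 'Pass']})
demo_path = Path(tempfile.mkdtemp()) / 'raw' / 'events' / 'events_demo.csv'

first_call = export_csv_if_missing(df_demo, demo_path)   # file is written
second_call = export_csv_if_missing(df_demo, demo_path)  # file exists, so skipped
print(first_call, second_call)
```

Returning a boolean makes it easy to log whether a cached file was reused, which matters when the upstream data has changed and the cached CSV should be deleted first.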
The final step of the data parsing is to join the Matches and Competitions DataFrames to the Events DataFrame. The Events data is the base DataFrame, to which the Matches table is joined via match_id and the Competitions table is joined via the competition.competition_id and season.season_id pair.
# Read in exported CSV file if exists, if not, merge the individual DataFrames
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'combined', 'combined_sb_360.csv')):
    # Join the Matches DataFrame to the Events DataFrame
    df_events_matches = pd.merge(df_events_flat, df_matches_flat, left_on=['match_id'], right_on=['match_id'])
    # Join the Competitions DataFrame to the Events-Matches DataFrame
    df_events_matches_competitions = pd.merge(df_events_matches, df_competitions_flat, left_on=['competition.competition_id', 'season.season_id'], right_on=['competition_id', 'season_id'])
else:
    df_events_matches_competitions = pd.read_csv(os.path.join(data_dir_sb, 'raw', 'combined', 'combined_sb_360.csv'))
# Display DataFrame
df_events_matches_competitions.head()
level_0 | id | index | period | timestamp | minute | second | possession | duration | type.id | type.name | possession_team.id | possession_team.name | play_pattern.id | play_pattern.name | team.id | team.name | tactics.formation | tactics.lineup | related_events | location | player.id | player.name | position.id | position.name | pass.recipient.id | pass.recipient.name | pass.length | pass.angle | pass.height.id | pass.height.name | pass.end_location | pass.body_part.id | pass.body_part.name | pass.type.id | pass.type.name | carry.end_location | under_pressure | duel.type.id | duel.type.name | pass.aerial_won | counterpress | duel.outcome.id | duel.outcome.name | dribble.outcome.id | dribble.outcome.name | pass.outcome.id | pass.outcome.name | ball_receipt.outcome.id | ball_receipt.outcome.name | interception.outcome.id | interception.outcome.name | shot.statsbomb_xg | shot.end_location | shot.outcome.id | shot.outcome.name | shot.type.id | shot.type.name | shot.body_part.id | shot.body_part.name | shot.technique.id | shot.technique.name | shot.freeze_frame | goalkeeper.end_location | goalkeeper.type.id | goalkeeper.type.name | goalkeeper.position.id | goalkeeper.position.name | out | pass.outswinging | pass.technique.id | pass.technique.name | clearance.head | clearance.body_part.id | clearance.body_part.name | pass.switch | off_camera | pass.cross | clearance.left_foot | dribble.overrun | dribble.nutmeg | clearance.right_foot | pass.no_touch | foul_committed.advantage | foul_won.advantage | pass.assisted_shot_id | pass.shot_assist | shot.key_pass_id | shot.first_time | clearance.other | pass.miscommunication | clearance.aerial_won | pass.through_ball | ball_recovery.recovery_failure | goalkeeper.outcome.id | goalkeeper.outcome.name | goalkeeper.body_part.id | goalkeeper.body_part.name | shot.aerial_won | foul_committed.card.id | foul_committed.card.name | foul_committed.offensive | foul_won.defensive | substitution.outcome.id | substitution.outcome.name | 
substitution.replacement.id | substitution.replacement.name | 50_50.outcome.id | 50_50.outcome.name | pass.goal_assist | goalkeeper.technique.id | goalkeeper.technique.name | pass.cut_back | miscontrol.aerial_won | pass.straight | foul_committed.type.id | foul_committed.type.name | match_id | pass.inswinging | pass.deflected | injury_stoppage.in_chain | shot.one_on_one | bad_behaviour.card.id | bad_behaviour.card.name | shot.deflected | block.deflection | foul_committed.penalty | foul_won.penalty | block.save_block | goalkeeper.punched_out | player_off.permanent | shot.saved_off_target | goalkeeper.shot_saved_off_target | shot.saved_to_post | goalkeeper.shot_saved_to_post | shot.open_goal | goalkeeper.penalty_saved_to_post | dribble.no_touch | block.offensive | shot.follows_dribble | ball_recovery.offensive | shot.redirect | goalkeeper.lost_in_play | goalkeeper.success_in_play | match_date | kick_off | home_score | away_score | match_status | match_status_360 | last_updated | last_updated_360 | match_week | competition.competition_id | competition.country_name | competition.competition_name | season.season_id | season.season_name | home_team.home_team_id | home_team.home_team_name | home_team.home_team_gender | home_team.home_team_group | home_team.country.id | home_team.country.name | home_team.managers | away_team.away_team_id | away_team.away_team_name | away_team.away_team_gender | away_team.away_team_group | away_team.country.id | away_team.country.name | away_team.managers | metadata.data_version | metadata.shot_fidelity_version | metadata.xy_fidelity_version | competition_stage.id | competition_stage.name | stadium.id | stadium.name | stadium.country.id | stadium.country.name | referee.id | referee.name | referee.country.id | referee.country.name | competition_id | season_id | country_name | competition_name | competition_gender | competition_youth | competition_international | season_name | match_updated | match_updated_360 | match_available_360 | 
match_available | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 9427b18a-6b10-411f-90da-3d6240b80c71 | 1 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 35 | Starting XI | 1835 | Finland | 1 | Regular Play | 1835 | Finland | 352.0 | [{'player': {'id': 8667, 'name': 'Lukáš Hrádec... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
1 | 1 | 542c58bf-5c6c-43ca-9d8d-e086c7f08aaf | 2 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 35 | Starting XI | 1835 | Finland | 1 | Regular Play | 796 | Russia | 3421.0 | [{'player': {'id': 21298, 'name': 'Matvey Safo... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
2 | 2 | a0dfe8a0-a0b9-443e-89e3-a8ba6596fa33 | 3 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 18 | Half Start | 1835 | Finland | 1 | Regular Play | 1835 | Finland | NaN | NaN | [c7156352-f4b7-4140-aa51-6e26fd019a11] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
3 | 3 | c7156352-f4b7-4140-aa51-6e26fd019a11 | 4 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 18 | Half Start | 1835 | Finland | 1 | Regular Play | 796 | Russia | NaN | NaN | [a0dfe8a0-a0b9-443e-89e3-a8ba6596fa33] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
4 | 4 | 94dbc5c3-ef37-445e-9154-3d9f9ea9245d | 5 | 1 | 00:00:00.490 | 0 | 0 | 2 | 1.373215 | 30 | Pass | 796 | Russia | 9 | From Kick Off | 796 | Russia | NaN | NaN | [c0935bbe-3eb4-4a21-9eee-45f380d1f26d] | [60.0, 40.0] | 6299.0 | Aleksey Miranchuk | 18.0 | Right Attacking Midfield | 31917.0 | Igor Diveev | 22.357325 | 3.069967 | 1.0 | Ground Pass | [37.7, 41.6] | 38.0 | Left Foot | 65.0 | Kick Off | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
print('No. rows in Events DataFrame BEFORE join to Matches and Competitions DataFrames: {}'.format(len(df_events_flat)))
print('No. rows in DataFrame AFTER join: {}\n'.format(len(df_events_matches_competitions)))
print('-'*10+'\n')
print('Variance in rows before and after join: {}\n'.format(len(df_events_matches_competitions) - len(df_events_flat)))
No. rows in Events DataFrame BEFORE join to Matches and Competitions DataFrames: 192686
No. rows in DataFrame AFTER join: 192686

----------

Variance in rows before and after join: 0
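An unchanged row count shows that the join neither dropped nor duplicated events in aggregate. As a complementary check, `pd.merge` accepts the `validate` and `indicator` arguments, which test the key relationship directly. The sketch below uses small toy DataFrames with illustrative values rather than the real data, and it uses a left join (the notebook's merge uses the default inner join) so that unmatched events would remain visible.

```python
import pandas as pd

# Toy stand-ins for the Events and Matches DataFrames (values are illustrative)
df_events_demo = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'match_id': [3788753, 3788753, 3788754],
})
df_matches_demo = pd.DataFrame({
    'match_id': [3788753, 3788754],
    'match_date': ['2021-06-16', '2021-06-17'],
})

df_joined_demo = pd.merge(
    df_events_demo,
    df_matches_demo,
    on='match_id',
    how='left',              # keep every event, even if it has no match record
    validate='many_to_one',  # raise MergeError if match_id repeats in the Matches table
    indicator=True,          # add a '_merge' column: 'both' or 'left_only'
)

# Events with no corresponding match would be flagged as 'left_only'
n_unmatched = int((df_joined_demo['_merge'] == 'left_only').sum())
print('Rows before: {}, rows after: {}, unmatched: {}'.format(
    len(df_events_demo), len(df_joined_demo), n_unmatched))
```

With `validate='many_to_one'`, a duplicated `match_id` on the right-hand side raises an error at merge time instead of silently multiplying event rows.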
# Export DataFrame as a CSV file
##
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'combined', 'combined_sb_360.csv')):
    df_events_matches_competitions.to_csv(os.path.join(data_dir_sb, 'raw', 'combined', 'combined_sb_360.csv'), index=None, header=True)
##
else:
    pass
Let's check the quality of the dataset by looking at the first and last rows, using the pandas head() and tail() methods.
The initial step of the data handling and Exploratory Data Analysis (EDA) is to create a quick summary report of the dataset using the pandas Profiling Report.
# Summary of the data using pandas Profiling Report
#pp.ProfileReport(df_events_matches_competitions)
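The merged DataFrame is wide and sparse: as the displayed rows show, most event-specific columns are NaN for any given event. A per-column view of missing values is therefore a useful summary to have alongside the profiling report. The following is a minimal sketch using only standard pandas methods on a toy DataFrame; the column values are illustrative and the name `df_summary` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_events_matches_competitions (values are illustrative)
df_demo = pd.DataFrame({
    'type.name': ['Starting XI', 'Half Start', 'Pass', 'Pass'],
    'pass.length': [np.nan, np.nan, 22.4, 10.1],
    'shot.statsbomb_xg': [np.nan, np.nan, np.nan, np.nan],
})

# Data type, non-null count and share of missing values for each column,
# sorted so that the emptiest columns appear first
df_summary = pd.DataFrame({
    'dtype': df_demo.dtypes.astype(str),
    'non_null': df_demo.notna().sum(),
    'pct_missing': df_demo.isna().mean().mul(100).round(1),
}).sort_values('pct_missing', ascending=False)

print(df_summary)
```

Applied to the real DataFrame, the same three lines would show which event-type columns are populated only for a small share of rows.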
The following commands produce a more bespoke summary of the dataset. Some of them cover content already included in the pandas Profiling summary above, but use the standard pandas functions and methods that most people will be more familiar with.
First, check the quality of the dataset by looking at the first and last rows, using the pandas head() and tail() methods.
# Display the first five rows of the DataFrame, df_events_matches_competitions
df_events_matches_competitions.head()
level_0 | id | index | period | timestamp | minute | second | possession | duration | type.id | type.name | possession_team.id | possession_team.name | play_pattern.id | play_pattern.name | team.id | team.name | tactics.formation | tactics.lineup | related_events | location | player.id | player.name | position.id | position.name | pass.recipient.id | pass.recipient.name | pass.length | pass.angle | pass.height.id | pass.height.name | pass.end_location | pass.body_part.id | pass.body_part.name | pass.type.id | pass.type.name | carry.end_location | under_pressure | duel.type.id | duel.type.name | pass.aerial_won | counterpress | duel.outcome.id | duel.outcome.name | dribble.outcome.id | dribble.outcome.name | pass.outcome.id | pass.outcome.name | ball_receipt.outcome.id | ball_receipt.outcome.name | interception.outcome.id | interception.outcome.name | shot.statsbomb_xg | shot.end_location | shot.outcome.id | shot.outcome.name | shot.type.id | shot.type.name | shot.body_part.id | shot.body_part.name | shot.technique.id | shot.technique.name | shot.freeze_frame | goalkeeper.end_location | goalkeeper.type.id | goalkeeper.type.name | goalkeeper.position.id | goalkeeper.position.name | out | pass.outswinging | pass.technique.id | pass.technique.name | clearance.head | clearance.body_part.id | clearance.body_part.name | pass.switch | off_camera | pass.cross | clearance.left_foot | dribble.overrun | dribble.nutmeg | clearance.right_foot | pass.no_touch | foul_committed.advantage | foul_won.advantage | pass.assisted_shot_id | pass.shot_assist | shot.key_pass_id | shot.first_time | clearance.other | pass.miscommunication | clearance.aerial_won | pass.through_ball | ball_recovery.recovery_failure | goalkeeper.outcome.id | goalkeeper.outcome.name | goalkeeper.body_part.id | goalkeeper.body_part.name | shot.aerial_won | foul_committed.card.id | foul_committed.card.name | foul_committed.offensive | foul_won.defensive | substitution.outcome.id | substitution.outcome.name | 
substitution.replacement.id | substitution.replacement.name | 50_50.outcome.id | 50_50.outcome.name | pass.goal_assist | goalkeeper.technique.id | goalkeeper.technique.name | pass.cut_back | miscontrol.aerial_won | pass.straight | foul_committed.type.id | foul_committed.type.name | match_id | pass.inswinging | pass.deflected | injury_stoppage.in_chain | shot.one_on_one | bad_behaviour.card.id | bad_behaviour.card.name | shot.deflected | block.deflection | foul_committed.penalty | foul_won.penalty | block.save_block | goalkeeper.punched_out | player_off.permanent | shot.saved_off_target | goalkeeper.shot_saved_off_target | shot.saved_to_post | goalkeeper.shot_saved_to_post | shot.open_goal | goalkeeper.penalty_saved_to_post | dribble.no_touch | block.offensive | shot.follows_dribble | ball_recovery.offensive | shot.redirect | goalkeeper.lost_in_play | goalkeeper.success_in_play | match_date | kick_off | home_score | away_score | match_status | match_status_360 | last_updated | last_updated_360 | match_week | competition.competition_id | competition.country_name | competition.competition_name | season.season_id | season.season_name | home_team.home_team_id | home_team.home_team_name | home_team.home_team_gender | home_team.home_team_group | home_team.country.id | home_team.country.name | home_team.managers | away_team.away_team_id | away_team.away_team_name | away_team.away_team_gender | away_team.away_team_group | away_team.country.id | away_team.country.name | away_team.managers | metadata.data_version | metadata.shot_fidelity_version | metadata.xy_fidelity_version | competition_stage.id | competition_stage.name | stadium.id | stadium.name | stadium.country.id | stadium.country.name | referee.id | referee.name | referee.country.id | referee.country.name | competition_id | season_id | country_name | competition_name | competition_gender | competition_youth | competition_international | season_name | match_updated | match_updated_360 | match_available_360 | 
match_available | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 9427b18a-6b10-411f-90da-3d6240b80c71 | 1 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 35 | Starting XI | 1835 | Finland | 1 | Regular Play | 1835 | Finland | 352.0 | [{'player': {'id': 8667, 'name': 'Lukáš Hrádec... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
1 | 1 | 542c58bf-5c6c-43ca-9d8d-e086c7f08aaf | 2 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 35 | Starting XI | 1835 | Finland | 1 | Regular Play | 796 | Russia | 3421.0 | [{'player': {'id': 21298, 'name': 'Matvey Safo... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
2 | 2 | a0dfe8a0-a0b9-443e-89e3-a8ba6596fa33 | 3 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 18 | Half Start | 1835 | Finland | 1 | Regular Play | 1835 | Finland | NaN | NaN | [c7156352-f4b7-4140-aa51-6e26fd019a11] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
3 | 3 | c7156352-f4b7-4140-aa51-6e26fd019a11 | 4 | 1 | 00:00:00.000 | 0 | 0 | 1 | 0.000000 | 18 | Half Start | 1835 | Finland | 1 | Regular Play | 796 | Russia | NaN | NaN | [a0dfe8a0-a0b9-443e-89e3-a8ba6596fa33] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
4 | 4 | 94dbc5c3-ef37-445e-9154-3d9f9ea9245d | 5 | 1 | 00:00:00.490 | 0 | 0 | 2 | 1.373215 | 30 | Pass | 796 | Russia | 9 | From Kick Off | 796 | Russia | NaN | NaN | [c0935bbe-3eb4-4a21-9eee-45f380d1f26d] | [60.0, 40.0] | 6299.0 | Aleksey Miranchuk | 18.0 | Right Attacking Midfield | 31917.0 | Igor Diveev | 22.357325 | 3.069967 | 1.0 | Ground Pass | [37.7, 41.6] | 38.0 | Left Foot | 65.0 | Kick Off | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788753 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-16 | 15:00:00.000 | 0 | 1 | available | available | 2021-11-11T14:00:16.105809 | 2021-09-22T16:39:05.697512 | 2 | 55 | Europe | UEFA Euro | 43 | 2020 | 1835 | Finland | male | Group B | 77 | Finland | [{'id': 3622, 'name': 'Markku Kanerva', 'nickn... | 796 | Russia | male | Group B | 188 | Russia | [{'id': 365, 'name': 'Stanislav Cherchesov', '... | 1.1.0 | 2 | 2 | 10 | Group Stage | 4726 | Saint-Petersburg Stadium | 188 | Russia | 293 | Danny Desmond Makkelie | 160 | Netherlands | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
# Display the last five rows of the DataFrame, df_events_matches_competitions
df_events_matches_competitions.tail()
level_0 | id | index | period | timestamp | minute | second | possession | duration | type.id | type.name | possession_team.id | possession_team.name | play_pattern.id | play_pattern.name | team.id | team.name | tactics.formation | tactics.lineup | related_events | location | player.id | player.name | position.id | position.name | pass.recipient.id | pass.recipient.name | pass.length | pass.angle | pass.height.id | pass.height.name | pass.end_location | pass.body_part.id | pass.body_part.name | pass.type.id | pass.type.name | carry.end_location | under_pressure | duel.type.id | duel.type.name | pass.aerial_won | counterpress | duel.outcome.id | duel.outcome.name | dribble.outcome.id | dribble.outcome.name | pass.outcome.id | pass.outcome.name | ball_receipt.outcome.id | ball_receipt.outcome.name | interception.outcome.id | interception.outcome.name | shot.statsbomb_xg | shot.end_location | shot.outcome.id | shot.outcome.name | shot.type.id | shot.type.name | shot.body_part.id | shot.body_part.name | shot.technique.id | shot.technique.name | shot.freeze_frame | goalkeeper.end_location | goalkeeper.type.id | goalkeeper.type.name | goalkeeper.position.id | goalkeeper.position.name | out | pass.outswinging | pass.technique.id | pass.technique.name | clearance.head | clearance.body_part.id | clearance.body_part.name | pass.switch | off_camera | pass.cross | clearance.left_foot | dribble.overrun | dribble.nutmeg | clearance.right_foot | pass.no_touch | foul_committed.advantage | foul_won.advantage | pass.assisted_shot_id | pass.shot_assist | shot.key_pass_id | shot.first_time | clearance.other | pass.miscommunication | clearance.aerial_won | pass.through_ball | ball_recovery.recovery_failure | goalkeeper.outcome.id | goalkeeper.outcome.name | goalkeeper.body_part.id | goalkeeper.body_part.name | shot.aerial_won | foul_committed.card.id | foul_committed.card.name | foul_committed.offensive | foul_won.defensive | substitution.outcome.id | substitution.outcome.name | substitution.replacement.id | substitution.replacement.name | 50_50.outcome.id | 50_50.outcome.name | pass.goal_assist | goalkeeper.technique.id | goalkeeper.technique.name | pass.cut_back | miscontrol.aerial_won | pass.straight | foul_committed.type.id | foul_committed.type.name | match_id | pass.inswinging | pass.deflected | injury_stoppage.in_chain | shot.one_on_one | bad_behaviour.card.id | bad_behaviour.card.name | shot.deflected | block.deflection | foul_committed.penalty | foul_won.penalty | block.save_block | goalkeeper.punched_out | player_off.permanent | shot.saved_off_target | goalkeeper.shot_saved_off_target | shot.saved_to_post | goalkeeper.shot_saved_to_post | shot.open_goal | goalkeeper.penalty_saved_to_post | dribble.no_touch | block.offensive | shot.follows_dribble | ball_recovery.offensive | shot.redirect | goalkeeper.lost_in_play | goalkeeper.success_in_play | match_date | kick_off | home_score | away_score | match_status | match_status_360 | last_updated | last_updated_360 | match_week | competition.competition_id | competition.country_name | competition.competition_name | season.season_id | season.season_name | home_team.home_team_id | home_team.home_team_name | home_team.home_team_gender | home_team.home_team_group | home_team.country.id | home_team.country.name | home_team.managers | away_team.away_team_id | away_team.away_team_name | away_team.away_team_gender | away_team.away_team_group | away_team.country.id | away_team.country.name | away_team.managers | metadata.data_version | metadata.shot_fidelity_version | metadata.xy_fidelity_version | competition_stage.id | competition_stage.name | stadium.id | stadium.name | stadium.country.id | stadium.country.name | referee.id | referee.name | referee.country.id | referee.country.name | competition_id | season_id | country_name | competition_name | competition_gender | competition_youth | competition_international | season_name | match_updated | match_updated_360 | match_available_360 | match_available | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
192681 | 2992 | 8aa68a18-79d2-4b8b-ba5f-9d3124f1cd20 | 2993 | 2 | 00:50:23.150 | 95 | 23 | 151 | 0.87093 | 30 | Pass | 907 | Wales | 4 | From Throw In | 907 | Wales | NaN | NaN | [a0bdd045-65d0-4601-9378-a5dacdba3257, dcbfc5e... | [110.7, 0.1] | 3086.0 | Ben Davies | 6.0 | Left Back | 6399.0 | Gareth Frank Bale | 6.103278 | 1.254227 | 2.0 | Low Pass | [112.6, 5.9] | NaN | NaN | 67.0 | Throw-in | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9.0 | Incomplete | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788744 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-12 | 15:00:00.000 | 1 | 1 | available | available | 2021-06-20T12:57:59.258 | 2021-09-22T16:38:18.433799 | 1 | 55 | Europe | UEFA Euro | 43 | 2020 | 907 | Wales | male | Group A | 249 | Wales | NaN | 773 | Switzerland | male | Group A | 221 | Switzerland | NaN | 1.1.0 | 2 | 2 | 10 | Group Stage | 4549 | Bakı Olimpiya Stadionu | 16 | Azerbaijan | 76 | Clément Turpin | 78 | France | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
192682 | 2993 | a0bdd045-65d0-4601-9378-a5dacdba3257 | 2994 | 2 | 00:50:24.021 | 95 | 24 | 151 | NaN | 42 | Ball Receipt* | 907 | Wales | 4 | From Throw In | 907 | Wales | NaN | NaN | [8aa68a18-79d2-4b8b-ba5f-9d3124f1cd20] | [113.7, 8.0] | 6399.0 | Gareth Frank Bale | 16.0 | Left Midfield | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9.0 | Incomplete | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788744 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-12 | 15:00:00.000 | 1 | 1 | available | available | 2021-06-20T12:57:59.258 | 2021-09-22T16:38:18.433799 | 1 | 55 | Europe | UEFA Euro | 43 | 2020 | 907 | Wales | male | Group A | 249 | Wales | NaN | 773 | Switzerland | male | Group A | 221 | Switzerland | NaN | 1.1.0 | 2 | 2 | 10 | Group Stage | 4549 | Bakı Olimpiya Stadionu | 16 | Azerbaijan | 76 | Clément Turpin | 78 | France | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
192683 | 2994 | dcbfc5e1-b22f-460f-8132-cf6fe7a590f8 | 2995 | 2 | 00:50:24.021 | 95 | 24 | 151 | 0.00000 | 10 | Interception | 907 | Wales | 4 | From Throw In | 773 | Switzerland | NaN | NaN | [8aa68a18-79d2-4b8b-ba5f-9d3124f1cd20] | [7.5, 74.2] | 8814.0 | Nico Elvedi | 3.0 | Right Center Back | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 | Won | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788744 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-12 | 15:00:00.000 | 1 | 1 | available | available | 2021-06-20T12:57:59.258 | 2021-09-22T16:38:18.433799 | 1 | 55 | Europe | UEFA Euro | 43 | 2020 | 907 | Wales | male | Group A | 249 | Wales | NaN | 773 | Switzerland | male | Group A | 221 | Switzerland | NaN | 1.1.0 | 2 | 2 | 10 | Group Stage | 4549 | Bakı Olimpiya Stadionu | 16 | Azerbaijan | 76 | Clément Turpin | 78 | France | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
192684 | 2995 | 97cb188e-82e3-49cb-9ccf-9a5346f042b1 | 2996 | 2 | 00:50:25.404 | 95 | 25 | 151 | 0.00000 | 34 | Half End | 907 | Wales | 4 | From Throw In | 773 | Switzerland | NaN | NaN | [edfa93d5-a744-41a4-a10d-f1d09017011d] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788744 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-12 | 15:00:00.000 | 1 | 1 | available | available | 2021-06-20T12:57:59.258 | 2021-09-22T16:38:18.433799 | 1 | 55 | Europe | UEFA Euro | 43 | 2020 | 907 | Wales | male | Group A | 249 | Wales | NaN | 773 | Switzerland | male | Group A | 221 | Switzerland | NaN | 1.1.0 | 2 | 2 | 10 | Group Stage | 4549 | Bakı Olimpiya Stadionu | 16 | Azerbaijan | 76 | Clément Turpin | 78 | France | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
192685 | 2996 | edfa93d5-a744-41a4-a10d-f1d09017011d | 2997 | 2 | 00:50:25.404 | 95 | 25 | 151 | 0.00000 | 34 | Half End | 907 | Wales | 4 | From Throw In | 907 | Wales | NaN | NaN | [97cb188e-82e3-49cb-9ccf-9a5346f042b1] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3788744 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021-06-12 | 15:00:00.000 | 1 | 1 | available | available | 2021-06-20T12:57:59.258 | 2021-09-22T16:38:18.433799 | 1 | 55 | Europe | UEFA Euro | 43 | 2020 | 907 | Wales | male | Group A | 249 | Wales | NaN | 773 | Switzerland | male | Group A | 221 | Switzerland | NaN | 1.1.0 | 2 | 2 | 10 | Group Stage | 4549 | Bakı Olimpiya Stadionu | 16 | Azerbaijan | 76 | Clément Turpin | 78 | France | 55 | 43 | Europe | UEFA Euro | male | False | True | 2020 | 2021-11-11T14:00:16.105809 | 2021-11-11T13:54:37.507376 | 2021-11-11T13:54:37.507376 | 2021-11-11T14:00:16.105809 |
# Print the shape of the DataFrame, df_events_matches_competitions
print(df_events_matches_competitions.shape)
(192686, 197)
# Print the column names of the DataFrame, df_events_matches_competitions
print(df_events_matches_competitions.columns)
Index(['level_0', 'id', 'index', 'period', 'timestamp', 'minute', 'second', 'possession', 'duration', 'type.id', ... 'country_name', 'competition_name', 'competition_gender', 'competition_youth', 'competition_international', 'season_name', 'match_updated', 'match_updated_360', 'match_available_360', 'match_available'], dtype='object', length=197)
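With 197 columns, it can help to pull out one family of attributes at a time. The flattened StatsBomb columns share a dotted prefix (`pass.`, `shot.`, `goalkeeper.`, and so on), so a prefix filter is one way to do this. The following is a minimal sketch on a toy DataFrame; `df_toy` and `columns_with_prefix` are hypothetical names used only for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for df_events_matches_competitions (illustrative values only)
df_toy = pd.DataFrame({
    "id": ["a", "b"],
    "type.name": ["Pass", "Shot"],
    "pass.length": [22.4, None],
    "pass.angle": [3.07, None],
    "shot.statsbomb_xg": [None, 0.12],
})


def columns_with_prefix(df, prefix):
    """Return the column names of df that start with the given prefix."""
    return [col for col in df.columns if col.startswith(prefix)]


# Select only the pass-related columns
pass_cols = columns_with_prefix(df_toy, "pass.")
print(pass_cols)
print(df_toy[pass_cols])
```

The same helper could be applied to the full DataFrame, for example `columns_with_prefix(df_events_matches_competitions, 'shot.')`, assuming all column names are strings.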
# Data types of the features of the raw DataFrame, df_events_matches_competitions
df_events_matches_competitions.dtypes
level_0                 int64
id                     object
index                   int64
period                  int64
timestamp              object
                        ...
season_name            object
match_updated          object
match_updated_360      object
match_available_360    object
match_available        object
Length: 197, dtype: object
# Display the data types of all columns, without truncation
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df_events_matches_competitions.dtypes)
level_0 int64 id object index int64 period int64 timestamp object minute int64 second int64 possession int64 duration float64 type.id int64 type.name object possession_team.id int64 possession_team.name object play_pattern.id int64 play_pattern.name object team.id int64 team.name object tactics.formation float64 tactics.lineup object related_events object location object player.id float64 player.name object position.id float64 position.name object pass.recipient.id float64 pass.recipient.name object pass.length float64 pass.angle float64 pass.height.id float64 pass.height.name object pass.end_location object pass.body_part.id float64 pass.body_part.name object pass.type.id float64 pass.type.name object carry.end_location object under_pressure object duel.type.id float64 duel.type.name object pass.aerial_won object counterpress object duel.outcome.id float64 duel.outcome.name object dribble.outcome.id float64 dribble.outcome.name object pass.outcome.id float64 pass.outcome.name object ball_receipt.outcome.id float64 ball_receipt.outcome.name object interception.outcome.id float64 interception.outcome.name object shot.statsbomb_xg float64 shot.end_location object shot.outcome.id float64 shot.outcome.name object shot.type.id float64 shot.type.name object shot.body_part.id float64 shot.body_part.name object shot.technique.id float64 shot.technique.name object shot.freeze_frame object goalkeeper.end_location object goalkeeper.type.id float64 goalkeeper.type.name object goalkeeper.position.id float64 goalkeeper.position.name object out object pass.outswinging object pass.technique.id float64 pass.technique.name object clearance.head object clearance.body_part.id float64 clearance.body_part.name object pass.switch object off_camera object pass.cross object clearance.left_foot object dribble.overrun object dribble.nutmeg object clearance.right_foot object pass.no_touch object foul_committed.advantage object foul_won.advantage object pass.assisted_shot_id object 
pass.shot_assist object shot.key_pass_id object shot.first_time object clearance.other object pass.miscommunication object clearance.aerial_won object pass.through_ball object ball_recovery.recovery_failure object goalkeeper.outcome.id float64 goalkeeper.outcome.name object goalkeeper.body_part.id float64 goalkeeper.body_part.name object shot.aerial_won object foul_committed.card.id float64 foul_committed.card.name object foul_committed.offensive object foul_won.defensive object substitution.outcome.id float64 substitution.outcome.name object substitution.replacement.id float64 substitution.replacement.name object 50_50.outcome.id float64 50_50.outcome.name object pass.goal_assist object goalkeeper.technique.id float64 goalkeeper.technique.name object pass.cut_back object miscontrol.aerial_won object pass.straight object foul_committed.type.id float64 foul_committed.type.name object match_id int64 pass.inswinging object pass.deflected object injury_stoppage.in_chain object shot.one_on_one object bad_behaviour.card.id float64 bad_behaviour.card.name object shot.deflected object block.deflection object foul_committed.penalty object foul_won.penalty object block.save_block object goalkeeper.punched_out object player_off.permanent object shot.saved_off_target object goalkeeper.shot_saved_off_target object shot.saved_to_post object goalkeeper.shot_saved_to_post object shot.open_goal object goalkeeper.penalty_saved_to_post object dribble.no_touch object block.offensive object shot.follows_dribble object ball_recovery.offensive object shot.redirect object goalkeeper.lost_in_play object goalkeeper.success_in_play object match_date object kick_off object home_score int64 away_score int64 match_status object match_status_360 object last_updated object last_updated_360 object match_week int64 competition.competition_id int64 competition.country_name object competition.competition_name object season.season_id int64 season.season_name object home_team.home_team_id int64 
home_team.home_team_name object home_team.home_team_gender object home_team.home_team_group object home_team.country.id int64 home_team.country.name object home_team.managers object away_team.away_team_id int64 away_team.away_team_name object away_team.away_team_gender object away_team.away_team_group object away_team.country.id int64 away_team.country.name object away_team.managers object metadata.data_version object metadata.shot_fidelity_version object metadata.xy_fidelity_version object competition_stage.id int64 competition_stage.name object stadium.id int64 stadium.name object stadium.country.id int64 stadium.country.name object referee.id int64 referee.name object referee.country.id int64 referee.country.name object competition_id int64 season_id int64 country_name object competition_name object competition_gender object competition_youth bool competition_international bool season_name object match_updated object match_updated_360 object match_available_360 object match_available object dtype: object
Full details of these attributes and their data types can be found in the Data Dictionary.
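Many of the flag-style attributes in the listing above (for example `under_pressure` and `counterpress`) are stored as `object` rather than `bool`, which is what pandas does when a column mixes `True` with missing values. The sketch below shows one possible way to find such columns. It uses a toy DataFrame, and `df_toy` and `boolean_like_object_columns` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for df_events_matches_competitions (illustrative values only)
df_toy = pd.DataFrame({
    "period": [1, 2],
    "duration": [1.37, None],
    "type.name": ["Pass", "Half End"],
    "under_pressure": [True, None],        # True mixed with missing -> object
    "competition_youth": [False, False],   # no missing values -> bool
})


def boolean_like_object_columns(df):
    """Return object-dtype columns whose non-missing values are all booleans."""
    cols = []
    for col in df.columns:
        if not pd.api.types.is_object_dtype(df[col]):
            continue
        values = df[col].dropna()
        if len(values) > 0 and values.map(lambda v: isinstance(v, bool)).all():
            cols.append(col)
    return cols


flag_cols = boolean_like_object_columns(df_toy)
print(flag_cols)
```

Columns found this way are candidates for conversion to a boolean type during the engineering stage. Whether a missing value should be read as `False` is an assumption that should be checked against the Data Dictionary.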
# Counts of missing values
null_value_stats = df_events_matches_competitions.isnull().sum(axis=0)
null_value_stats[null_value_stats != 0]
duration                       52722
tactics.formation             192459
tactics.lineup                192459
related_events                  7079
location                        1530
                               ...
goalkeeper.success_in_play    192684
home_team.home_team_group      63094
home_team.managers             12516
away_team.away_team_group      63094
away_team.managers             12516
Length: 131, dtype: int64
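Raw counts are easier to interpret next to the share of rows they represent. Most event-specific attributes are populated only for their own event type, so they are missing on nearly every other row. The following is a minimal sketch of a count-plus-percentage summary on a toy DataFrame; `df_toy` and `missing_value_summary` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_events_matches_competitions (illustrative values only)
df_toy = pd.DataFrame({
    "type.name": ["Half Start", "Pass", "Ball Receipt*", "Half End"],
    "duration": [0.0, 1.37, np.nan, 0.0],
    "player.name": [None, "Aleksey Miranchuk", "Gareth Frank Bale", None],
    "shot.statsbomb_xg": [np.nan, np.nan, np.nan, np.nan],
})


def missing_value_summary(df):
    """Count and percentage of missing values per column, largest first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only the columns that have at least one missing value
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_count", ascending=False)


summary = missing_value_summary(df_toy)
print(summary)
```

Applied to the full DataFrame, a summary like this would make it easy to separate columns that are sparse by design (such as the `shot.` attributes) from columns where missing values may point to a data quality issue.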
This notebook parses JSON data from the StatsBomb Open Data GitHub repository using pandas.
The next stage is to engineer this DataFrame.
*Visit my website eddwebster.com or my GitHub Repository for more projects. If you'd like to get in contact, my Twitter handle is @eddwebster and my email is: edd.j.webster@gmail.com.*