#!/usr/bin/env python
# coding: utf-8
#
# # Physical Data Engineering, Part 2
# ##### Notebook to engineer the second of two provided datasets of physical data by [Watford F.C](https://www.watfordfc.com/), using [Python](https://www.python.org/) [pandas](http://pandas.pydata.org/).
#
# ### By [Edd Webster](https://www.twitter.com/eddwebster)
# Notebook first written: 11/02/2022
# Notebook last updated: 12/02/2022
#
# ![Watford F.C.](../../img/club_badges/premier_league/watford_fc_logo_small.png)
#
# Click [here](#section4) to jump straight into the Data Engineering section and skip the [Notebook Brief](#section2) and [Data Sources](#section3) sections.
# ___
#
#
# ## Introduction
# This notebook engineers the second of two datasets of physical data provided by [Watford F.C](https://www.watfordfc.com/), using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames.
#
# For more information about this notebook and the author, I am available through all the following channels:
# * [eddwebster.com](https://www.eddwebster.com/);
# * edd.j.webster@gmail.com;
# * [@eddwebster](https://www.twitter.com/eddwebster);
# * [linkedin.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/);
# * [github/eddwebster](https://github.com/eddwebster/); and
# * [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster).
#
# A static version of this notebook can be found [here](https://nbviewer.org/github/eddwebster/watford/blob/main/notebooks/2_data_engineering/Physical%20Data%20Engineering%20Part%202.ipynb). This notebook has an accompanying [`watford`](https://github.com/eddwebster/watford) GitHub repository and for my full repository of football analysis, see my [`football_analysis`](https://github.com/eddwebster/football_analytics) GitHub repository.
# ___
#
# ## Notebook Contents
# 1. [Notebook Dependencies](#section1)
# 2. [Notebook Brief](#section2)
# 3. [Data Sources](#section3)
# 1. [Introduction](#section3.1)
# 2. [Read in the Datasets](#section3.2)
# 3. [Initial Data Handling](#section3.3)
# 4. [Data Engineering](#section4)
# 1. [Prepare Training Data](#section4.1)
# 2. [Split Out Unified Training Data into Individual Training Drills](#section4.2)
# 3. [Engineer DataFrame to Match Tracking Data Format](#section4.3)
# 4. [Calculate Speed, Distance, and Acceleration](#section4.4)
# 5. [Create Physical Reports for Each Individual Training Session](#section4.5)
# 6. [Create Single Physical Report for the Day of Interest](#section4.6)
# 5. [Summary](#section5)
# 6. [Next Steps](#section6)
# 7. [References](#section7)
# ___
#
#
#
# ## 1. Notebook Dependencies
#
# This notebook was written using [Python 3](https://www.python.org/) and requires the following libraries:
# * [`Jupyter notebooks`](https://jupyter.org/) for the notebook environment in which this project is presented;
# * [`NumPy`](http://www.numpy.org/) for multidimensional array computing; and
# * [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation.
#
# All packages used for this notebook can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/).
# ### Import Libraries and Modules
# In[1]:
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv
# Import Dependencies
get_ipython().run_line_magic('matplotlib', 'inline')
# Math Operations
import numpy as np
from math import pi
# Datetime
import datetime
from datetime import date
import time
# Data Preprocessing
import pandas as pd
import pandas_profiling as pp
import os
import re
import chardet
import random
from io import BytesIO
from pathlib import Path
# Reading Directories
import glob
# Working with JSON
import json
from pandas import json_normalize
# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
# Machine learning
import scipy.signal as signal
# Requests and downloads
import tqdm
import requests
# Display in Jupyter
from IPython.display import Image, YouTubeVideo
from IPython.core.display import HTML
# Ignore Warnings
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")
# Print message
print('Setup Complete')
# In[2]:
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))
# ### Defined Filepaths
# In[3]:
# Set up initial paths to subfolders
base_dir = os.path.join('..', '..')
data_dir = os.path.join(base_dir, 'data')
data_dir_physical = os.path.join(base_dir, 'data', 'physical')
scripts_dir = os.path.join(base_dir, 'scripts')
models_dir = os.path.join(base_dir, 'models')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')
# ### Notebook Settings
# In[4]:
# Display all columns of displayed pandas DataFrames
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)
pd.options.mode.chained_assignment = None
# ---
#
#
#
# ## 2. Notebook Brief
# This notebook parses and engineers a provided dataset of physical data using [pandas](http://pandas.pydata.org/).
#
#
# **Notebook Conventions**:
# * Variables that refer a `DataFrame` object are prefixed with `df_`.
# * Variables that refer to a collection of `DataFrame` objects (e.g., a list, a set or a dict) are prefixed with `dfs_`.
# ---
#
#
#
# ## 3. Data Sources
#
#
# ### 3.1. Introduction
# The physical data...
#
#
# ### 3.2. Read in the Datasets
# The `CSV` files provided will be read in as [pandas](https://pandas.pydata.org/) DataFrames.
# In[5]:
# Read data directory
print(glob.glob(os.path.join(data_dir_physical, 'raw', 'Set 2', '*')))
# ##### Unify Training data
# In[8]:
# Define function for unifying all the training data for an indicated date into a unified DataFrame
def unify_training_data(date):
"""
Define a function to unify all the training data for a single date, passed as the function's
parameter in the format 'YYYY-MM-DD'
For this example dataset, there is data for just '2022-02-02'
# KEY STEPS
# - USE GLOB TO PRODUCE SEPARATE DATAFRAMES FOR THE FOLLOWING:
## + ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY
## + CROSSING-AND-FINISHING-HSR-SPR
## + FULL-SESSION-MODIFIED
## + MATCH-MSG
## + PASSING-DRILL-PHYSICAL
## + WARM-UP-COORDINATION-AGILITY
# - THESE UNIFIED DATAFRAMES NEED TO INCLUDE NEW COLUMNS FOR DATE (FILENAME) AND PLAYER NAME (FILENAME OR COLUMN)
# - AT THIS STAGE, THE UNIFIED DATAFRAMES CAN BE EXPORTED AS ENGINEERED FILES, BUT UNIFIED
# - NEXT, DROP ALL COLUMNS EXCEPT: Player Display Name, Time, Lat, Lon, Speed (m/s)
# - DEDUPLICATE THE DATAFRAME, MANY COLUMNS REMOVED ONCE GYRO DATA IGNORED
# - USE Player Display Name TO RENAME THE COLUMNS FOR Time, Lat, Lon, Speed (m/s), TO PREFIX WITH NAME
# - THEN DROP Player Display Name
# - USE LAURIE'S METRICA SCRIPT TO CALCULATE THE SPEED, DISTANCE, AND ACCELERATION USING THE LAT/LON AND TIMESTEP
"""
## Read in exported CSV file if exists, if not, download the latest JSON data
if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '1_unified_training_dataset', f'{date}-ALL-TRAINING-DATA-ALL-PLAYERS.csv')):
### Start timer
tic = datetime.datetime.now()
### Print time reading of CSV files started
print(f'Reading of CSV files started at: {tic}')
### List all files available
lst_all_files = glob.glob(os.path.join(data_dir_physical, 'raw', 'Set 2', f'{date}-*.csv'))
### Create an empty list to append individual DataFrames
lst_files_to_append =[]
### Iterate through each file in list of all files
for file in lst_all_files:
### Create temporary DataFrame with each file
df_temp = pd.read_csv(file, index_col=None, header=0)
### Create a column that contains the filename - useful for information about the date, player, and training drill
df_temp['Filename'] = os.path.basename(file)
### Append each individual DataFrame to the empty list (to be concatenated)
lst_files_to_append.append(df_temp)
### Concatenate all the files
df_all = pd.concat(lst_files_to_append, axis=0, ignore_index=True)
### Save DataFrame
#### Define filename for each combined file to be saved
save_filename = f'{date}-ALL-TRAINING-DATA-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
#### Define the filepath to save each combined file
path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '1_unified_training_dataset')
#### Save the combined file as a CSV
df_all.to_csv(path + f'/{save_filename}.csv', index=None, header=True)
### Engineer the data
####
df_all['Date'] = date
####
#df_all['Training Type'] = training_type
#### Reorder Columns
#df_all = df_all[['Filename'] + [col for col in df_all.columns if col != 'Filename']]
#df_all = df_all[['Date'] + [col for col in df_all.columns if col != 'Date']]
### End timer
toc = datetime.datetime.now()
### Print time reading of CSV files end
print(f'Reading of CSV files ended at: {toc}')
### Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to create a single DataFrame from the individual CSV files: {total_time/60:0.2f} minutes.')
## If CSV file already exists, read in previously saved DataFrame
else:
### Print time reading of CSV files started
print('CSV file already saved to local storage. Reading in file as a pandas DataFrame.')
### Read in raw DataFrame
df_all = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '1_unified_training_dataset', f'{date}-ALL-TRAINING-DATA-ALL-PLAYERS.csv'))
## Return DataFrame
return df_all
# In[9]:
df_training_data_all = unify_training_data('2022-02-02')
# In[10]:
# Display DataFrame
df_training_data_all.head()
#
#
# ### 3.3. Initial Data Handling
# First check the quality of the dataset by looking at the first and last rows using the pandas [`head()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [`tail()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods.
# In[11]:
# Display the first five rows of the DataFrame, df_training_data_all
df_training_data_all.head()
# In[12]:
# Display the last five rows of the DataFrame, df_training_data_all
df_training_data_all.tail()
# In[13]:
# Print the shape of the DataFrame, df_training_data_all
print(df_training_data_all.shape)
# In[14]:
# Print the column names of the DataFrame, df_training_data_all
print(df_training_data_all.columns)
# In[15]:
# Data types of the features of the DataFrame, df_training_data_all
df_training_data_all.dtypes
# Full details of these attributes and their data types are discussed further in the [Data Dictionary](#section3.2.2).
# In[16]:
# Displays all columns
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
print(df_training_data_all.dtypes)
# In[17]:
# Info for the DataFrame, df_training_data_all
df_training_data_all.info()
# The memory usage is 2.7+ GB and the saved file is 4.2 GB, which is quite large.
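# Given the multi-gigabyte in-memory footprint, one option (a sketch on toy data, not applied in this notebook) is to downcast the numeric columns, which roughly halves the memory of `float64` data:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the large unified dataset
df = pd.DataFrame({
    'Lat': np.random.uniform(51.0, 51.1, 1000),
    'Lon': np.random.uniform(-0.4, -0.3, 1000),
    'Speed (m/s)': np.random.uniform(0, 9, 1000),
})

before = df.memory_usage(deep=True).sum()
# float64 -> float32 roughly halves the memory used by numeric columns
df_small = df.astype({col: 'float32' for col in df.columns})
after = df_small.memory_usage(deep=True).sum()

print(after < before)
```

# Note that `float32` keeps ~7 significant digits, which is worth checking against the precision of the GPS coordinates before applying it for real.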
# In[18]:
# Plot visualisation of the missing values for each feature of the raw DataFrame, df_training_data_all
#msno.matrix(df_training_data_all, figsize = (30, 7))
# In[19]:
# Counts of missing values
null_value_stats = df_training_data_all.isnull().sum(axis=0)
null_value_stats[null_value_stats != 0]
# As expected, the dataset has no null values and is ready to be engineered.
# ---
#
#
#
# ## 4. Data Engineering
# The next step is to wrangle the dataset into a format that's suitable for analysis and that works with existing code to determine metrics such as speed, distance, and acceleration.
#
# This section is broken down into the following subsections:
#
# 4.1. [Prepare Training Data](#section4.1)
# 4.2. [Split Out Unified Training Data into Individual Training Drills](#section4.2)
# 4.3. [Engineer DataFrame to Match Tracking Data Format](#section4.3)
# 4.4. [Calculate Speed, Distance, and Acceleration](#section4.4)
# 4.5. [Create Physical Reports for Each Individual Training Session](#section4.5)
# 4.6. [Create Single Physical Report for the Day of Interest](#section4.6)
#
#
# ### 4.1. Prepare Training Data
# In[20]:
# Define function for preparing the unified training data, selecting the columns of interest and deriving the Training Drill
def prepare_training_data(df, date):
"""
Define a function to prepare the unified training dataset
"""
## Read in exported CSV file if exists, if not, download the latest JSON data
if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '2_prepared_training_dataset', f'{date}-ALL-MOVEMENT-TRAINING-DATA-ALL-PLAYERS.csv')):
### Start timer
tic = datetime.datetime.now()
### Print time of engineering of tracking data started
print(f'Engineering of the unified training data CSV file started at: {tic}')
### Select columns of interest and dedupe the DataFrame
df_select = df[['Player Display Name', 'Time', 'Lat', 'Lon', 'Speed (m/s)', 'Filename']].drop_duplicates().reset_index(drop=True)    # use the df parameter, not the global DataFrame
### Create Date column
df_select['Date'] = date
### Convert Speed (m/s) to Speed (km/h)
df_select['Speed (km/h)'] = df_select['Speed (m/s)'] * 18/5
### Use the Filename, Player Display Name and Date to determine the Training Drill
df_select['Training Drill'] = df_select['Filename']
df_select['Training Drill'] = df_select['Training Drill'].str.replace('JOAO-PEDRO', 'JOAO PEDRO') # Temporary fix for Joao Pedro bug, fix later
df_select['Training Drill'] = df_select.apply(lambda x: x['Training Drill'].replace(x['Player Display Name'], ''), axis=1)
df_select['Training Drill'] = df_select.apply(lambda x: x['Training Drill'].replace(x['Date'], ''), axis=1)
df_select['Training Drill'] = df_select['Training Drill'].str.replace('--', '', regex=False).str.replace('.csv', '', regex=False)    # regex=False so '.csv' is matched literally
### Convert date from string type to date type
df_select['Date'] = pd.to_datetime(df_select['Date'], errors='coerce', format='%Y-%m-%d')
### Save DataFrame
#### Define filename for each combined file to be saved
save_filename = f'{date}-ALL-MOVEMENT-TRAINING-DATA-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
#### Define the filepath to save each combined file
path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '2_prepared_training_dataset')
#### Save the combined file as a CSV
df_select.to_csv(path + f'/{save_filename}.csv', index=None, header=True)
### End timer
toc = datetime.datetime.now()
### Print time of engineering of tracking data ended
print(f'Engineering of the unified training data CSV file ended at: {toc}')
### Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to engineer and save unified training data is: {total_time/60:0.2f} minutes.')
## If CSV file already exists, read in previously saved DataFrame
else:
### Print time reading of CSV files started
print('Engineered CSV file of unified training already saved to local storage. Reading in file as a pandas DataFrame.')
### Read in raw DataFrame
df_select = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '2_prepared_training_dataset', f'{date}-ALL-MOVEMENT-TRAINING-DATA-ALL-PLAYERS.csv'))
## Return DataFrame
return df_select
# In[21]:
df_training_data_select = prepare_training_data(df_training_data_all, '2022-02-02')
# In[22]:
df_training_data_select
# In[23]:
df_training_data_select.shape
# In[24]:
df_training_data_select.info()
# In[25]:
df_training_data_select.head(10)
# In[26]:
# Print statements about the dataset
## Define variables for print statements
training_drill_types = df_training_data_select['Training Drill'].unique()
players = df_training_data_select['Player Display Name'].unique()
count_training_drill_types = len(df_training_data_select['Training Drill'].unique())
count_players = len(df_training_data_select['Player Display Name'].unique())
## Print statements
print(f'The Training DataFrame for 2022-02-02 contains the data for {count_training_drill_types:,} different training drills, including: {training_drill_types}.\n')
print(f'The Training DataFrame for 2022-02-02 contains the data for {count_players:,} different players, including: {players}.\n')
#
#
# ### 4.2. Split Out Unified Training Data into Individual Training Drills
# Split out the unified DataFrame into the individual training drills.
#
# **Note**: It's important to do this before the later format conversions and speed/acceleration calculations because not all the training sessions take place at the same time, so the sessions could otherwise get mixed up.
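# The cells below split the unified DataFrame with one boolean mask per drill. An equivalent sketch using `groupby` (producing a dict of DataFrames, following the `dfs_` naming convention above; the toy data is illustrative):

```python
import pandas as pd

# Toy stand-in for df_training_data_select with the assumed column names
df = pd.DataFrame({
    'Player Display Name': ['A', 'B', 'A', 'B'],
    'Training Drill': ['MATCH-MSG', 'MATCH-MSG',
                       'WARM-UP-COORDINATION-AGILITY', 'WARM-UP-COORDINATION-AGILITY'],
    'Speed (m/s)': [3.1, 4.2, 2.0, 2.5],
})

# One DataFrame per drill, keyed by the drill name
dfs_drills = {drill: df_drill.reset_index(drop=True)
              for drill, df_drill in df.groupby('Training Drill')}

print(sorted(dfs_drills))
print(dfs_drills['MATCH-MSG'].shape)
```

# The dict form avoids repeating the mask for each drill and scales to dates with a different set of drills.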
# In[27]:
lst_training_types = list(df_training_data_select['Training Drill'].unique())
lst_training_types
# In[28]:
df_training_match_msg = df_training_data_select[df_training_data_select['Training Drill'] == 'MATCH-MSG']
df_training_crossing_and_finishing_hsr_spr = df_training_data_select[df_training_data_select['Training Drill'] == 'CROSSING-AND-FINISHING-HSR-SPR']
df_training_attack_vs_defence_attack_superiority = df_training_data_select[df_training_data_select['Training Drill'] == 'ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY']
df_training_full_session_modified = df_training_data_select[df_training_data_select['Training Drill'] == 'FULL-SESSION-MODIFIED']
df_training_passing_drill_physical = df_training_data_select[df_training_data_select['Training Drill'] == 'PASSING-DRILL-PHYSICAL']
df_training_warm_up_coordination_agility = df_training_data_select[df_training_data_select['Training Drill'] == 'WARM-UP-COORDINATION-AGILITY']
# In[29]:
df_training_match_msg.head()
# In[30]:
df_training_match_msg.shape
# In[31]:
list(df_training_match_msg['Player Display Name'].unique())
#
#
# ### 4.3. Engineer DataFrame to Match Tracking Data Format
# To work with the existing Tracking data libraries, based on [Laurie Shaw](https://twitter.com/EightyFivePoint)'s Metrica Sports Tracking data libraries, [`LaurieOnTracking`](https://github.com/Friends-of-Tracking-Data-FoTD/LaurieOnTracking), the data needs to be engineered to match the Metrica schema, which is the following:
#
# | Feature | Data type | Definition |
# |-------------------------------------------|---------------|----------------|
# | `Frame` | int64 | |
# | `Period` | int64 | |
# | `Time [s]` | float64 | |
# | `Home/Away_No._x` (repeated 14 times) | float64 | |
# | `Home/Away_No._y` (repeated 14 times) | float64 | |
# | `ball_x` | float64 | |
# | `ball_y` | float64 | |
#
# However, as this is Training data, the `Home` and `Away` columns are replaced with the players' names in the code below. This means that visualisation scripts for Tracking data, such as those for creating Pitch Control clips, would need some alteration to work with player names. As the purpose of this exercise is to calculate metrics such as the Speed, Accelerations, and Total Distances covered by the players, altering the visualisation code is out of scope and is not covered.
#
# To learn more about the Metrica Sports schema, see the official documentation [[link](https://github.com/metrica-sports/sample-data/blob/master/documentation/events-definitions.pdf)].
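# The long-to-wide reshape described above can also be sketched with pandas' built-in `pivot` (the toy data and player names are illustrative, not from the provided dataset):

```python
import pandas as pd

# Toy long-format GPS sample with the assumed column names
df_long = pd.DataFrame({
    'Time [s]': [0.0, 0.0, 0.1, 0.1],
    'Player Display Name': ['DOE', 'ROE', 'DOE', 'ROE'],
    'x': [0.1, 0.4, 0.2, 0.5],
    'y': [51.0, 51.1, 51.0, 51.1],
})

# Wide format: one row per timestamp, '<Player>_x' / '<Player>_y' columns
df_wide = df_long.pivot(index='Time [s]', columns='Player Display Name', values=['x', 'y'])
df_wide.columns = [f'{player.title()}_{coord}' for coord, player in df_wide.columns]
df_wide = df_wide.reset_index()
df_wide.insert(0, 'Frame', range(len(df_wide)))

print(list(df_wide.columns))
```

# `pivot` requires each (timestamp, player) pair to be unique, which is why the notebook dedupes the data first; the merge-per-player loop below achieves the same shape while also renaming the speed columns.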
# In[33]:
# Define function for converting the format of the training data to match the Tracking data schema
def convert_training_data_format(df, date, training_drill):
"""
Define a function to convert the format of the training dataset to match Tracking data
"""
## Read in exported CSV file if exists, if not, download the latest JSON data
if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '3_individual_training_sessions_dataset', f'{date}-{training_drill}-MOVEMENT-TRAINING-DATA-ALL-PLAYERS.csv')):
### Start timer
tic = datetime.datetime.now()
### Print time of engineering of tracking data started
print(f'Conversion of the format of the training data started at: {tic}')
##
df_pvt = df.copy()
##
lst_players = list(df_pvt['Player Display Name'].unique())
## Rename columns
df_pvt = df_pvt.rename(columns={'Time': 'Time [s]',
'Lon': 'x',
'Lat': 'y'
}
)
##
df_pvt = df_pvt.drop(columns=['Filename'])
## Create empty DataFrame of timestamps
df_time = df_pvt[['Time [s]', 'Date', 'Training Drill']].drop_duplicates().reset_index(drop=True)
## Create empty DataFrame of timestamps
df_time = df_time.reset_index(drop=False)
## Rename index column to 'Frame'
df_time = df_time.rename(columns={'index': 'Frame'})
##
df_pvt_final = df_time.copy()
## Iterate through each file in list of all files
for player in lst_players:
### Create temporary DataFrame with each file
df_player = df_pvt[df_pvt['Player Display Name'] == player]
###
df_player['Player'] = df_player['Player Display Name'].str.title()
###
player_title = player.title()
###
df_player = df_player.rename(columns={'Time [s]': 'Time',
'x': f'{player_title}_x',
'y': f'{player_title}_y',
'Speed (m/s)': f'{player_title} Speed (m/s)',
'Speed (km/h)': f'{player_title} Speed (km/h)'
}
)
###
df_player = df_player[['Time', f'{player_title}_x', f'{player_title}_y', f'{player_title} Speed (m/s)', f'{player_title} Speed (km/h)']]
### Join each individual DataFrame to time DataFrame
df_pvt_final = pd.merge(df_pvt_final, df_player, left_on=['Time [s]'], right_on=['Time'], how='left')
###
df_pvt_final = df_pvt_final.drop(columns=['Time'])
###
df_pvt_final = df_pvt_final.drop_duplicates()
### Save DataFrame
#### Define filename for each combined file to be saved
save_filename = f'{date}-{training_drill}-MOVEMENT-TRAINING-DATA-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
#### Define the filepath to save each combined file
path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '3_individual_training_sessions_dataset')
#### Save the combined file as a CSV
df_pvt_final.to_csv(path + f'/{save_filename}.csv', index=None, header=True)
### End timer
toc = datetime.datetime.now()
### Print time of engineering of tracking data ended
print(f'Conversion of the format of the training data ended at: {toc}')
### Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to convert the format and save the training data is: {total_time:0.2f} seconds.')
## If CSV file already exists, read in previously saved DataFrame
else:
### Print time reading of CSV files started
print('Converted training data already saved to local storage. Reading in file as a pandas DataFrame.')
### Read in raw DataFrame
df_pvt_final = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '3_individual_training_sessions_dataset', f'{date}-{training_drill}-MOVEMENT-TRAINING-DATA-ALL-PLAYERS.csv'))
## Return the DataFrame
return(df_pvt_final)
# In[34]:
df_training_match_msg_pvt = convert_training_data_format(df=df_training_match_msg, date='2022-02-02', training_drill='MATCH-MSG')
df_training_crossing_and_finishing_hsr_spr_pvt = convert_training_data_format(df=df_training_crossing_and_finishing_hsr_spr, date='2022-02-02', training_drill='CROSSING-AND-FINISHING-HSR-SPR')
df_training_attack_vs_defence_attack_superiority_pvt = convert_training_data_format(df=df_training_attack_vs_defence_attack_superiority, date='2022-02-02', training_drill='ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY')
df_training_full_session_modified_pvt = convert_training_data_format(df=df_training_full_session_modified, date='2022-02-02', training_drill='FULL-SESSION-MODIFIED')
df_training_passing_drill_physical_pvt = convert_training_data_format(df=df_training_passing_drill_physical, date='2022-02-02', training_drill='PASSING-DRILL-PHYSICAL')
df_training_warm_up_coordination_agility_pvt = convert_training_data_format(df=df_training_warm_up_coordination_agility, date='2022-02-02', training_drill='WARM-UP-COORDINATION-AGILITY')
# In[35]:
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_match_msg_pvt
msno.matrix(df_training_match_msg_pvt, figsize = (30, 7))
# In[36]:
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_crossing_and_finishing_hsr_spr_pvt
msno.matrix(df_training_crossing_and_finishing_hsr_spr_pvt, figsize = (30, 7))
# In[37]:
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_attack_vs_defence_attack_superiority_pvt
msno.matrix(df_training_attack_vs_defence_attack_superiority_pvt, figsize = (30, 7))
# In[38]:
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_full_session_modified_pvt
msno.matrix(df_training_full_session_modified_pvt, figsize = (30, 7))
# In[39]:
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_passing_drill_physical_pvt
msno.matrix(df_training_passing_drill_physical_pvt, figsize = (30, 7))
# In[40]:
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_warm_up_coordination_agility_pvt
msno.matrix(df_training_warm_up_coordination_agility_pvt, figsize = (30, 7))
# From the visualisations, we can see that for certain drills all the players are involved, while for other drills the players are involved at different times.
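# One way to quantify this (a sketch on toy wide-format data with hypothetical player columns) is the fraction of frames for which each player's position is present:

```python
import numpy as np
import pandas as pd

# Toy wide-format drill with one player absent for the first half of the session
df = pd.DataFrame({
    'Frame': range(4),
    'Doe_x': [0.1, 0.2, 0.3, 0.4],
    'Roe_x': [np.nan, np.nan, 0.5, 0.6],
})

# Per-player fraction of frames with a recorded position
participation = df.filter(like='_x').notna().mean()
print(participation)
```

# Values of 1.0 correspond to the fully-dark columns in the missingno matrices above; lower values indicate players who joined or left the drill partway through.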
# In[41]:
df_training_attack_vs_defence_attack_superiority_pvt.head(20)
#
#
# ### 4.4. Calculate Speed, Distance, and Acceleration
# In[82]:
# Define function for calculating the velocities and accelerations of the training data using the x, y locations and timestep
def calc_player_velocities_accelerations(df, date='2022-02-02', training_drill='NOT-DEFINED', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1):
""" calc_player_velocities_accelerations( training data )
Calculate player velocities in x & y direction, and total player speed at each timestamp of the tracking data
Parameters
-----------
df: the tracking DataFrame
smoothing_v: boolean variable that determines whether velocity measures are smoothed. Default is True.
filter: type of filter to use when smoothing_v the velocities. Default is Savitzky-Golay, which fits a polynomial of order 'polyorder' to the data within each window
window: smoothing_v window size in # of frames
polyorder: order of the polynomial for the Savitzky-Golay filter. Default is 1 - a linear fit to the velcoity, so gradient is the acceleration
maxspeed: the maximum speed that a player can realisitically achieve (in meters/second). Speed measures that exceed maxspeed are tagged as outliers and set to NaN.
Returns
-----------
df : the tracking DataFrame with columns for speed in the x & y direction and total speed added
"""
## Read in exported CSV file if exists, if not, download the latest JSON data
if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '4_modified_individual_training_sessions_dataset', f'{date}-{training_drill}-MOVEMENT-SPEED-ACCELERATION-TRAINING-DATA-ALL-PLAYERS.csv')):
### Start timer
tic = datetime.datetime.now()
### Print time of engineering of tracking data started
print(f'Calculation of each player\'s speed and accelerations for the {training_drill} started at: {tic}')
# Create columns
#df['Date Time [s]'] = pd.to_datetime(df['Date'] + ' ' + df['Time [s]'])
df['Period'] = 1
# remove any velocity data already in the dataframe
df = remove_player_velocities_accelerations(df)
# Get the player ids
player_ids = [col for col in df.columns if '_x' in col]
player_ids = [s.replace('_x', '') for s in player_ids]
# Calculate the timestep from one frame to the next - not required.
#dt = df['Time [s]'].diff()
#dt = df['Date Time [s]'].diff()
# index of first frame in second half
#second_half_idx = df.Period.idxmax(2)
second_half_idx = df[df.Period == 2].first_valid_index()
# estimate velocities for players in df
for player in player_ids: # cycle through players individually
# difference player positions in timestep dt to get unsmoothed estimate of velocity
vx = df[player + '_x'].diff() / dt
vy = df[player + '_y'].diff() / dt
if maxspeed>0:
# remove unsmoothed data points that exceed the maximum speed (these are most likely position errors)
raw_speed = np.sqrt( vx**2 + vy**2 )
#acceleration = raw_speed.diff() / dt
vx[ raw_speed>maxspeed ] = np.nan
vy[ raw_speed>maxspeed ] = np.nan
#if maxacc>0:
#ax[ raw_acc>maxacc ] = np.nan
#ay[ raw_acc>maxacc ] = np.nan
if smoothing_v:
if filter_=='Savitzky-Golay':
# calculate first half velocity
vx.loc[:second_half_idx] = signal.savgol_filter(vx.loc[:second_half_idx],window_length=window,polyorder=polyorder)
vy.loc[:second_half_idx] = signal.savgol_filter(vy.loc[:second_half_idx],window_length=window,polyorder=polyorder)
# calculate second half velocity
vx.loc[second_half_idx:] = signal.savgol_filter(vx.loc[second_half_idx:],window_length=window,polyorder=polyorder)
vy.loc[second_half_idx:] = signal.savgol_filter(vy.loc[second_half_idx:],window_length=window,polyorder=polyorder)
elif filter_=='moving_average':    # match the 'moving_average' value passed by the calls below
ma_window = np.ones( window ) / window
# calculate first half velocity
vx.loc[:second_half_idx] = np.convolve( vx.loc[:second_half_idx], ma_window, mode='same')
vy.loc[:second_half_idx] = np.convolve( vy.loc[:second_half_idx], ma_window, mode='same')
# calculate second half velocity
vx.loc[second_half_idx:] = np.convolve( vx.loc[second_half_idx:], ma_window, mode='same')
vy.loc[second_half_idx:] = np.convolve( vy.loc[second_half_idx:], ma_window, mode='same')
#speed = ( vx**2 + vy**2 )**.5
#acceleration = np.diff(speed) / dt
#ax = np.convolve( ax, ma_window, mode='same' )
#ay = np.convolve( ay, ma_window, mode='same' )
# put player speed in x, y direction, and total speed back in the data frame
df[player + '_vx'] = vx
df[player + '_vy'] = vy
df[player + '_speed'] = np.sqrt(vx**2 + vy**2)
#df[player + '_ax'] = ax
#df[player + '_ay'] = ay
#df[player + '_rawspeed'] = raw_speed
#df[player + '_rawacc'] = raw_acc
# Calculate acceleration - method 1, using speed calculated
#acceleration = df[player + '_speed'].diff() / dt
#df[player + '_acceleration'] = acceleration
# Calculate acceleration - method 2, using speed provided
acceleration = df[player + ' Speed (m/s)'].diff() / dt
#acceleration = (df[player + ' Speed (m/s)'] - df[player + ' Speed (m/s)'].shift()) / 0.1
df[player + ' Acceleration (m/s/s)'] = acceleration
if smoothing_a:
ma_window = np.ones( window ) / window
df[player + ' Acceleration (m/s/s)'] = np.convolve( acceleration, ma_window, mode='same')
### Save DataFrame
#### Define filename for each combined file to be saved
save_filename = f'{date}-{training_drill}-MOVEMENT-SPEED-ACCELERATION-TRAINING-DATA-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
#### Define the filepath to save each combined file
path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '4_modified_individual_training_sessions_dataset')
#### Save the combined file as a CSV
df.to_csv(path + f'/{save_filename}.csv', index=None, header=True)
### End timer
toc = datetime.datetime.now()
### Print time of engineering of tracking data ended
print(f'Calculation of each player\'s speed and accelerations for the {training_drill} ended at: {toc}')
### Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to calculate speed and acceleration and save the training data is: {total_time:0.2f} seconds.')
## If CSV file already exists, read in previously saved DataFrame
else:
### Print time reading of CSV files started
print('Training data with calculated velocities and accelerations already saved to local storage. Reading in file as a pandas DataFrame.')
### Read in raw DataFrame
df = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '4_modified_individual_training_sessions_dataset', f'{date}-{training_drill}-MOVEMENT-SPEED-ACCELERATION-TRAINING-DATA-ALL-PLAYERS.csv'))
## Return the DataFrame
return df
def remove_player_velocities_accelerations(df):
# Remove player velocity and acceleration measures that are already in the 'df' DataFrame
columns = [c for c in df.columns if c.split('_')[-1] in ['vx', 'vy', 'ax', 'ay', 'rawspeed', 'rawacc', 'speed', 'acceleration']] # columns with derived kinematic suffixes
df = df.drop(columns=columns)
return df
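# The suffix-based filter in `remove_player_velocities_accelerations()` can be sanity-checked on a toy DataFrame with hypothetical column names: the derived kinematic columns are selected for dropping, while the raw position columns survive.

```python
import pandas as pd

# Toy DataFrame with hypothetical player columns
df = pd.DataFrame(columns=['Player 1_x', 'Player 1_y', 'Player 1_vx', 'Player 1_speed'])

# Same suffix test as in remove_player_velocities_accelerations()
derived = [c for c in df.columns if c.split('_')[-1] in
           ['vx', 'vy', 'ax', 'ay', 'rawspeed', 'rawacc', 'speed', 'acceleration']]

print(derived)  # ['Player 1_vx', 'Player 1_speed']
```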
# In[84]:
# Calculate the velocity and accelerations for each player in each of the six training sessions
df_training_match_msg_vel = calc_player_velocities_accelerations(df=df_training_match_msg_pvt, date='2022-02-02', training_drill='MATCH-MSG', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_crossing_and_finishing_hsr_spr_vel = calc_player_velocities_accelerations(df=df_training_crossing_and_finishing_hsr_spr_pvt, date='2022-02-02', training_drill='CROSSING-AND-FINISHING-HSR-SPR', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_attack_vs_defence_attack_superiority_vel = calc_player_velocities_accelerations(df=df_training_attack_vs_defence_attack_superiority_pvt, date='2022-02-02', training_drill='ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_full_session_modified_vel = calc_player_velocities_accelerations(df=df_training_full_session_modified_pvt, date='2022-02-02', training_drill='FULL-SESSION-MODIFIED', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_passing_drill_physical_vel = calc_player_velocities_accelerations(df=df_training_passing_drill_physical_pvt, date='2022-02-02', training_drill='PASSING-DRILL-PHYSICAL', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_warm_up_coordination_agility_vel = calc_player_velocities_accelerations(df=df_training_warm_up_coordination_agility_pvt, date='2022-02-02', training_drill='WARM-UP-COORDINATION-AGILITY', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
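# The core of the velocity calculation above can be sketched in isolation (with hypothetical data): first differences of the smoothed positions, sampled at dt = 0.1 s (10 Hz), divided by dt give the velocity components, and the Euclidean norm gives speed in m/s.

```python
import numpy as np
import pandas as pd

dt = 0.1  # 10 Hz sampling, as passed to calc_player_velocities_accelerations() above

# Hypothetical track: moving at 1 m/s in x, stationary in y
df = pd.DataFrame({
    'Player 1_x': np.arange(0, 1.0, dt),
    'Player 1_y': np.zeros(10),
})

# First differences of position divided by dt give velocity components (m/s)
vx = df['Player 1_x'].diff() / dt
vy = df['Player 1_y'].diff() / dt
df['Player 1_speed'] = np.sqrt(vx**2 + vy**2)

print(round(df['Player 1_speed'].iloc[1], 6))  # 1.0
```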
# In[44]:
sorted(df_training_attack_vs_defence_attack_superiority_vel.columns)
# In[64]:
# Display DataFrame - ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY
df_training_attack_vs_defence_attack_superiority_vel.head()
#
#
# ### 4.5. Create Physical Reports for Each Individual Training Session
# The speed zones are defined as:
# * Low-Speed Activities (LSA) (<14.4 km/h or <4 m/s);
# * Moderate-Speed Running (MSR) (14.4–19.8 km/h or 4–5.5 m/s);
# * High-Speed Running (HSR) (19.8–25.1 km/h or 5.5–6.972 m/s); and
# * Sprinting (≥25.2 km/h or ≥6.972 m/s).
#
# For further information, see: [Application of Individualized Speed Zones to Quantify External Training Load in Professional Soccer](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7126260/) by Vincenzo Rago, João Brito, Pedro Figueiredo, Peter Krustrup, and António Rebelo.
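# The zone boundaries above can also be applied in a single pass with `pandas.cut`; the following is a minimal sketch with hypothetical speeds in m/s, using left-inclusive bins to mirror the `>=`/`<` comparisons used in the report function.

```python
import pandas as pd

speeds = pd.Series([2.0, 4.5, 6.0, 7.5])  # hypothetical speeds in m/s

zones = pd.cut(
    speeds,
    bins=[0, 4, 5.5, 6.972222, float('inf')],
    labels=['LSA', 'MSR', 'HSR', 'Sprinting'],
    right=False,  # left-inclusive bins: [0, 4), [4, 5.5), ...
)

print(list(zones))  # ['LSA', 'MSR', 'HSR', 'Sprinting']
```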
# In[89]:
# Define a function to generate a bespoke physical summary of all the players for an individual training session
def create_physical_report_per_training_session(df, date='2022-02-02', training_drill='NOT-DEFINED'):
"""
Define a function to generate a bespoke physical summary of all the players for an individual training session
"""
## Read in the exported CSV file if it exists; if not, engineer the physical report
if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}-{training_drill}-PHYSICAL-REPORT-ALL-PLAYERS.csv')):
### Start timer
tic = datetime.datetime.now()
### Print time of engineering of tracking data started
print(f'Creation of the physical report for the {training_drill} training drill started at: {tic}')
## Data Engineering
### Get the list of columns in the DataFrame
lst_cols = list(df)
### Create an empty list for the player names
lst_players = []
### Derive each player's name from their '_x' position column
for col in lst_cols:
if '_x' in col:
col = col.replace('_x', '')
lst_players.append(col)
### Create DataFrame where each row is a player
df_summary = pd.DataFrame(lst_players, columns=['Player'])
## Add the date and training drill to the summary
df_summary['Date'] = date
df_summary['Training Drill'] = training_drill
## Calculate minutes played for each player
### Create empty list for minutes
lst_minutes = []
### Cycle through each player and look for the first and last frame in which they appear
for player in lst_players:
#### Search for first and last frames that we have a position observation for each player (when a player is not on the pitch, positions are NaN)
column = f'{player}' + '_x' # use player x-position coordinate
try:
player_minutes = (df[column].last_valid_index() - df[column].first_valid_index() + 1) / 10 / 60 # frames at 10 Hz, converted to minutes
except TypeError: # no valid position observations for this player
player_minutes = 0
lst_minutes.append(player_minutes)
### Create column for the minute played
df_summary['Minutes Trained'] = lst_minutes
### Sort values by minutes played descending
df_summary = df_summary.sort_values(['Minutes Trained'], ascending=False)
## Calculate total distance covered for each player
### Create empty list for distance
lst_distance = []
### Cycle through each player and multiply their speed at each frame by the 0.1 s sample interval to get total distance, then divide by 1,000 to convert to km
for player in lst_players:
column = f'{player}' + ' Speed (m/s)'
df_player_distance = df[column].sum() * 0.1 / 1000 # Distance = Speed * Time; data sampled at 10 Hz (dt = 0.1 s)
lst_distance.append(df_player_distance)
### Create column for the distance in km
df_summary['Distance [km]'] = lst_distance
## Calculate total distance covered for each player for different types of movement
### Create empty lists for distances of different movements
lst_lsa = []
lst_msr = []
lst_hsr = []
lst_sprinting = []
### Cycle through each player and sum the distance covered within each speed zone (speed * 0.1 s per frame, converted to km)
for player in lst_players:
column = f'{player}' + ' Speed (m/s)'
### Low-Speed Activities (LSA) (<14.4 km/h or <4 m/s)
player_distance = df.loc[df[column] < 4, column].sum() * 0.1 / 1000
lst_lsa.append(player_distance)
### Moderate-Speed Running (MSR) (14.4–19.8 km/h or 4–5.5 m/s)
player_distance = df.loc[(df[column] >= 4) & (df[column] < 5.5), column].sum() * 0.1 / 1000
lst_msr.append(player_distance)
### High-Speed Running (HSR) (19.8–25.1 km/h or 5.5–6.972 m/s)
player_distance = df.loc[(df[column] >= 5.5) & (df[column] < 6.972222), column].sum() * 0.1 / 1000
lst_hsr.append(player_distance)
### Sprinting (≥25.2 km/h or ≥6.972 m/s)
player_distance = df.loc[df[column] >= 6.972222, column].sum() * 0.1 / 1000
lst_sprinting.append(player_distance)
### Assign each movement list to a column in the Summary DataFrame
df_summary['Low-Speed Activities (LSA) [km]'] = lst_lsa
df_summary['Moderate-Speed Running (MSR) [km]'] = lst_msr
df_summary['High-Speed Running (HSR) [km]'] = lst_hsr
df_summary['Sprinting [km]'] = lst_sprinting
## Reset index
df_summary = df_summary.reset_index(drop=True)
## Determine the number of sustained sprints per training session
### Create an empty list for the number of sprints
nsprints = []
###
#sprint_threshold = 25.2 # minimum speed to be defined as a sprint (km/h)
sprint_threshold = 6.972222 # minimum speed to be defined as a sprint (m/s)
sprint_window = 1 * 10 # minimum duration of a sprint: 1 second at 10 frames per second
###
for player in lst_players:
column = f'{player}' + ' Speed (m/s)'
# trick here is to convolve the above-threshold indicator with a window of size 'sprint_window', and find the number of occasions that a sprint was sustained for at least one window length
# diff helps us to identify when each sustained window starts
player_sprints = np.diff(1 * (np.convolve(1 * (df[column] >= sprint_threshold), np.ones(sprint_window), mode='same') >= sprint_window))
nsprints.append(np.sum(player_sprints == 1))
### Add column for the number of sprints
df_summary['No. Sprints'] = nsprints
## Estimate the top speed of each player
### Create empty dictionary to append maximum speeds
dict_top_speeds = {}
### Iterate through the columns of the training DataFrame for the top speeds
player_speed_columns = [i for i in df.columns if ' Speed (m/s)' in i]
### Iterate through all the rows of all the speed columns, to determine the maximum speed for each player
for player in player_speed_columns:
dict_top_speeds[player] = df[player].max()
###
df_top_speeds = pd.DataFrame.from_dict(dict_top_speeds, orient='index', columns=['Top Speed [m/s]'])
###
df_top_speeds = df_top_speeds.reset_index(drop=False)
###
df_top_speeds = df_top_speeds.rename(columns={'index': 'Player'})
### Strip the ' Speed (m/s)' suffix to recover the player name (regex=False treats the parentheses literally)
df_top_speeds['Player'] = df_top_speeds['Player'].str.replace(' Speed (m/s)', '', regex=False)
### Merge Top Speeds DataFrame to Summary DataFrame
df_summary = pd.merge(df_summary, df_top_speeds, left_on=['Player'], right_on=['Player'], how='left')
## Estimate the top acceleration of each player
### Create empty dictionary to append maximum accelerations
dict_top_accelerations = {}
### Iterate through the columns of the training DataFrame for the top accelerations
player_acceleration_columns = [i for i in df.columns if ' Acceleration (m/s/s)' in i]
### Iterate through all the rows of all the acceleration columns, to determine the maximum acceleration for each player
for player in player_acceleration_columns:
dict_top_accelerations[player] = df[player].max()
###
df_top_accelerations = pd.DataFrame.from_dict(dict_top_accelerations, orient='index', columns=['Top Acceleration [m/s/s]'])
###
df_top_accelerations = df_top_accelerations.reset_index(drop=False)
###
df_top_accelerations = df_top_accelerations.rename(columns={'index': 'Player'})
### Strip the ' Acceleration (m/s/s)' suffix to recover the player name (regex=False treats the parentheses literally)
df_top_accelerations['Player'] = df_top_accelerations['Player'].str.replace(' Acceleration (m/s/s)', '', regex=False)
### Merge Top Accelerations DataFrame to Summary DataFrame
df_summary = pd.merge(df_summary, df_top_accelerations, left_on=['Player'], right_on=['Player'], how='left')
### Save DataFrame
#### Define filename for each combined file to be saved
save_filename = f'{date}-{training_drill}-PHYSICAL-REPORT-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
#### Define the filepath to save each combined file
path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports')
#### Save the combined file as a CSV
df_summary.to_csv(path + f'/{save_filename}.csv', index=None, header=True)
### End timer
toc = datetime.datetime.now()
### Print time of engineering of tracking data ended
print(f'Creation of the physical report for the {training_drill} training drill ended at: {toc}')
### Calculate time taken
total_time = (toc-tic).total_seconds()
print(f'Time taken to create the physical report for the {training_drill} training data is: {total_time:0.2f} seconds.')
## If CSV file already exists, read in previously saved DataFrame
else:
### Print time reading of CSV files started
print('Physical report already saved to local storage. Reading in file as a pandas DataFrame.')
### Read in raw DataFrame
df_summary = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}-{training_drill}-PHYSICAL-REPORT-ALL-PLAYERS.csv'))
## Return DataFrame
return df_summary
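# The convolution trick used above for 'No. Sprints' (after Laurie Shaw's Metrica tracking code) can be illustrated standalone with a hypothetical speed trace: only bursts sustained above the threshold for a full 1-second window are counted.

```python
import numpy as np

sprint_threshold = 6.972222  # m/s
sprint_window = 10           # 1 second at 10 frames per second

# Hypothetical speed trace: a 1.2 s burst and a 0.3 s burst above threshold
speed = np.array([5.0]*5 + [7.5]*12 + [5.0]*5 + [7.2]*3 + [5.0]*5)

above = 1 * (speed >= sprint_threshold)
# A frame is part of a sustained sprint only if a full window around it is above threshold
sustained = 1 * (np.convolve(above, np.ones(sprint_window), mode='same') >= sprint_window)
# Rising edges of the sustained indicator mark the start of each counted sprint
n_sprints = np.sum(np.diff(sustained) == 1)

print(n_sprints)  # 1 - the 0.3 s burst is too short to count
```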
# In[90]:
# Create physical reports for each player in each of the six training sessions
df_training_match_msg_physical_report = create_physical_report_per_training_session(df_training_match_msg_vel, date='2022-02-02', training_drill='MATCH-MSG')
df_training_crossing_and_finishing_hsr_spr_physical_report = create_physical_report_per_training_session(df_training_crossing_and_finishing_hsr_spr_vel, date='2022-02-02', training_drill='CROSSING-AND-FINISHING-HSR-SPR')
df_training_attack_vs_defence_attack_superiority_physical_report = create_physical_report_per_training_session(df_training_attack_vs_defence_attack_superiority_vel, date='2022-02-02', training_drill='ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY')
df_training_full_session_modified_physical_report = create_physical_report_per_training_session(df_training_full_session_modified_vel, date='2022-02-02', training_drill='FULL-SESSION-MODIFIED')
df_training_passing_drill_physical_physical_report = create_physical_report_per_training_session(df_training_passing_drill_physical_vel, date='2022-02-02', training_drill='PASSING-DRILL-PHYSICAL')
df_training_warm_up_coordination_agility_physical_report = create_physical_report_per_training_session(df_training_warm_up_coordination_agility_vel, date='2022-02-02', training_drill='WARM-UP-COORDINATION-AGILITY')
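# The minutes-trained estimate inside `create_physical_report_per_training_session()` can be checked in isolation: it is the span between the first and last non-NaN position observation, at 10 frames per second, converted to minutes. The series below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical x-position trace: NaN while off the pitch, present for 600 frames (60 s)
x = pd.Series([np.nan]*5 + [0.0]*600 + [np.nan]*5)

# Same logic as the minutes-trained calculation in the report function
minutes = (x.last_valid_index() - x.first_valid_index() + 1) / 10 / 60

print(minutes)  # 1.0
```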
#
#
# ### 4.6. Create Single Physical Report for the Day of Interest
# In[91]:
# Define a function to generate a bespoke physical summary of all the players across all training sessions on a given day
def create_physical_report_per_day(date):
"""
Define a function to generate a bespoke physical summary of all the players across all training sessions on a given day
"""
## Read in the exported CSV file if it exists; if not, build the daily report from the per-drill reports
if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}-ALL-TRAINING-SESSIONS-PHYSICAL-REPORT-ALL-PLAYERS.csv')):
### Start timer
tic = datetime.datetime.now()
### Print time of engineering of tracking data started
print(f'Creation of a single training report for {date} started at: {tic}')
### List all files available
lst_all_files = glob.glob(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}*-PHYSICAL-REPORT-ALL-PLAYERS.csv'))
### Create an empty list to append individual DataFrames
lst_files_to_append =[]
### Iterate through each file in list of all files
for file in lst_all_files:
### Create temporary DataFrame with each file
df_temp = pd.read_csv(file, index_col=None, header=0)
### Append each individual DataFrame to the list (to be concatenated)
lst_files_to_append.append(df_temp)
### Concatenate all the files
df_day_training_report = pd.concat(lst_files_to_append, axis=0, ignore_index=True)
### Save DataFrame
#### Define filename for each combined file to be saved
save_filename = f'{date}-ALL-TRAINING-SESSIONS-PHYSICAL-REPORT-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
#### Define the filepath to save each combined file
path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports')
#### Save the combined file as a CSV
df_day_training_report.to_csv(path + f'/{save_filename}.csv', index=None, header=True)
### Ensure the 'Date' column is set (already present in the per-drill reports)
df_day_training_report['Date'] = date
### End timer
toc = datetime.datetime.now()
### Print time of creation of the single training report ended
print(f'Creation of a single training report for {date} ended at: {toc}')
### Calculate time taken
total_time = (toc-tic).total_seconds()
print(f'Time taken to create a single training report for {date} is: {total_time:0.2f} seconds.')
## If CSV file already exists, read in previously saved DataFrame
else:
### Print time reading of CSV files started
print('CSV file already saved to local storage. Reading in file as a pandas DataFrame.')
### Read in raw DataFrame
df_day_training_report = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}-ALL-TRAINING-SESSIONS-PHYSICAL-REPORT-ALL-PLAYERS.csv'))
## Return DataFrame
return df_day_training_report
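# The roll-up performed above is a standard glob-and-concatenate pattern; a minimal sketch with hypothetical per-drill reports standing in for the CSVs read from disk:

```python
import pandas as pd

# Hypothetical per-drill reports standing in for the globbed CSV files
report_drill_1 = pd.DataFrame({'Player': ['Player A'], 'Distance [km]': [5.1]})
report_drill_2 = pd.DataFrame({'Player': ['Player B'], 'Distance [km]': [4.7]})

# Stack the per-drill reports into one daily report, renumbering the index
df_daily = pd.concat([report_drill_1, report_drill_2], axis=0, ignore_index=True)

print(df_daily.shape)  # (2, 2)
```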
# In[92]:
df_training_report_02022022 = create_physical_report_per_day(date='2022-02-02')
# In[93]:
df_training_report_02022022.head(10)
# In[54]:
df_training_report_02022022.shape
# ---
#
#
#
# ## 5. Summary
# This notebook engineered physical data using [pandas](http://pandas.pydata.org/) to create a series of training reports for players, determining metrics including distance covered, total sprints, and top speeds, amongst other breakdowns.
# ---
#
#
#
# ## 6. Next Steps
# The next stage is to visualise this data in Tableau and analyse the findings, to be presented in a deck.
# ---
#
#
#
# ## 7. References
# * [Application of Individualized Speed Zones to Quantify External Training Load in Professional Soccer](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7126260/) by Vincenzo Rago, João Brito, Pedro Figueiredo, Peter Krustrup, and António Rebelo.
# * [Laurie Shaw](https://twitter.com/EightyFivePoint)'s Metrica Sports Tracking data libraries, [`LaurieOnTracking`](https://github.com/Friends-of-Tracking-Data-FoTD/LaurieOnTracking)
# ---
#
# ***Visit my website [eddwebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***
# [Back to the top](#top)