Notebook first written: 11/02/2022
Notebook last updated: 12/02/2022
This notebook engineers the second of two datasets of physical data provided by Watford F.C., using pandas DataFrames for data manipulation.
For more information about this notebook and the author, I am available through all the following channels:
A static version of this notebook can be found here. This notebook has an accompanying watford GitHub repository; for my full repository of football analysis, see my football_analysis GitHub repository.
This notebook was written using Python 3 and requires the following libraries:
Jupyter notebooks for this notebook environment with which this project is presented;
NumPy for multidimensional array computing; and
pandas for data analysis and manipulation.
All packages used for this notebook can be obtained by downloading and installing the Conda distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows here and Mac here, as well as in the Anaconda documentation itself here.
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv
# Import Dependencies
%matplotlib inline
# Math Operations
import numpy as np
from math import pi
# Datetime
import datetime
from datetime import date
import time
# Data Preprocessing
import pandas as pd
import pandas_profiling as pp
import os
import re
import chardet
import random
from io import BytesIO
from pathlib import Path
# Reading Directories
import glob
import os
# Working with JSON
import json
from pandas import json_normalize
# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
# Signal processing
import scipy.signal as signal
# Requests and downloads
import tqdm
import requests
# Display in Jupyter
from IPython.display import Image, YouTubeVideo
from IPython.core.display import HTML
# Ignore Warnings
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")
# Print message
print('Setup Complete')
Setup Complete
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))
Python: 3.7.6
NumPy: 1.19.1
pandas: 1.1.3
matplotlib: 3.3.1
# Set up initial paths to subfolders
base_dir = os.path.join('..', '..')
data_dir = os.path.join(base_dir, 'data')
data_dir_physical = os.path.join(base_dir, 'data', 'physical')
scripts_dir = os.path.join(base_dir, 'scripts')
models_dir = os.path.join(base_dir, 'models')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')
# Display all columns of displayed pandas DataFrames
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)
pd.options.mode.chained_assignment = None
This notebook parses and engineers a provided dataset of physical data using pandas.
Notebook Conventions:
Variables that refer to a DataFrame object are prefixed with df_.
Variables that refer to a collection of DataFrame objects (e.g., a list, a set or a dict) are prefixed with dfs_.
# Read data directory
print(glob.glob(os.path.join(data_dir_physical, 'raw', 'Set 2', '*')))
[]
# Define function for unifying all the training data for an indicated date into a single DataFrame
def unify_training_data(date):
    """
    Unify all the training data for a single date, given as a string in the
    format 'YYYY-MM-DD'.
    For this example dataset, there is data for just '2022-02-02'.
    # KEY STEPS
    # - USE GLOB TO PRODUCE SEPARATE DATAFRAMES FOR THE FOLLOWING:
    ##  + ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY
    ##  + CROSSING-AND-FINISHING-HSR-SPR
    ##  + FULL-SESSION-MODIFIED
    ##  + MATCH-MSG
    ##  + PASSING-DRILL-PHYSICAL
    ##  + WARM-UP-COORDINATION-AGILITY
    # - THESE UNIFIED DATAFRAMES NEED TO INCLUDE NEW COLUMNS FOR DATE (FILENAME) AND PLAYER NAME (FILENAME OR COLUMN)
    # - AT THIS STAGE, THE UNIFIED DATAFRAMES CAN BE EXPORTED AS ENGINEERED FILES, BUT UNIFIED
    # - NEXT, DROP ALL COLUMNS EXCEPT: Player Display Name, Time, Lat, Lon, Speed (m/s)
    # - DEDUPLICATE THE DATAFRAME, MANY ROWS REMOVED ONCE GYRO DATA IGNORED
    # - USE Player Display Name TO RENAME THE COLUMNS FOR Time, Lat, Lon, Speed (m/s), TO PREFIX WITH NAME
    # - THEN DROP Player Display Name
    # - USE LAURIE'S METRICA SCRIPT TO CALCULATE THE SPEED, DISTANCE, AND ACCELERATION USING THE LAT/LON AND TIMESTEP
    """
    ## Read in the exported CSV file if it exists; if not, build it from the raw CSV files
    if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '1_unified_training_dataset', f'{date}-ALL-TRAINING-DATA-ALL-PLAYERS.csv')):
        ### Start timer
        tic = datetime.datetime.now()
        ### Print time reading of CSV files started
        print(f'Reading of CSV files started at: {tic}')
        ### List all files available for the date
        lst_all_files = glob.glob(os.path.join(data_dir_physical, 'raw', 'Set 2', f'{date}-*.csv'))
        ### Create an empty list to append individual DataFrames
        lst_files_to_append = []
        ### Iterate through each file in list of all files
        for file in lst_all_files:
            #### Create temporary DataFrame with each file
            df_temp = pd.read_csv(file, index_col=None, header=0)
            #### Create a column that contains the filename - useful for information about the date, player, and training drill
            df_temp['Filename'] = os.path.basename(file)
            #### Append each individual DataFrame to the list (to be concatenated)
            lst_files_to_append.append(df_temp)
        ### Concatenate all the files
        df_all = pd.concat(lst_files_to_append, axis=0, ignore_index=True)
        ### Save DataFrame
        #### Define filename for the combined file to be saved
        save_filename = f'{date}-ALL-TRAINING-DATA-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
        #### Define the filepath to save the combined file
        path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '1_unified_training_dataset')
        #### Save the combined file as a CSV
        df_all.to_csv(os.path.join(path, f'{save_filename}.csv'), index=None, header=True)
        ### Engineer the data
        #### Add a Date column (added after the CSV is saved, so it is absent when the file is re-read)
        df_all['Date'] = date
        ####
        #df_all['Training Type'] = training_type
        #### Reorder Columns
        #df_all = df_all[['Filename'] + [col for col in df_all.columns if col != 'Filename']]
        #df_all = df_all[['Date'] + [col for col in df_all.columns if col != 'Date']]
        ### End timer
        toc = datetime.datetime.now()
        ### Print time reading of CSV files ended
        print(f'Reading of CSV files ended at: {toc}')
        ### Calculate time taken
        total_time = (toc-tic).total_seconds()
        print(f'Time taken to create a single DataFrame from the individual CSV files is: {total_time/60:0.2f} minutes.')
    ## If CSV file already exists, read in previously saved DataFrame
    else:
        ### Print message that the saved file is being read
        print('CSV file already saved to local storage. Reading in file as a pandas DataFrame.')
        ### Read in raw DataFrame
        df_all = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '1_unified_training_dataset', f'{date}-ALL-TRAINING-DATA-ALL-PLAYERS.csv'))
    ## Return DataFrame
    return df_all
df_training_data_all = unify_training_data('2022-02-02')
CSV file already saved to local storage. Reading in file as a pandas DataFrame.
# Display DataFrame
df_training_data_all.head()
| | Player Display Name | Time | Lat | Lon | Speed (m/s) | Heart Rate (bpm) | Hacc | Hdop | Quality of Signal | No. of Satellites | Instantaneous Acceleration Impulse | Accl X | Accl Y | Accl Z | Gyro Yro X | Gyro Y | Gyro Z | Filename |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.133956 | 0.936960 | 0.799344 | 14.00 | -24.08 | -6.65 | 2022-02-02-MASINA-MATCH-MSG.csv |
1 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.197640 | 0.898164 | 0.730536 | 10.71 | -31.92 | -4.76 | 2022-02-02-MASINA-MATCH-MSG.csv |
2 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.246684 | 0.903288 | 0.657336 | 3.85 | -38.08 | 0.07 | 2022-02-02-MASINA-MATCH-MSG.csv |
3 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.247416 | 0.892308 | 0.632448 | -2.10 | -40.67 | 7.07 | 2022-02-02-MASINA-MATCH-MSG.csv |
4 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.193248 | 0.882792 | 0.663924 | -4.76 | -39.69 | 13.51 | 2022-02-02-MASINA-MATCH-MSG.csv |
First, check the quality of the dataset by looking at the first and last rows in pandas using the head() and tail() methods.
# Display the first five rows of the DataFrame, df_training_data_all
df_training_data_all.head()
| | Player Display Name | Time | Lat | Lon | Speed (m/s) | Heart Rate (bpm) | Hacc | Hdop | Quality of Signal | No. of Satellites | Instantaneous Acceleration Impulse | Accl X | Accl Y | Accl Z | Gyro Yro X | Gyro Y | Gyro Z | Filename |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.133956 | 0.936960 | 0.799344 | 14.00 | -24.08 | -6.65 | 2022-02-02-MASINA-MATCH-MSG.csv |
1 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.197640 | 0.898164 | 0.730536 | 10.71 | -31.92 | -4.76 | 2022-02-02-MASINA-MATCH-MSG.csv |
2 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.246684 | 0.903288 | 0.657336 | 3.85 | -38.08 | 0.07 | 2022-02-02-MASINA-MATCH-MSG.csv |
3 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.247416 | 0.892308 | 0.632448 | -2.10 | -40.67 | 7.07 | 2022-02-02-MASINA-MATCH-MSG.csv |
4 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 0 | 0 | 0.4 | 298 | 21 | 1.083334 | 0.193248 | 0.882792 | 0.663924 | -4.76 | -39.69 | 13.51 | 2022-02-02-MASINA-MATCH-MSG.csv |
# Display the last five rows of the DataFrame, df_training_data_all
df_training_data_all.tail()
| | Player Display Name | Time | Lat | Lon | Speed (m/s) | Heart Rate (bpm) | Hacc | Hdop | Quality of Signal | No. of Satellites | Instantaneous Acceleration Impulse | Accl X | Accl Y | Accl Z | Gyro Yro X | Gyro Y | Gyro Z | Filename |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20164525 | KALU | 13:12:40.8 | 51.711182 | -0.281379 | 0.0 | 0 | 0 | 0.4 | 335 | 21 | 0.0 | 0.098088 | 0.988932 | 0.433344 | -4.69 | 0.84 | 7.21 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv |
20164526 | KALU | 13:12:40.8 | 51.711182 | -0.281379 | 0.0 | 0 | 0 | 0.4 | 335 | 21 | 0.0 | 0.103212 | 0.984540 | 0.449448 | 0.70 | 1.33 | 6.79 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv |
20164527 | KALU | 13:12:40.8 | 51.711182 | -0.281379 | 0.0 | 0 | 0 | 0.4 | 335 | 21 | 0.0 | 0.112728 | 0.969900 | 0.461160 | 3.92 | 1.12 | 6.37 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv |
20164528 | KALU | 13:12:40.8 | 51.711182 | -0.281379 | 0.0 | 0 | 0 | 0.4 | 335 | 21 | 0.0 | 0.122976 | 0.956724 | 0.461160 | 4.20 | 0.00 | 6.44 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv |
20164529 | KALU | 13:12:40.8 | 51.711182 | -0.281379 | 0.0 | 0 | 0 | 0.4 | 335 | 21 | 0.0 | 0.128832 | 0.950868 | 0.454572 | 1.96 | -1.26 | 6.16 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv |
# Print the shape of the DataFrame, df_training_data_all
print(df_training_data_all.shape)
(20164530, 18)
# Print the column names of the DataFrame, df_training_data_all
print(df_training_data_all.columns)
Index(['Player Display Name', 'Time', 'Lat', 'Lon', 'Speed (m/s)', 'Heart Rate (bpm)', 'Hacc', 'Hdop', 'Quality of Signal', 'No. of Satellites', 'Instantaneous Acceleration Impulse', 'Accl X', 'Accl Y', 'Accl Z', 'Gyro Yro X', 'Gyro Y', 'Gyro Z', 'Filename'], dtype='object')
# Data types of the features of the DataFrame, df_training_data_all
df_training_data_all.dtypes
Player Display Name                    object
Time                                   object
Lat                                   float64
Lon                                   float64
Speed (m/s)                           float64
Heart Rate (bpm)                        int64
Hacc                                    int64
Hdop                                  float64
Quality of Signal                       int64
No. of Satellites                       int64
Instantaneous Acceleration Impulse    float64
Accl X                                float64
Accl Y                                float64
Accl Z                                float64
Gyro Yro X                            float64
Gyro Y                                float64
Gyro Z                                float64
Filename                               object
dtype: object
Full details of these attributes and their data types are discussed further in the Data Dictionary.
# Displays all columns
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
print(df_training_data_all.dtypes)
Player Display Name                    object
Time                                   object
Lat                                   float64
Lon                                   float64
Speed (m/s)                           float64
Heart Rate (bpm)                        int64
Hacc                                    int64
Hdop                                  float64
Quality of Signal                       int64
No. of Satellites                       int64
Instantaneous Acceleration Impulse    float64
Accl X                                float64
Accl Y                                float64
Accl Z                                float64
Gyro Yro X                            float64
Gyro Y                                float64
Gyro Z                                float64
Filename                               object
dtype: object
# Info for the DataFrame, df_training_data_all
df_training_data_all.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20164530 entries, 0 to 20164529
Data columns (total 18 columns):
 #   Column                              Dtype
---  ------                              -----
 0   Player Display Name                 object
 1   Time                                object
 2   Lat                                 float64
 3   Lon                                 float64
 4   Speed (m/s)                         float64
 5   Heart Rate (bpm)                    int64
 6   Hacc                                int64
 7   Hdop                                float64
 8   Quality of Signal                   int64
 9   No. of Satellites                   int64
 10  Instantaneous Acceleration Impulse  float64
 11  Accl X                              float64
 12  Accl Y                              float64
 13  Accl Z                              float64
 14  Gyro Yro X                          float64
 15  Gyro Y                              float64
 16  Gyro Z                              float64
 17  Filename                            object
dtypes: float64(11), int64(4), object(3)
memory usage: 2.7+ GB
The DataFrame uses more than 2.7 GB of memory, and the saved CSV file is 4.2 GB, which is quite large.
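One common way to keep a frame of this size manageable is to load only the columns that are used later and to store heavily repeated string columns as categoricals. The sketch below is illustrative and not part of the original workflow: the three-row in-memory CSV and the names csv_text, cols_needed and df_small are assumptions made for the example, and the actual saving depends on the real data.

```python
import io

import pandas as pd

# Hypothetical miniature of the unified CSV (the real file has 18 columns and ~20 million rows)
csv_text = (
    "Player Display Name,Time,Lat,Lon,Speed (m/s),Accl X,Filename\n"
    "MASINA,12:57:43.2,51.711186,-0.281583,1.202779,0.133956,2022-02-02-MASINA-MATCH-MSG.csv\n"
    "MASINA,12:57:43.3,51.711185,-0.281582,1.258334,0.197640,2022-02-02-MASINA-MATCH-MSG.csv\n"
    "KALU,13:12:40.8,51.711182,-0.281379,0.0,0.098088,2022-02-02-KALU-FULL-SESSION-MODIFIED.csv\n"
)

# Columns kept by the later preparation step; the sensor columns are skipped at read time
cols_needed = ['Player Display Name', 'Time', 'Lat', 'Lon', 'Speed (m/s)', 'Filename']

df_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=cols_needed,
    # Repeated strings are stored once each when held as categoricals
    dtype={'Player Display Name': 'category', 'Filename': 'category'},
)

# Inspect the per-column memory footprint in bytes
print(df_small.memory_usage(deep=True))
```

On a real file the same read_csv arguments would be passed with the file path in place of the StringIO object.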
# Plot visualisation of the missing values for each feature of the raw DataFrame, df_training_data_all
#msno.matrix(df_training_data_all, figsize = (30, 7))
# Counts of missing values
null_value_stats = df_training_data_all.isnull().sum(axis=0)
null_value_stats[null_value_stats != 0]
Series([], dtype: int64)
The dataset, as expected, has no null values and is ready to be engineered.
The next step is to wrangle the dataset into a format that is suitable for analysis and that works with existing code for determining metrics such as speed, distance, and acceleration.
This section is broken down into the following subsections:
4.1. Prepare Training Data
4.2. Split Out Unified Training Data into Individual Training Drills
4.3. Engineer DataFrame to Match Tracking Data Format
4.4. Calculate Speed, Distance, and Acceleration
4.5. Create Physical Reports for Each Individual Training Session
4.6. Create Single Physical Report for the Day of Interest
# Define function for preparing the unified training data for an indicated date
def prepare_training_data(df, date):
    """
    Prepare the unified training dataset.
    """
    ## Read in the exported CSV file if it exists; if not, engineer the DataFrame passed in
    if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '2_prepared_training_dataset', f'{date}-ALL-MOVEMENT-TRAINING-DATA-ALL-PLAYERS.csv')):
        ### Start timer
        tic = datetime.datetime.now()
        ### Print time of engineering of tracking data started
        print(f'Engineering of the unified training data CSV file started at: {tic}')
        ### Select columns of interest and dedupe the DataFrame (uses the df argument, not the global DataFrame)
        df_select = df[['Player Display Name', 'Time', 'Lat', 'Lon', 'Speed (m/s)', 'Filename']].drop_duplicates().reset_index(drop=True)
        ### Create Date column
        df_select['Date'] = date
        ### Convert Speed (m/s) to Speed (km/h)
        df_select['Speed (km/h)'] = df_select['Speed (m/s)'] * 18/5
        ### Use the Filename, Player Display Name and Date to determine the Training Drill
        df_select['Training Drill'] = df_select['Filename']
        df_select['Training Drill'] = df_select['Training Drill'].str.replace('JOAO-PEDRO', 'JOAO PEDRO', regex=False)    # Temporary fix for Joao Pedro bug, fix later
        df_select['Training Drill'] = df_select.apply(lambda x: x['Training Drill'].replace(x['Player Display Name'], ''), axis=1)
        df_select['Training Drill'] = df_select.apply(lambda x: x['Training Drill'].replace(x['Date'], ''), axis=1)
        #### regex=False so that the '.' in '.csv' is matched literally
        df_select['Training Drill'] = df_select['Training Drill'].str.replace('--', '', regex=False).str.replace('.csv', '', regex=False)
        ### Convert date from string type to date type
        df_select['Date'] = pd.to_datetime(df_select['Date'], errors='coerce', format='%Y-%m-%d')
        ### Save DataFrame
        #### Define filename for the prepared file to be saved
        save_filename = f'{date}-ALL-MOVEMENT-TRAINING-DATA-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
        #### Define the filepath to save the prepared file
        path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '2_prepared_training_dataset')
        #### Save the prepared file as a CSV
        df_select.to_csv(os.path.join(path, f'{save_filename}.csv'), index=None, header=True)
        ### End timer
        toc = datetime.datetime.now()
        ### Print time of engineering of tracking data ended
        print(f'Engineering of the unified training data CSV file ended at: {toc}')
        ### Calculate time taken
        total_time = (toc-tic).total_seconds()
        print(f'Time taken to engineer and save unified training data is: {total_time/60:0.2f} minutes.')
    ## If CSV file already exists, read in previously saved DataFrame
    else:
        ### Print message that the saved file is being read
        print('Engineered CSV file of unified training already saved to local storage. Reading in file as a pandas DataFrame.')
        ### Read in the engineered DataFrame
        df_select = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '2_prepared_training_dataset', f'{date}-ALL-MOVEMENT-TRAINING-DATA-ALL-PLAYERS.csv'))
    ## Return DataFrame
    return df_select
df_training_data_select = prepare_training_data(df_training_data_all, '2022-02-02')
Engineered CSV file of unified training already saved to local storage. Reading in file as a pandas DataFrame.
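The preparation step recovers the drill name from each filename by stripping the date, the player name and the .csv extension, and converts speed from m/s to km/h by multiplying by 18/5 (3.6). As a hedged sketch of the same idea, the example below parses each unique filename once and merges the result back, which avoids a row-wise apply over millions of rows. The helper extract_training_drill, the frame df_demo and its sample rows are hypothetical and are not part of the notebook.

```python
import pandas as pd

# Hypothetical sample rows following the '<date>-<PLAYER>-<DRILL>.csv' filename pattern
df_demo = pd.DataFrame({
    'Player Display Name': ['MASINA', 'JOAO PEDRO', 'KALU'],
    'Filename': [
        '2022-02-02-MASINA-MATCH-MSG.csv',
        '2022-02-02-JOAO-PEDRO-PASSING-DRILL-PHYSICAL.csv',
        '2022-02-02-KALU-FULL-SESSION-MODIFIED.csv',
    ],
    'Speed (m/s)': [1.202779, 2.5, 0.0],
})


def extract_training_drill(filename, player, date):
    """Strip the '<date>-<PLAYER>-' prefix and the '.csv' suffix from a filename."""
    # Player names containing spaces appear hyphenated in the filenames
    prefix = f"{date}-{player.replace(' ', '-')}-"
    stem = filename[:-len('.csv')] if filename.endswith('.csv') else filename
    return stem[len(prefix):] if stem.startswith(prefix) else stem


# Parse each unique (filename, player) pair once, then merge back onto every row
pairs = df_demo[['Filename', 'Player Display Name']].drop_duplicates().copy()
pairs['Training Drill'] = [
    extract_training_drill(f, p, '2022-02-02')
    for f, p in zip(pairs['Filename'], pairs['Player Display Name'])
]
df_demo = df_demo.merge(pairs, on=['Filename', 'Player Display Name'], how='left')

# 1 m/s = 3.6 km/h
df_demo['Speed (km/h)'] = df_demo['Speed (m/s)'] * 3.6

print(df_demo[['Player Display Name', 'Training Drill', 'Speed (km/h)']])
```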
df_training_data_select
| | Player Display Name | Time | Lat | Lon | Speed (m/s) | Filename | Date | Speed (km/h) | Training Drill |
---|---|---|---|---|---|---|---|---|---|
0 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.330003 | MATCH-MSG |
1 | MASINA | 12:57:43.3 | 51.711185 | -0.281582 | 1.258334 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.530004 | MATCH-MSG |
2 | MASINA | 12:57:43.4 | 51.711184 | -0.281582 | 1.305557 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.700004 | MATCH-MSG |
3 | MASINA | 12:57:43.5 | 51.711183 | -0.281581 | 1.119445 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.030003 | MATCH-MSG |
4 | MASINA | 12:57:43.6 | 51.711182 | -0.281580 | 1.150001 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.140003 | MATCH-MSG |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2016448 | KALU | 13:12:40.4 | 51.711182 | -0.281380 | 0.000000 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv | 2022-02-02 | 0.000000 | FULL-SESSION-MODIFIED |
2016449 | KALU | 13:12:40.5 | 51.711182 | -0.281380 | 0.000000 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv | 2022-02-02 | 0.000000 | FULL-SESSION-MODIFIED |
2016450 | KALU | 13:12:40.6 | 51.711182 | -0.281379 | 0.000000 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv | 2022-02-02 | 0.000000 | FULL-SESSION-MODIFIED |
2016451 | KALU | 13:12:40.7 | 51.711182 | -0.281379 | 0.000000 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv | 2022-02-02 | 0.000000 | FULL-SESSION-MODIFIED |
2016452 | KALU | 13:12:40.8 | 51.711182 | -0.281379 | 0.000000 | 2022-02-02-KALU-FULL-SESSION-MODIFIED.csv | 2022-02-02 | 0.000000 | FULL-SESSION-MODIFIED |
2016453 rows × 9 columns
df_training_data_select.shape
(2016453, 9)
df_training_data_select.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2016453 entries, 0 to 2016452
Data columns (total 9 columns):
 #   Column               Dtype
---  ------               -----
 0   Player Display Name  object
 1   Time                 object
 2   Lat                  float64
 3   Lon                  float64
 4   Speed (m/s)          float64
 5   Filename             object
 6   Date                 object
 7   Speed (km/h)         float64
 8   Training Drill       object
dtypes: float64(4), object(5)
memory usage: 138.5+ MB
df_training_data_select.head(10)
| | Player Display Name | Time | Lat | Lon | Speed (m/s) | Filename | Date | Speed (km/h) | Training Drill |
---|---|---|---|---|---|---|---|---|---|
0 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.330003 | MATCH-MSG |
1 | MASINA | 12:57:43.3 | 51.711185 | -0.281582 | 1.258334 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.530004 | MATCH-MSG |
2 | MASINA | 12:57:43.4 | 51.711184 | -0.281582 | 1.305557 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.700004 | MATCH-MSG |
3 | MASINA | 12:57:43.5 | 51.711183 | -0.281581 | 1.119445 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.030003 | MATCH-MSG |
4 | MASINA | 12:57:43.6 | 51.711182 | -0.281580 | 1.150001 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.140003 | MATCH-MSG |
5 | MASINA | 12:57:43.7 | 51.711181 | -0.281579 | 1.119445 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.030003 | MATCH-MSG |
6 | MASINA | 12:57:43.8 | 51.711180 | -0.281578 | 1.008334 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 3.630003 | MATCH-MSG |
7 | MASINA | 12:57:43.9 | 51.711180 | -0.281577 | 1.130556 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.070003 | MATCH-MSG |
8 | MASINA | 12:57:44.0 | 51.711179 | -0.281577 | 1.013890 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 3.650003 | MATCH-MSG |
9 | MASINA | 12:57:44.1 | 51.711178 | -0.281576 | 0.947223 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 3.410003 | MATCH-MSG |
# Print statements about the dataset
## Define variables for print statements
training_drill_types = df_training_data_select['Training Drill'].unique()
players = df_training_data_select['Player Display Name'].unique()
count_training_drill_types = len(df_training_data_select['Training Drill'].unique())
count_players = len(df_training_data_select['Player Display Name'].unique())
## Print statements
print(f'The Training DataFrame for 2022-02-02 contains the data for {count_training_drill_types:,} different training drills, including: {training_drill_types}.\n')
print(f'The Training DataFrame for 2022-02-02 contains the data for {count_players:,} different players, including: {players}.\n')
The Training DataFrame for 2022-02-02 contains the data for 6 different training drills, including: ['MATCH-MSG' 'CROSSING-AND-FINISHING-HSR-SPR' 'ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY' 'FULL-SESSION-MODIFIED' 'PASSING-DRILL-PHYSICAL' 'WARM-UP-COORDINATION-AGILITY'].

The Training DataFrame for 2022-02-02 contains the data for 23 different players, including: ['MASINA' 'NGAKIA' 'CATHCART' 'KAMARA' 'KING' 'EKONG' 'ETEBO' 'KUCKA' 'DENNIS' 'GOSLING' 'KAYEMBE' 'BAAH' 'SISSOKO' 'SEMA' 'SIERRALTA' 'SAMIR' 'KABASELE' 'LOUZA' 'FLETCHER' 'FEMENIA' 'JOAO PEDRO' 'KALU' 'CLEVERLEY'].
Split out the unified DataFrame into the individual training drills.
Note: It's important to do this before the later format conversion and speed/acceleration calculations because not all the training sessions take place at the same time, so the sessions could otherwise get mixed up.
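As a compact way to express the same split, a groupby over the Training Drill column can build one DataFrame per drill in a single pass and hold them in a dict, following the dfs_ naming convention. This is a minimal sketch on made-up rows; df_demo and dfs_by_drill are names assumed for the example.

```python
import pandas as pd

# Hypothetical sample of the prepared training data
df_demo = pd.DataFrame({
    'Player Display Name': ['MASINA', 'MASINA', 'KALU', 'KALU'],
    'Time': ['12:57:43.2', '12:57:43.3', '13:12:40.7', '13:12:40.8'],
    'Training Drill': ['MATCH-MSG', 'MATCH-MSG', 'FULL-SESSION-MODIFIED', 'FULL-SESSION-MODIFIED'],
})

# One DataFrame per drill, keyed by drill name; sort=False keeps the order of first appearance
dfs_by_drill = {
    drill: df_drill.reset_index(drop=True)
    for drill, df_drill in df_demo.groupby('Training Drill', sort=False)
}

for drill, df_drill in dfs_by_drill.items():
    print(drill, df_drill.shape)
```

A dict keyed by drill name also makes it straightforward to loop over every drill in later steps instead of repeating the same call per variable.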
lst_training_types = list(df_training_data_select['Training Drill'].unique())
lst_training_types
['MATCH-MSG', 'CROSSING-AND-FINISHING-HSR-SPR', 'ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY', 'FULL-SESSION-MODIFIED', 'PASSING-DRILL-PHYSICAL', 'WARM-UP-COORDINATION-AGILITY']
df_training_match_msg = df_training_data_select[df_training_data_select['Training Drill'] == 'MATCH-MSG']
df_training_crossing_and_finishing_hsr_spr = df_training_data_select[df_training_data_select['Training Drill'] == 'CROSSING-AND-FINISHING-HSR-SPR']
df_training_attack_vs_defence_attack_superiority = df_training_data_select[df_training_data_select['Training Drill'] == 'ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY']
df_training_full_session_modified = df_training_data_select[df_training_data_select['Training Drill'] == 'FULL-SESSION-MODIFIED']
df_training_passing_drill_physical = df_training_data_select[df_training_data_select['Training Drill'] == 'PASSING-DRILL-PHYSICAL']
df_training_warm_up_coordination_agility = df_training_data_select[df_training_data_select['Training Drill'] == 'WARM-UP-COORDINATION-AGILITY']
df_training_match_msg.head()
| | Player Display Name | Time | Lat | Lon | Speed (m/s) | Filename | Date | Speed (km/h) | Training Drill |
---|---|---|---|---|---|---|---|---|---|
0 | MASINA | 12:57:43.2 | 51.711186 | -0.281583 | 1.202779 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.330003 | MATCH-MSG |
1 | MASINA | 12:57:43.3 | 51.711185 | -0.281582 | 1.258334 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.530004 | MATCH-MSG |
2 | MASINA | 12:57:43.4 | 51.711184 | -0.281582 | 1.305557 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.700004 | MATCH-MSG |
3 | MASINA | 12:57:43.5 | 51.711183 | -0.281581 | 1.119445 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.030003 | MATCH-MSG |
4 | MASINA | 12:57:43.6 | 51.711182 | -0.281580 | 1.150001 | 2022-02-02-MASINA-MATCH-MSG.csv | 2022-02-02 | 4.140003 | MATCH-MSG |
df_training_match_msg.shape
(146308, 9)
list(df_training_match_msg['Player Display Name'].unique())
['MASINA', 'KING', 'DENNIS', 'GOSLING', 'BAAH', 'NGAKIA', 'KUCKA', 'LOUZA', 'KABASELE', 'KAYEMBE', 'KALU', 'JOAO PEDRO', 'ETEBO', 'EKONG', 'FLETCHER', 'SISSOKO', 'SIERRALTA', 'CATHCART', 'SEMA', 'KAMARA', 'FEMENIA', 'SAMIR']
To work with the existing Tracking data libraries, based on Laurie Shaw's Metrica Sports Tracking data libraries, LaurieOnTracking, the data needs to be engineered to match the Metrica schema, which is the following:
Feature | Data type | Definition |
---|---|---|
Frame | int64 | |
Period | int64 | |
Time [s] | float64 | |
Home/Away_No._x (repeated 14 times) | float64 | |
Home/Away_No._y (repeated 14 times) | float64 | |
ball_x | float64 | |
ball_y | float64 | |
However, as this is training data, the Home and Away columns are replaced with the players' names, which is done in the code below. The Tracking data visualisation scripts, such as those that create Pitch Control clips, would need some alteration to work with player names. For the purposes of this exercise, which is to calculate metrics such as the speed, accelerations, and total distances covered by the players, that alteration of the visualisation code is out of scope and is not covered.
To learn more about the Metrica Sports schema, see the official documentation [link].
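The reshaping described above turns long data (one row per player per timestamp) into wide data (one row per timestamp, with a pair of x/y columns per player). As a hedged illustration of that reshape only, the sketch below uses pandas pivot_table on a few made-up rows; it is one possible way to reach the same layout, and the names df_long and df_wide are assumptions for the example. Note that the Time values here are clock-time strings, as in the source data, rather than elapsed seconds.

```python
import pandas as pd

# Hypothetical long-format sample: one row per player per timestamp
df_long = pd.DataFrame({
    'Player Display Name': ['MASINA', 'MASINA', 'KALU'],
    'Time': ['12:57:43.2', '12:57:43.3', '12:57:43.3'],
    'Lat': [51.711186, 51.711185, 51.711182],
    'Lon': [-0.281583, -0.281582, -0.281379],
})

# Longitude maps to x and latitude to y; player names are title-cased for the column prefixes
df_long = df_long.rename(columns={'Lon': 'x', 'Lat': 'y'})
df_long['Player'] = df_long['Player Display Name'].str.title()

# Pivot to one row per timestamp; players absent at a timestamp are left as NaN
df_wide = df_long.pivot_table(index='Time', columns='Player', values=['x', 'y'], aggfunc='first')

# Flatten the (axis, player) column MultiIndex into names such as 'Masina_x'
df_wide.columns = [f'{player}_{axis}' for axis, player in df_wide.columns]

# Restore the timestamp as a column and add a sequential Frame number
df_wide = df_wide.reset_index().rename(columns={'Time': 'Time [s]'})
df_wide.insert(0, 'Frame', range(len(df_wide)))

print(df_wide)
```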
# Define function for converting the format of the training data to match Tracking data
def convert_training_data_format(df, date, training_drill):
    """
    Convert the format of the training dataset to match Tracking data.
    """
    ## Read in the exported CSV file if it exists; if not, convert the DataFrame passed in
    if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '3_individual_training_sessions_dataset', f'{date}-{training_drill}-MOVEMENT-TRAINING-DATA-ALL-PLAYERS.csv')):
        ### Start timer
        tic = datetime.datetime.now()
        ### Print time conversion of the training data started
        print(f'Conversion of the format of the training data started at: {tic}')
        ### Work on a copy of the input DataFrame
        df_pvt = df.copy()
        ### List the players that took part in this drill
        lst_players = list(df_pvt['Player Display Name'].unique())
        ### Rename columns
        df_pvt = df_pvt.rename(columns={'Time': 'Time [s]',
                                        'Lon': 'x',
                                        'Lat': 'y'
                                       }
                              )
        ### Drop the Filename column
        df_pvt = df_pvt.drop(columns=['Filename'])
        ### Create DataFrame of unique timestamps
        df_time = df_pvt[['Time [s]', 'Date', 'Training Drill']].drop_duplicates().reset_index(drop=True)
        ### Move the index into a column
        df_time = df_time.reset_index(drop=False)
        ### Rename index column to 'Frame'
        df_time = df_time.rename(columns={'index': 'Frame'})
        ### Start the final DataFrame from the timestamps
        df_pvt_final = df_time.copy()
        ### Iterate through each player in the list of players
        for player in lst_players:
            #### Create temporary DataFrame for the player
            df_player = df_pvt[df_pvt['Player Display Name'] == player]
            #### Player name in title case, used to prefix the columns
            player_title = player.title()
            #### Prefix the position and speed columns with the player name
            df_player = df_player.rename(columns={'Time [s]': 'Time',
                                                  'x': f'{player_title}_x',
                                                  'y': f'{player_title}_y',
                                                  'Speed (m/s)': f'{player_title} Speed (m/s)',
                                                  'Speed (km/h)': f'{player_title} Speed (km/h)'
                                                 }
                                        )
            #### Keep only the timestamp and the player's columns
            df_player = df_player[['Time', f'{player_title}_x', f'{player_title}_y', f'{player_title} Speed (m/s)', f'{player_title} Speed (km/h)']]
            #### Join each individual DataFrame to time DataFrame
            df_pvt_final = pd.merge(df_pvt_final, df_player, left_on=['Time [s]'], right_on=['Time'], how='left')
            #### Drop the duplicated timestamp column from the join
            df_pvt_final = df_pvt_final.drop(columns=['Time'])
            #### Drop any duplicate rows created by the join
            df_pvt_final = df_pvt_final.drop_duplicates()
        ### Save DataFrame
        #### Define filename for the converted file to be saved
        save_filename = f'{date}-{training_drill}-MOVEMENT-TRAINING-DATA-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
        #### Define the filepath to save the converted file
        path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '3_individual_training_sessions_dataset')
        #### Save the converted file as a CSV
        df_pvt_final.to_csv(os.path.join(path, f'{save_filename}.csv'), index=None, header=True)
        ### End timer
        toc = datetime.datetime.now()
        ### Print time conversion of the training data ended
        print(f'Conversion of the format of the training data ended at: {toc}')
        ### Calculate time taken
        total_time = (toc-tic).total_seconds()
        print(f'Time taken to convert the format and save the training data is: {total_time:0.2f} seconds.')
    ## If CSV file already exists, read in previously saved DataFrame
    else:
        ### Print message that the saved file is being read
        print('Converted training data already saved to local storage. Reading in file as a pandas DataFrame.')
        ### Read in the converted DataFrame
        df_pvt_final = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '3_individual_training_sessions_dataset', f'{date}-{training_drill}-MOVEMENT-TRAINING-DATA-ALL-PLAYERS.csv'))
    ## Return the DataFrame
    return df_pvt_final
df_training_match_msg_pvt = convert_training_data_format(df=df_training_match_msg, date='2022-02-02', training_drill='MATCH-MSG')
df_training_crossing_and_finishing_hsr_spr_pvt = convert_training_data_format(df=df_training_crossing_and_finishing_hsr_spr, date='2022-02-02', training_drill='CROSSING-AND-FINISHING-HSR-SPR')
df_training_attack_vs_defence_attack_superiority_pvt = convert_training_data_format(df=df_training_attack_vs_defence_attack_superiority, date='2022-02-02', training_drill='ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY')
df_training_full_session_modified_pvt = convert_training_data_format(df=df_training_full_session_modified, date='2022-02-02', training_drill='FULL-SESSION-MODIFIED')
df_training_passing_drill_physical_pvt = convert_training_data_format(df=df_training_passing_drill_physical, date='2022-02-02', training_drill='PASSING-DRILL-PHYSICAL')
df_training_warm_up_coordination_agility_pvt = convert_training_data_format(df=df_training_warm_up_coordination_agility, date='2022-02-02', training_drill='WARM-UP-COORDINATION-AGILITY')
Converted training data already saved to local storage. Reading in file as a pandas DataFrame.
Converted training data already saved to local storage. Reading in file as a pandas DataFrame.
Converted training data already saved to local storage. Reading in file as a pandas DataFrame.
Converted training data already saved to local storage. Reading in file as a pandas DataFrame.
Converted training data already saved to local storage. Reading in file as a pandas DataFrame.
Converted training data already saved to local storage. Reading in file as a pandas DataFrame.
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_match_msg_pvt
msno.matrix(df_training_match_msg_pvt, figsize = (30, 7))
<AxesSubplot:>
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_crossing_and_finishing_hsr_spr_pvt
msno.matrix(df_training_crossing_and_finishing_hsr_spr_pvt, figsize = (30, 7))
<AxesSubplot:>
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_attack_vs_defence_attack_superiority_pvt
msno.matrix(df_training_attack_vs_defence_attack_superiority_pvt, figsize = (30, 7))
<AxesSubplot:>
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_full_session_modified_pvt
msno.matrix(df_training_full_session_modified_pvt, figsize = (30, 7))
<AxesSubplot:>
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_passing_drill_physical_pvt
msno.matrix(df_training_passing_drill_physical_pvt, figsize = (30, 7))
<AxesSubplot:>
# Plot visualisation of the missing values for each feature of the DataFrame, df_training_warm_up_coordination_agility_pvt
msno.matrix(df_training_warm_up_coordination_agility_pvt, figsize = (30, 7))
<AxesSubplot:>
The visualisations show that in some drills all players are tracked for the whole drill, whereas in others the players are involved at different times, which leaves gaps (missing values) in their columns.
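To put numbers on what the missingno matrices show, the share of frames without a position can be computed per player. The sketch below is a minimal example on a small made-up DataFrame in the same wide layout (`<player>_x` columns); the helper name `player_missing_share` and the toy values are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

def player_missing_share(df):
    """Return the fraction of frames with a missing x position for each player."""
    x_cols = [c for c in df.columns if c.endswith('_x')]
    share = df[x_cols].isna().mean()
    # Strip the '_x' suffix so the index holds player names only
    share.index = [c[:-2] for c in x_cols]
    return share.sort_values(ascending=False)

# Toy data in the same wide layout as the pivoted training DataFrames (values are made up)
df_toy = pd.DataFrame({
    'Frame': [0, 1, 2, 3],
    'PlayerA_x': [-0.2811, -0.2811, -0.2811, -0.2811],
    'PlayerB_x': [np.nan, np.nan, -0.2813, -0.2813],
})
missing = player_missing_share(df_toy)
print(missing)
```

Applied to one of the pivoted DataFrames, a player who joins a drill late would show a share well above zero, while a player tracked throughout would show a share close to zero.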
df_training_attack_vs_defence_attack_superiority_pvt.head(20)
 | Frame | Time [s] | Date | Training Drill | Ngakia_x | Ngakia_y | Ngakia Speed (m/s) | Ngakia Speed (km/h) | Cathcart_x | Cathcart_y | Cathcart Speed (m/s) | Cathcart Speed (km/h) | Etebo_x | Etebo_y | Etebo Speed (m/s) | Etebo Speed (km/h) | Dennis_x | Dennis_y | Dennis Speed (m/s) | Dennis Speed (km/h) | Kayembe_x | Kayembe_y | Kayembe Speed (m/s) | Kayembe Speed (km/h) | Baah_x | Baah_y | Baah Speed (m/s) | Baah Speed (km/h) | Sierralta_x | Sierralta_y | Sierralta Speed (m/s) | Sierralta Speed (km/h) | Ekong_x | Ekong_y | Ekong Speed (m/s) | Ekong Speed (km/h) | King_x | King_y | King Speed (m/s) | King Speed (km/h) | Kucka_x | Kucka_y | Kucka Speed (m/s) | Kucka Speed (km/h) | Kamara_x | Kamara_y | Kamara Speed (m/s) | Kamara Speed (km/h) | Fletcher_x | Fletcher_y | Fletcher Speed (m/s) | Fletcher Speed (km/h) | Louza_x | Louza_y | Louza Speed (m/s) | Louza Speed (km/h) | Gosling_x | Gosling_y | Gosling Speed (m/s) | Gosling Speed (km/h) | Femenia_x | Femenia_y | Femenia Speed (m/s) | Femenia Speed (km/h) | Samir_x | Samir_y | Samir Speed (m/s) | Samir Speed (km/h) | Joao Pedro_x | Joao Pedro_y | Joao Pedro Speed (m/s) | Joao Pedro Speed (km/h) | Sissoko_x | Sissoko_y | Sissoko Speed (m/s) | Sissoko Speed (km/h) | Kabasele_x | Kabasele_y | Kabasele Speed (m/s) | Kabasele Speed (km/h) | Masina_x | Masina_y | Masina Speed (m/s) | Masina Speed (km/h) | Kalu_x | Kalu_y | Kalu Speed (m/s) | Kalu Speed (km/h) | Sema_x | Sema_y | Sema Speed (m/s) | Sema Speed (km/h) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 12:08:24.2 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711385 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710896 | 1.055556 | 3.800003 | -0.281581 | 51.711263 | 2.361113 | 8.500007 | -0.281273 | 51.710967 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281370 | 51.711195 | 1.927779 | 6.940006 | -0.281566 | 51.711257 | 2.025002 | 7.290006 | -0.281005 | 51.711463 | 0.558334 | 2.010002 | -0.281284 | 51.711088 | 1.305557 | 4.700004 | -0.281582 | 51.711126 | 1.575001 | 5.670005 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281478 | 51.711305 | 1.105556 | 3.980003 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281566 | 51.711311 | 1.950002 | 7.020006 | -0.281447 | 51.711196 | 1.438890 | 5.180004 | -0.281382 | 51.711224 | 0.741667 | 2.670002 | -0.281316 | 51.711136 | 0.000000 | 0.000000 | -0.281410 | 51.711047 | 1.594446 | 5.740005 | -0.280945 | 51.711247 | 0.0 | 0.0 |
1 | 1 | 12:08:24.3 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711385 | 0.0 | 0.0 | -0.281125 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710895 | 1.022223 | 3.680003 | -0.281580 | 51.711263 | 1.075001 | 3.870003 | -0.281273 | 51.710967 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281372 | 51.711195 | 1.863890 | 6.710005 | -0.281564 | 51.711257 | 1.175001 | 4.230003 | -0.281006 | 51.711462 | 0.852778 | 3.070002 | -0.281285 | 51.711089 | 1.261112 | 4.540004 | -0.281581 | 51.711126 | 1.377779 | 4.960004 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281477 | 51.711306 | 0.961112 | 3.460003 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281564 | 51.711310 | 2.183335 | 7.860006 | -0.281448 | 51.711197 | 1.466668 | 5.280004 | -0.281381 | 51.711224 | 0.633334 | 2.280002 | -0.281316 | 51.711136 | 0.000000 | 0.000000 | -0.281410 | 51.711048 | 1.577779 | 5.680005 | -0.280945 | 51.711247 | 0.0 | 0.0 |
2 | 2 | 12:08:24.4 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711385 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710894 | 0.950001 | 3.420003 | -0.281579 | 51.711263 | 0.733334 | 2.640002 | -0.281273 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281374 | 51.711196 | 1.863890 | 6.710005 | -0.281564 | 51.711258 | 0.841667 | 3.030002 | -0.281008 | 51.711462 | 1.019445 | 3.670003 | -0.281287 | 51.711089 | 1.352779 | 4.870004 | -0.281579 | 51.711126 | 1.047223 | 3.770003 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281476 | 51.711307 | 1.566668 | 5.640005 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281560 | 51.711309 | 2.572224 | 9.260007 | -0.281449 | 51.711198 | 0.975001 | 3.510003 | -0.281379 | 51.711225 | 1.480557 | 5.330004 | -0.281316 | 51.711136 | 0.000000 | 0.000000 | -0.281410 | 51.711050 | 1.472223 | 5.300004 | -0.280945 | 51.711247 | 0.0 | 0.0 |
3 | 3 | 12:08:24.5 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711385 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710893 | 0.975001 | 3.510003 | -0.281578 | 51.711262 | 0.847223 | 3.050002 | -0.281273 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281376 | 51.711197 | 1.491668 | 5.370004 | -0.281562 | 51.711257 | 0.786112 | 2.830002 | -0.281010 | 51.711462 | 1.294445 | 4.660004 | -0.281288 | 51.711090 | 1.433334 | 5.160004 | -0.281578 | 51.711126 | 0.891667 | 3.210003 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281474 | 51.711308 | 1.252779 | 4.510004 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281555 | 51.711308 | 3.047225 | 10.970009 | -0.281449 | 51.711199 | 0.950001 | 3.420003 | -0.281378 | 51.711226 | 1.597224 | 5.750005 | -0.281316 | 51.711136 | 0.000000 | 0.000000 | -0.281410 | 51.711052 | 2.119446 | 7.630006 | -0.280945 | 51.711247 | 0.0 | 0.0 |
4 | 4 | 12:08:24.6 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710892 | 0.913890 | 3.290003 | -0.281577 | 51.711261 | 0.872223 | 3.140003 | -0.281274 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281377 | 51.711199 | 1.772224 | 6.380005 | -0.281562 | 51.711258 | 0.675001 | 2.430002 | -0.281011 | 51.711462 | 1.016667 | 3.660003 | -0.281290 | 51.711091 | 1.483335 | 5.340004 | -0.281577 | 51.711126 | 0.680556 | 2.450002 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281473 | 51.711308 | 0.977779 | 3.520003 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281551 | 51.711308 | 3.233336 | 11.640009 | -0.281449 | 51.711200 | 0.919445 | 3.310003 | -0.281377 | 51.711227 | 1.405557 | 5.060004 | -0.281315 | 51.711136 | 0.000000 | 0.000000 | -0.281410 | 51.711054 | 2.091668 | 7.530006 | -0.280945 | 51.711247 | 0.0 | 0.0 |
5 | 5 | 12:08:24.7 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710892 | 0.913890 | 3.290003 | -0.281576 | 51.711260 | 1.300001 | 4.680004 | -0.281274 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281378 | 51.711201 | 2.011113 | 7.240006 | -0.281560 | 51.711258 | 0.836112 | 3.010002 | -0.281012 | 51.711462 | 0.608334 | 2.190002 | -0.281292 | 51.711092 | 1.477779 | 5.320004 | -0.281576 | 51.711126 | 0.927779 | 3.340003 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281473 | 51.711309 | 1.011112 | 3.640003 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281545 | 51.711308 | 3.861114 | 13.900011 | -0.281449 | 51.711200 | 0.863890 | 3.110002 | -0.281377 | 51.711228 | 0.858334 | 3.090002 | -0.281315 | 51.711135 | 0.597223 | 2.150002 | -0.281411 | 51.711056 | 2.122224 | 7.640006 | -0.280945 | 51.711247 | 0.0 | 0.0 |
6 | 6 | 12:08:24.8 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710891 | 0.855556 | 3.080002 | -0.281575 | 51.711258 | 1.680557 | 6.050005 | -0.281274 | 51.710967 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281378 | 51.711202 | 1.444446 | 5.200004 | -0.281559 | 51.711257 | 0.736112 | 2.650002 | -0.281012 | 51.711462 | 0.427778 | 1.540001 | -0.281294 | 51.711092 | 1.544446 | 5.560004 | -0.281574 | 51.711127 | 1.222223 | 4.400004 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281473 | 51.711310 | 1.055556 | 3.800003 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281540 | 51.711306 | 3.652781 | 13.150011 | -0.281448 | 51.711201 | 1.011112 | 3.640003 | -0.281376 | 51.711228 | 0.919445 | 3.310003 | -0.281315 | 51.711135 | 0.747223 | 2.690002 | -0.281411 | 51.711058 | 2.252780 | 8.110006 | -0.280945 | 51.711247 | 0.0 | 0.0 |
7 | 7 | 12:08:24.9 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710890 | 0.861112 | 3.100002 | -0.281573 | 51.711257 | 2.011113 | 7.240006 | -0.281274 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281379 | 51.711202 | 1.122223 | 4.040003 | -0.281559 | 51.711257 | 0.544445 | 1.960002 | -0.281012 | 51.711462 | 0.000000 | 0.000000 | -0.281296 | 51.711092 | 1.494446 | 5.380004 | -0.281573 | 51.711128 | 1.336112 | 4.810004 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281474 | 51.711311 | 1.144445 | 4.120003 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281536 | 51.711305 | 3.250003 | 11.700009 | -0.281449 | 51.711201 | 0.000000 | 0.000000 | -0.281375 | 51.711229 | 0.641667 | 2.310002 | -0.281314 | 51.711134 | 0.780556 | 2.810002 | -0.281411 | 51.711059 | 2.033335 | 7.320006 | -0.280945 | 51.711247 | 0.0 | 0.0 |
8 | 8 | 12:08:25.0 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281125 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710890 | 0.741667 | 2.670002 | -0.281571 | 51.711256 | 2.077779 | 7.480006 | -0.281275 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711374 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281381 | 51.711203 | 1.355557 | 4.880004 | -0.281559 | 51.711257 | 0.361111 | 1.300001 | -0.281012 | 51.711462 | 0.000000 | 0.000000 | -0.281299 | 51.711092 | 1.547223 | 5.570004 | -0.281571 | 51.711129 | 1.361112 | 4.900004 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281475 | 51.711312 | 1.016667 | 3.660003 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281533 | 51.711303 | 3.183336 | 11.460009 | -0.281450 | 51.711201 | 0.688889 | 2.480002 | -0.281374 | 51.711230 | 1.180557 | 4.250003 | -0.281314 | 51.711133 | 0.711112 | 2.560002 | -0.281411 | 51.711062 | 2.391669 | 8.610007 | -0.280945 | 51.711247 | 0.0 | 0.0 |
9 | 9 | 12:08:25.1 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281125 | 51.711393 | 0.0 | 0.0 | -0.281361 | 51.710889 | 0.908334 | 3.270003 | -0.281570 | 51.711253 | 2.269446 | 8.170007 | -0.281275 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711374 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281382 | 51.711204 | 1.133334 | 4.080003 | -0.281559 | 51.711257 | 0.363889 | 1.310001 | -0.281013 | 51.711461 | 0.580556 | 2.090002 | -0.281301 | 51.711092 | 1.486112 | 5.350004 | -0.281570 | 51.711129 | 1.191668 | 4.290003 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281475 | 51.711313 | 0.763889 | 2.750002 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281530 | 51.711301 | 3.150003 | 11.340009 | -0.281452 | 51.711201 | 1.263890 | 4.550004 | -0.281373 | 51.711231 | 1.305557 | 4.700004 | -0.281313 | 51.711133 | 0.719445 | 2.590002 | -0.281411 | 51.711064 | 2.597224 | 9.350007 | -0.280944 | 51.711247 | 0.0 | 0.0 |
10 | 10 | 12:08:25.2 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281125 | 51.711393 | 0.0 | 0.0 | -0.281361 | 51.710888 | 1.077779 | 3.880003 | -0.281568 | 51.711251 | 2.791669 | 10.050008 | -0.281275 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711374 | 0.0 | 0.0 | -0.281083 | 51.711350 | 0.0 | 0.0 | -0.280962 | 51.711258 | 0.0 | 0.0 | -0.281384 | 51.711203 | 1.247223 | 4.490004 | -0.281560 | 51.711256 | 0.711112 | 2.560002 | -0.281014 | 51.711461 | 0.866667 | 3.120002 | -0.281303 | 51.711092 | 1.416668 | 5.100004 | -0.281568 | 51.711129 | 1.113890 | 4.010003 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281475 | 51.711313 | 0.366667 | 1.320001 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281528 | 51.711299 | 2.622224 | 9.440008 | -0.281454 | 51.711202 | 1.486112 | 5.350004 | -0.281373 | 51.711231 | 0.619445 | 2.230002 | -0.281313 | 51.711132 | 0.825001 | 2.970002 | -0.281411 | 51.711066 | 2.369446 | 8.530007 | -0.280944 | 51.711247 | 0.0 | 0.0 |
11 | 11 | 12:08:25.3 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281125 | 51.711393 | 0.0 | 0.0 | -0.281360 | 51.710887 | 1.244445 | 4.480004 | -0.281568 | 51.711249 | 2.627780 | 9.460008 | -0.281275 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711374 | 0.0 | 0.0 | -0.281083 | 51.711351 | 0.0 | 0.0 | -0.280962 | 51.711258 | 0.0 | 0.0 | -0.281386 | 51.711203 | 1.377779 | 4.960004 | -0.281561 | 51.711255 | 0.891667 | 3.210003 | -0.281014 | 51.711460 | 0.916667 | 3.300003 | -0.281304 | 51.711092 | 1.269445 | 4.570004 | -0.281567 | 51.711130 | 1.005556 | 3.620003 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281475 | 51.711313 | 0.000000 | 0.000000 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281526 | 51.711296 | 2.927780 | 10.540008 | -0.281457 | 51.711202 | 1.566668 | 5.640005 | -0.281373 | 51.711231 | 0.386111 | 1.390001 | -0.281313 | 51.711131 | 1.005556 | 3.620003 | -0.281412 | 51.711068 | 2.786113 | 10.030008 | -0.280944 | 51.711247 | 0.0 | 0.0 |
12 | 12 | 12:08:25.4 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281125 | 51.711392 | 0.0 | 0.0 | -0.281359 | 51.710886 | 1.200001 | 4.320003 | -0.281567 | 51.711247 | 2.400002 | 8.640007 | -0.281276 | 51.710966 | 0.0 | 0.0 | -0.281058 | 51.711374 | 0.0 | 0.0 | -0.281083 | 51.711351 | 0.0 | 0.0 | -0.280962 | 51.711258 | 0.0 | 0.0 | -0.281388 | 51.711202 | 1.558335 | 5.610004 | -0.281562 | 51.711255 | 0.866667 | 3.120002 | -0.281016 | 51.711459 | 1.097223 | 3.950003 | -0.281306 | 51.711092 | 1.247223 | 4.490004 | -0.281568 | 51.711131 | 0.944445 | 3.400003 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281475 | 51.711313 | 0.000000 | 0.000000 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281525 | 51.711293 | 3.458336 | 12.450010 | -0.281460 | 51.711202 | 2.097224 | 7.550006 | -0.281373 | 51.711231 | 0.386111 | 1.390001 | -0.281313 | 51.711130 | 1.077779 | 3.880003 | -0.281413 | 51.711070 | 2.236113 | 8.050006 | -0.280944 | 51.711247 | 0.0 | 0.0 |
13 | 13 | 12:08:25.5 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281125 | 51.711392 | 0.0 | 0.0 | -0.281359 | 51.710885 | 0.936112 | 3.370003 | -0.281567 | 51.711244 | 2.494446 | 8.980007 | -0.281276 | 51.710966 | 0.0 | 0.0 | -0.281058 | 51.711374 | 0.0 | 0.0 | -0.281084 | 51.711351 | 0.0 | 0.0 | -0.280962 | 51.711258 | 0.0 | 0.0 | -0.281390 | 51.711202 | 1.538890 | 5.540004 | -0.281563 | 51.711254 | 0.794445 | 2.860002 | -0.281017 | 51.711459 | 1.155556 | 4.160003 | -0.281308 | 51.711091 | 1.438890 | 5.180004 | -0.281572 | 51.711132 | 2.138891 | 7.700006 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281474 | 51.711313 | 0.000000 | 0.000000 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281523 | 51.711290 | 3.463892 | 12.470010 | -0.281462 | 51.711202 | 1.572223 | 5.660005 | -0.281373 | 51.711231 | 0.000000 | 0.000000 | -0.281313 | 51.711129 | 1.180557 | 4.250003 | -0.281413 | 51.711072 | 1.591668 | 5.730005 | -0.280944 | 51.711247 | 0.0 | 0.0 |
14 | 14 | 12:08:25.6 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711392 | 0.0 | 0.0 | -0.281359 | 51.710885 | 0.480556 | 1.730001 | -0.281567 | 51.711242 | 2.702780 | 9.730008 | -0.281276 | 51.710966 | 0.0 | 0.0 | -0.281058 | 51.711374 | 0.0 | 0.0 | -0.281084 | 51.711351 | 0.0 | 0.0 | -0.280962 | 51.711258 | 0.0 | 0.0 | -0.281392 | 51.711202 | 1.180557 | 4.250003 | -0.281563 | 51.711253 | 0.838890 | 3.020002 | -0.281018 | 51.711458 | 1.113890 | 4.010003 | -0.281310 | 51.711091 | 1.397223 | 5.030004 | -0.281575 | 51.711133 | 2.327780 | 8.380007 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281474 | 51.711312 | 0.408334 | 1.470001 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281521 | 51.711287 | 3.100002 | 11.160009 | -0.281463 | 51.711201 | 1.005556 | 3.620003 | -0.281374 | 51.711231 | 0.500000 | 1.800001 | -0.281313 | 51.711128 | 1.186112 | 4.270003 | -0.281414 | 51.711073 | 1.772224 | 6.380005 | -0.280944 | 51.711247 | 0.0 | 0.0 |
15 | 15 | 12:08:25.7 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711392 | 0.0 | 0.0 | -0.281359 | 51.710885 | 0.000000 | 0.000000 | -0.281566 | 51.711240 | 2.419446 | 8.710007 | -0.281276 | 51.710966 | 0.0 | 0.0 | -0.281058 | 51.711374 | 0.0 | 0.0 | -0.281084 | 51.711351 | 0.0 | 0.0 | -0.280963 | 51.711258 | 0.0 | 0.0 | -0.281393 | 51.711202 | 0.986112 | 3.550003 | -0.281564 | 51.711253 | 0.950001 | 3.420003 | -0.281020 | 51.711458 | 1.280557 | 4.610004 | -0.281313 | 51.711091 | 1.547223 | 5.570004 | -0.281577 | 51.711135 | 2.258335 | 8.130007 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281473 | 51.711312 | 0.638889 | 2.300002 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281522 | 51.711285 | 2.861113 | 10.300008 | -0.281466 | 51.711201 | 1.519446 | 5.470004 | -0.281376 | 51.711230 | 1.027779 | 3.700003 | -0.281315 | 51.711127 | 1.350001 | 4.860004 | -0.281415 | 51.711074 | 1.658335 | 5.970005 | -0.280944 | 51.711247 | 0.0 | 0.0 |
16 | 16 | 12:08:25.8 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711392 | 0.0 | 0.0 | -0.281359 | 51.710886 | 0.000000 | 0.000000 | -0.281566 | 51.711238 | 2.358335 | 8.490007 | -0.281276 | 51.710966 | 0.0 | 0.0 | -0.281058 | 51.711374 | 0.0 | 0.0 | -0.281084 | 51.711351 | 0.0 | 0.0 | -0.280963 | 51.711257 | 0.0 | 0.0 | -0.281395 | 51.711201 | 1.336112 | 4.810004 | -0.281565 | 51.711252 | 1.216668 | 4.380004 | -0.281022 | 51.711458 | 1.400001 | 5.040004 | -0.281315 | 51.711090 | 1.508335 | 5.430004 | -0.281580 | 51.711137 | 2.702780 | 9.730008 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281473 | 51.711311 | 0.744445 | 2.680002 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281523 | 51.711282 | 3.316669 | 11.940010 | -0.281466 | 51.711200 | 0.900001 | 3.240003 | -0.281377 | 51.711229 | 1.613890 | 5.810005 | -0.281316 | 51.711126 | 1.408334 | 5.070004 | -0.281416 | 51.711075 | 1.563890 | 5.630005 | -0.280944 | 51.711247 | 0.0 | 0.0 |
17 | 17 | 12:08:25.9 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711392 | 0.0 | 0.0 | -0.281358 | 51.710886 | 0.413889 | 1.490001 | -0.281568 | 51.711236 | 2.425002 | 8.730007 | -0.281276 | 51.710966 | 0.0 | 0.0 | -0.281058 | 51.711374 | 0.0 | 0.0 | -0.281084 | 51.711351 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281396 | 51.711200 | 1.475001 | 5.310004 | -0.281566 | 51.711251 | 1.350001 | 4.860004 | -0.281024 | 51.711457 | 1.419446 | 5.110004 | -0.281317 | 51.711089 | 1.461112 | 5.260004 | -0.281583 | 51.711138 | 2.725002 | 9.810008 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281472 | 51.711311 | 0.611112 | 2.200002 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281524 | 51.711280 | 2.833336 | 10.200008 | -0.281467 | 51.711200 | 0.605556 | 2.180002 | -0.281379 | 51.711227 | 2.366669 | 8.520007 | -0.281317 | 51.711124 | 1.505557 | 5.420004 | -0.281417 | 51.711076 | 1.083334 | 3.900003 | -0.280943 | 51.711247 | 0.0 | 0.0 |
18 | 18 | 12:08:26.0 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711392 | 0.0 | 0.0 | -0.281358 | 51.710887 | 0.552778 | 1.990002 | -0.281569 | 51.711234 | 2.630558 | 9.470008 | -0.281275 | 51.710965 | 0.0 | 0.0 | -0.281058 | 51.711374 | 0.0 | 0.0 | -0.281084 | 51.711351 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281398 | 51.711199 | 1.225001 | 4.410004 | -0.281567 | 51.711250 | 1.405557 | 5.060004 | -0.281025 | 51.711457 | 1.211112 | 4.360003 | -0.281318 | 51.711089 | 1.577779 | 5.680005 | -0.281588 | 51.711140 | 3.397225 | 12.230010 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281473 | 51.711309 | 0.986112 | 3.550003 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281525 | 51.711278 | 2.463891 | 8.870007 | -0.281467 | 51.711199 | 1.050001 | 3.780003 | -0.281380 | 51.711225 | 2.391669 | 8.610007 | -0.281319 | 51.711124 | 1.438890 | 5.180004 | -0.281417 | 51.711077 | 1.238890 | 4.460004 | -0.280943 | 51.711246 | 0.0 | 0.0 |
19 | 19 | 12:08:26.1 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711392 | 0.0 | 0.0 | -0.281358 | 51.710887 | 0.616667 | 2.220002 | -0.281571 | 51.711232 | 2.600002 | 9.360007 | -0.281275 | 51.710965 | 0.0 | 0.0 | -0.281058 | 51.711374 | 0.0 | 0.0 | -0.281084 | 51.711351 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281399 | 51.711199 | 1.108334 | 3.990003 | -0.281568 | 51.711248 | 1.497223 | 5.390004 | NaN | NaN | NaN | NaN | -0.281320 | 51.711088 | 1.691668 | 6.090005 | NaN | NaN | NaN | NaN | -0.280962 | 51.711248 | 0.0 | 0.0 | -0.281473 | 51.711309 | 0.955556 | 3.440003 | -0.280983 | 51.711295 | 0.0 | 0.0 | -0.281527 | 51.711276 | 2.672224 | 9.620008 | -0.281467 | 51.711197 | 1.194445 | 4.300003 | -0.281382 | 51.711223 | 2.202780 | 7.930006 | -0.281321 | 51.711123 | 1.369446 | 4.930004 | -0.281419 | 51.711077 | 1.152779 | 4.150003 | -0.280943 | 51.711246 | 0.0 | 0.0 |
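The `_x` and `_y` values in the table above (around -0.281 and 51.711) look like longitude and latitude in degrees rather than pitch coordinates in metres. If that reading is right, differencing them directly gives degrees per second, so a projection to metres would be needed before a derived speed can be compared with a threshold in m/s. The sketch below uses an equirectangular approximation around a reference point; the function name `lonlat_to_local_metres`, the choice of reference point and the Earth radius constant are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation

def lonlat_to_local_metres(lon_deg, lat_deg, lon0_deg, lat0_deg):
    """Project longitude/latitude (degrees) to local x/y metres around a reference point."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon0, lat0 = np.radians(lon0_deg), np.radians(lat0_deg)
    x = EARTH_RADIUS_M * (lon - lon0) * np.cos(lat0)  # east-west distance
    y = EARTH_RADIUS_M * (lat - lat0)                 # north-south distance
    return x, y

# Two consecutive fixes taken from the table above (Dennis, frames 0 and 1)
lon = [-0.281581, -0.281580]
lat = [51.711263, 51.711263]
x, y = lonlat_to_local_metres(lon, lat, lon0_deg=lon[0], lat0_deg=lat[0])
step_m = float(np.hypot(np.diff(x), np.diff(y))[0])
print(f'Displacement between frames: {step_m:.3f} m')
```

Because the coordinates in the table are rounded to six decimal places, a single-frame displacement computed this way is coarse; it is only meant to show the order of magnitude involved.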
# Define function for calculating the velocities and accelerations of the training data using the x, y locations and timestep
def calc_player_velocities_accelerations(df, date='2022-02-02', training_drill='NOT-DEFINED', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1):
    """ calc_player_velocities_accelerations( training data )
    Calculate player velocities in x & y direction, total player speed and acceleration at each timestamp of the tracking data
    Parameters
    -----------
    df: the tracking DataFrame
    date: date of the training session, used in the name of the saved CSV file
    training_drill: name of the training drill, used in the name of the saved CSV file
    smoothing_v: boolean variable that determines whether velocity measures are smoothed. Default is True.
    smoothing_a: boolean variable that determines whether acceleration measures are smoothed with a moving average. Default is True.
    filter_: type of filter to use when smoothing the velocities, either 'moving_average' (default) or 'Savitzky-Golay', which fits a polynomial of order 'polyorder' to the data within each window
    window: smoothing window size in # of frames
    polyorder: order of the polynomial for the Savitzky-Golay filter. Default is 1 - a linear fit to the velocity, so gradient is the acceleration
    maxspeed: the maximum speed that a player can realistically achieve (in meters/second). Speed measures that exceed maxspeed are tagged as outliers and set to NaN.
    dt: timestep between consecutive frames in seconds. Default is 0.1
    Returns
    -----------
    df : the tracking DataFrame with columns for velocity in the x & y direction, total speed and acceleration added
    """
    ## Read in exported CSV file if it exists; if not, calculate the velocities and accelerations and save them
    if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '4_modified_individual_training_sessions_dataset', f'{date}-{training_drill}-MOVEMENT-SPEED-ACCELERATION-TRAINING-DATA-ALL-PLAYERS.csv')):
        ### Start timer
        tic = datetime.datetime.now()
        ### Print time of engineering of tracking data started
        print(f'Calculation of each player\'s speed and accelerations for the {training_drill} started at: {tic}')
        # Create columns
        #df['Date Time [s]'] = pd.to_datetime(df['Date'] + ' ' + df['Time [s]'])
        df['Period'] = 1
        # remove any velocity data already in the dataframe
        df = remove_player_velocities_accelerations(df)
        # Get the player ids
        player_ids = [col for col in df.columns if '_x' in col]
        player_ids = [s.replace('_x', '') for s in player_ids]
        # Calculate the timestep from one frame to the next - not required.
        #dt = df['Time [s]'].diff()
        #dt = df['Date Time [s]'].diff()
        # index of first frame in second half (None when the data contains a single period, as it does here)
        #second_half_idx = df.Period.idxmax(2)
        second_half_idx = df[df.Period == 2].first_valid_index()
        # estimate velocities for players in df
        for player in player_ids: # cycle through players individually
            # difference player positions in timestep dt to get unsmoothed estimate of velocity
            vx = df[player + '_x'].diff() / dt
            vy = df[player + '_y'].diff() / dt
            if maxspeed>0:
                # remove unsmoothed data points that exceed the maximum speed (these are most likely position errors)
                raw_speed = np.sqrt( vx**2 + vy**2 )
                #acceleration = raw_speed.diff() / dt
                vx[ raw_speed>maxspeed ] = np.nan
                vy[ raw_speed>maxspeed ] = np.nan
                #if maxacc>0:
                    #ax[ raw_acc>maxacc ] = np.nan
                    #ay[ raw_acc>maxacc ] = np.nan
            if smoothing_v:
                if filter_=='Savitzky-Golay':
                    # calculate first half velocity
                    vx.loc[:second_half_idx] = signal.savgol_filter(vx.loc[:second_half_idx],window_length=window,polyorder=polyorder)
                    vy.loc[:second_half_idx] = signal.savgol_filter(vy.loc[:second_half_idx],window_length=window,polyorder=polyorder)
                    # calculate second half velocity (skipped when there is no second half, otherwise the filter would be applied twice)
                    if second_half_idx is not None:
                        vx.loc[second_half_idx:] = signal.savgol_filter(vx.loc[second_half_idx:],window_length=window,polyorder=polyorder)
                        vy.loc[second_half_idx:] = signal.savgol_filter(vy.loc[second_half_idx:],window_length=window,polyorder=polyorder)
                # accept both spellings: the default and the calls below pass 'moving_average'
                elif filter_ in ('moving_average', 'moving average'):
                    ma_window = np.ones( window ) / window
                    # calculate first half velocity
                    vx.loc[:second_half_idx] = np.convolve( vx.loc[:second_half_idx], ma_window, mode='same')
                    vy.loc[:second_half_idx] = np.convolve( vy.loc[:second_half_idx], ma_window, mode='same')
                    # calculate second half velocity (skipped when there is no second half, otherwise the filter would be applied twice)
                    if second_half_idx is not None:
                        vx.loc[second_half_idx:] = np.convolve( vx.loc[second_half_idx:], ma_window, mode='same')
                        vy.loc[second_half_idx:] = np.convolve( vy.loc[second_half_idx:], ma_window, mode='same')
                    #speed = ( vx**2 + vy**2 )**.5
                    #acceleration = np.diff(speed) / dt
                    #ax = np.convolve( ax, ma_window, mode='same' )
                    #ay = np.convolve( ay, ma_window, mode='same' )
            # put player speed in x, y direction, and total speed back in the data frame
            df[player + '_vx'] = vx
            df[player + '_vy'] = vy
            df[player + '_speed'] = np.sqrt(vx**2 + vy**2)
            #df[player + '_ax'] = ax
            #df[player + '_ay'] = ay
            #df[player + '_rawspeed'] = raw_speed
            #df[player + '_rawacc'] = raw_acc
            # Calculate acceleration - method 1, using speed calculated
            #acceleration = df[player + '_speed'].diff() / dt
            #df[player + '_acceleration'] = acceleration
            # Calculate acceleration - method 2, using speed provided
            acceleration = df[player + ' Speed (m/s)'].diff() / dt
            #acceleration = (df[player + ' Speed (m/s)'] - df[player + ' Speed (m/s)'].shift()) / 0.1
            df[player + ' Acceleration (m/s/s)'] = acceleration
            if smoothing_a:
                ma_window = np.ones( window ) / window
                df[player + ' Acceleration (m/s/s)'] = np.convolve( acceleration, ma_window, mode='same')
        ### Save DataFrame
        #### Define filename for each combined file to be saved
        save_filename = f'{date}-{training_drill}-MOVEMENT-SPEED-ACCELERATION-TRAINING-DATA-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
        #### Define the filepath to save each combined file
        path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '4_modified_individual_training_sessions_dataset')
        #### Save the combined file as a CSV
        df.to_csv(os.path.join(path, f'{save_filename}.csv'), index=False, header=True)
        ### End timer
        toc = datetime.datetime.now()
        ### Print time of engineering of tracking data ended
        print(f'Calculation of each player\'s speed and accelerations for the {training_drill} ended at: {toc}')
        ### Calculate time taken
        total_time = (toc-tic).total_seconds()
        print(f'Time taken to calculate speed and acceleration and save the training data is: {total_time:0.2f} seconds.')
    ## If CSV file already exists, read in previously saved DataFrame
    else:
        ### Print that the previously saved CSV file is being read in
        print('Training data with calculated velocities and accelerations already saved to local storage. Reading in file as a pandas DataFrame.')
        ### Read in previously saved DataFrame
        df = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '4_modified_individual_training_sessions_dataset', f'{date}-{training_drill}-MOVEMENT-SPEED-ACCELERATION-TRAINING-DATA-ALL-PLAYERS.csv'))
    ## Return the DataFrame
    return df
def remove_player_velocities_accelerations(df):
    # remove player velocities and acceleration measures that are already in the 'df' dataframe
    columns = [c for c in df.columns if c.split('_')[-1] in ['vx', 'vy', 'ax', 'ay', 'rawspeed', 'rawacc', 'speed', 'acceleration']] # Get the previously derived velocity and acceleration columns
    df = df.drop(columns=columns)
    return df
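The acceleration step in the function above differences the provided speed over `dt` and then smooths the result with a moving average via `np.convolve(..., mode='same')`. The sketch below reproduces that step on a made-up speed trace to show two side effects: the NaN produced by `diff()` spreads to neighbouring frames, and the zero padding of `mode='same'` pulls the last values towards zero. The centred rolling mean is shown only as a possible alternative, not as the notebook's method; the toy values and the window of 3 are assumptions chosen to keep the example readable.

```python
import numpy as np
import pandas as pd

dt = 0.1      # seconds between frames, as in the function above
window = 3    # small window so the toy example is easy to follow

# Made-up speed trace in m/s with a constant acceleration of 1 m/s/s
speed = pd.Series([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])

# Raw acceleration: the first value is NaN because diff() has no previous frame
acc_raw = speed.diff() / dt

# Option 1: np.convolve, as used in the function; NaN spreads to neighbouring frames
ma_window = np.ones(window) / window
acc_convolve = np.convolve(acc_raw, ma_window, mode='same')

# Option 2 (an alternative, not the notebook's method): centred rolling mean that skips NaN
acc_rolling = acc_raw.rolling(window, center=True, min_periods=1).mean()

print(acc_convolve)
print(acc_rolling.to_numpy())
```

With the default `window=7`, the same effects would touch the first and last few frames of each drill, which may matter when looking for peak accelerations near the start or end of a drill.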
# Calculate the velocity and accelerations for each player in each of the six training sessions
df_training_match_msg_vel = calc_player_velocities_accelerations(df=df_training_match_msg_pvt, date='2022-02-02', training_drill='MATCH-MSG', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_crossing_and_finishing_hsr_spr_vel = calc_player_velocities_accelerations(df=df_training_crossing_and_finishing_hsr_spr_pvt, date='2022-02-02', training_drill='CROSSING-AND-FINISHING-HSR-SPR', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_attack_vs_defence_attack_superiority_vel = calc_player_velocities_accelerations(df=df_training_attack_vs_defence_attack_superiority_pvt, date='2022-02-02', training_drill='ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_full_session_modified_vel = calc_player_velocities_accelerations(df=df_training_full_session_modified_pvt, date='2022-02-02', training_drill='FULL-SESSION-MODIFIED', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_passing_drill_physical_vel = calc_player_velocities_accelerations(df=df_training_passing_drill_physical_pvt, date='2022-02-02', training_drill='PASSING-DRILL-PHYSICAL', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
df_training_warm_up_coordination_agility_vel = calc_player_velocities_accelerations(df=df_training_warm_up_coordination_agility_pvt, date='2022-02-02', training_drill='WARM-UP-COORDINATION-AGILITY', smoothing_v=True, smoothing_a=True, filter_='moving_average', window=7, polyorder=1, maxspeed=12, dt=0.1)
Calculation of each player's speed and accelerations for the MATCH-MSG started at: 2022-02-15 23:02:52.807892
Calculation of each player's speed and accelerations for the MATCH-MSG ended at: 2022-02-15 23:02:55.054667
Time taken to calculate speed and acceleration and save the training data is: 2.25 seconds.
Calculation of each player's speed and accelerations for the CROSSING-AND-FINISHING-HSR-SPR started at: 2022-02-15 23:02:55.056062
Calculation of each player's speed and accelerations for the CROSSING-AND-FINISHING-HSR-SPR ended at: 2022-02-15 23:02:56.524802
Time taken to calculate speed and acceleration and save the training data is: 1.47 seconds.
Calculation of each player's speed and accelerations for the ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY started at: 2022-02-15 23:02:56.528356
Calculation of each player's speed and accelerations for the ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY ended at: 2022-02-15 23:03:01.069064
Time taken to calculate speed and acceleration and save the training data is: 4.54 seconds.
Calculation of each player's speed and accelerations for the FULL-SESSION-MODIFIED started at: 2022-02-15 23:03:01.070138
Calculation of each player's speed and accelerations for the FULL-SESSION-MODIFIED ended at: 2022-02-15 23:03:13.994752
Time taken to calculate speed and acceleration and save the training data is: 12.92 seconds.
Calculation of each player's speed and accelerations for the PASSING-DRILL-PHYSICAL started at: 2022-02-15 23:03:14.012260
Calculation of each player's speed and accelerations for the PASSING-DRILL-PHYSICAL ended at: 2022-02-15 23:03:15.304846
Time taken to calculate speed and acceleration and save the training data is: 1.29 seconds.
Calculation of each player's speed and accelerations for the WARM-UP-COORDINATION-AGILITY started at: 2022-02-15 23:03:15.307837
Calculation of each player's speed and accelerations for the WARM-UP-COORDINATION-AGILITY ended at: 2022-02-15 23:03:16.238336
Time taken to calculate speed and acceleration and save the training data is: 0.93 seconds.
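The calls above pass `filter_='moving_average'`, `window=7`, `maxspeed=12` and `dt=0.1`. As a rough illustration of what such parameters typically control, the sketch below masks implausible speeds, smooths with a centred moving average, and differentiates to get acceleration. This is not the notebook's `calc_player_velocities_accelerations` function: the helper name `smooth_and_differentiate`, the toy data, and the assumed role of each parameter are all hypothetical.

```python
import numpy as np
import pandas as pd

def smooth_and_differentiate(speed, window=7, maxspeed=12.0, dt=0.1):
    """Hypothetical helper: mask implausible speeds, smooth, then differentiate."""
    speed = pd.Series(speed, dtype=float)
    # Treat speeds above `maxspeed` (m/s) as missing (assumed role of maxspeed)
    speed = speed.where(speed <= maxspeed)
    # Centred moving average over `window` frames
    smoothed = speed.rolling(window=window, center=True, min_periods=1).mean()
    # Acceleration as the change in smoothed speed per second (dt = frame duration)
    acceleration = smoothed.diff() / dt
    return smoothed, acceleration

# Toy example: speed rising by 0.5 m/s per frame at 10 Hz
speed = np.linspace(0.0, 5.0, 11)
smoothed, acceleration = smooth_and_differentiate(speed, window=3)
print(acceleration.round(2).tolist())
```

Away from the edges, the ramp gives a constant acceleration of 0.5 / 0.1 = 5 m/s/s. The first value is NaN because there is no previous frame to difference against.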
sorted(df_training_attack_vs_defence_attack_superiority_vel.columns)
['Baah Acceleration (m/s/s)', 'Baah Speed (km/h)', 'Baah Speed (m/s)', 'Baah_acceleration', 'Baah_speed', 'Baah_vx', 'Baah_vy', 'Baah_x', 'Baah_y', 'Cathcart Acceleration (m/s/s)', 'Cathcart Speed (km/h)', 'Cathcart Speed (m/s)', 'Cathcart_acceleration', 'Cathcart_speed', 'Cathcart_vx', 'Cathcart_vy', 'Cathcart_x', 'Cathcart_y', 'Date', 'Dennis Acceleration (m/s/s)', 'Dennis Speed (km/h)', 'Dennis Speed (m/s)', 'Dennis_acceleration', 'Dennis_speed', 'Dennis_vx', 'Dennis_vy', 'Dennis_x', 'Dennis_y', 'Ekong Acceleration (m/s/s)', 'Ekong Speed (km/h)', 'Ekong Speed (m/s)', 'Ekong_acceleration', 'Ekong_speed', 'Ekong_vx', 'Ekong_vy', 'Ekong_x', 'Ekong_y', 'Etebo Acceleration (m/s/s)', 'Etebo Speed (km/h)', 'Etebo Speed (m/s)', 'Etebo_acceleration', 'Etebo_speed', 'Etebo_vx', 'Etebo_vy', 'Etebo_x', 'Etebo_y', 'Femenia Acceleration (m/s/s)', 'Femenia Speed (km/h)', 'Femenia Speed (m/s)', 'Femenia_acceleration', 'Femenia_speed', 'Femenia_vx', 'Femenia_vy', 'Femenia_x', 'Femenia_y', 'Fletcher Acceleration (m/s/s)', 'Fletcher Speed (km/h)', 'Fletcher Speed (m/s)', 'Fletcher_acceleration', 'Fletcher_speed', 'Fletcher_vx', 'Fletcher_vy', 'Fletcher_x', 'Fletcher_y', 'Frame', 'Gosling Acceleration (m/s/s)', 'Gosling Speed (km/h)', 'Gosling Speed (m/s)', 'Gosling_acceleration', 'Gosling_speed', 'Gosling_vx', 'Gosling_vy', 'Gosling_x', 'Gosling_y', 'Joao Pedro Acceleration (m/s/s)', 'Joao Pedro Speed (km/h)', 'Joao Pedro Speed (m/s)', 'Joao Pedro_acceleration', 'Joao Pedro_speed', 'Joao Pedro_vx', 'Joao Pedro_vy', 'Joao Pedro_x', 'Joao Pedro_y', 'Kabasele Acceleration (m/s/s)', 'Kabasele Speed (km/h)', 'Kabasele Speed (m/s)', 'Kabasele_acceleration', 'Kabasele_speed', 'Kabasele_vx', 'Kabasele_vy', 'Kabasele_x', 'Kabasele_y', 'Kalu Acceleration (m/s/s)', 'Kalu Speed (km/h)', 'Kalu Speed (m/s)', 'Kalu_acceleration', 'Kalu_speed', 'Kalu_vx', 'Kalu_vy', 'Kalu_x', 'Kalu_y', 'Kamara Acceleration (m/s/s)', 'Kamara Speed (km/h)', 'Kamara Speed (m/s)', 'Kamara_acceleration', 
'Kamara_speed', 'Kamara_vx', 'Kamara_vy', 'Kamara_x', 'Kamara_y', 'Kayembe Acceleration (m/s/s)', 'Kayembe Speed (km/h)', 'Kayembe Speed (m/s)', 'Kayembe_acceleration', 'Kayembe_speed', 'Kayembe_vx', 'Kayembe_vy', 'Kayembe_x', 'Kayembe_y', 'King Acceleration (m/s/s)', 'King Speed (km/h)', 'King Speed (m/s)', 'King_acceleration', 'King_speed', 'King_vx', 'King_vy', 'King_x', 'King_y', 'Kucka Acceleration (m/s/s)', 'Kucka Speed (km/h)', 'Kucka Speed (m/s)', 'Kucka_acceleration', 'Kucka_speed', 'Kucka_vx', 'Kucka_vy', 'Kucka_x', 'Kucka_y', 'Louza Acceleration (m/s/s)', 'Louza Speed (km/h)', 'Louza Speed (m/s)', 'Louza_acceleration', 'Louza_speed', 'Louza_vx', 'Louza_vy', 'Louza_x', 'Louza_y', 'Masina Acceleration (m/s/s)', 'Masina Speed (km/h)', 'Masina Speed (m/s)', 'Masina_acceleration', 'Masina_speed', 'Masina_vx', 'Masina_vy', 'Masina_x', 'Masina_y', 'Ngakia Acceleration (m/s/s)', 'Ngakia Speed (km/h)', 'Ngakia Speed (m/s)', 'Ngakia_acceleration', 'Ngakia_speed', 'Ngakia_vx', 'Ngakia_vy', 'Ngakia_x', 'Ngakia_y', 'Period', 'Samir Acceleration (m/s/s)', 'Samir Speed (km/h)', 'Samir Speed (m/s)', 'Samir_acceleration', 'Samir_speed', 'Samir_vx', 'Samir_vy', 'Samir_x', 'Samir_y', 'Sema Acceleration (m/s/s)', 'Sema Speed (km/h)', 'Sema Speed (m/s)', 'Sema_acceleration', 'Sema_speed', 'Sema_vx', 'Sema_vy', 'Sema_x', 'Sema_y', 'Sierralta Acceleration (m/s/s)', 'Sierralta Speed (km/h)', 'Sierralta Speed (m/s)', 'Sierralta_acceleration', 'Sierralta_speed', 'Sierralta_vx', 'Sierralta_vy', 'Sierralta_x', 'Sierralta_y', 'Sissoko Acceleration (m/s/s)', 'Sissoko Speed (km/h)', 'Sissoko Speed (m/s)', 'Sissoko_acceleration', 'Sissoko_speed', 'Sissoko_vx', 'Sissoko_vy', 'Sissoko_x', 'Sissoko_y', 'Time [s]', 'Training Drill']
# Display DataFrame - ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY
df_training_attack_vs_defence_attack_superiority_vel.head()
Frame | Time [s] | Date | Training Drill | Ngakia_x | Ngakia_y | Ngakia Speed (m/s) | Ngakia Speed (km/h) | Cathcart_x | Cathcart_y | Cathcart Speed (m/s) | Cathcart Speed (km/h) | Etebo_x | Etebo_y | Etebo Speed (m/s) | Etebo Speed (km/h) | Dennis_x | Dennis_y | Dennis Speed (m/s) | Dennis Speed (km/h) | Kayembe_x | Kayembe_y | Kayembe Speed (m/s) | Kayembe Speed (km/h) | Baah_x | Baah_y | Baah Speed (m/s) | Baah Speed (km/h) | Sierralta_x | Sierralta_y | Sierralta Speed (m/s) | Sierralta Speed (km/h) | Ekong_x | Ekong_y | Ekong Speed (m/s) | Ekong Speed (km/h) | King_x | King_y | King Speed (m/s) | King Speed (km/h) | Kucka_x | Kucka_y | Kucka Speed (m/s) | Kucka Speed (km/h) | Kamara_x | Kamara_y | Kamara Speed (m/s) | Kamara Speed (km/h) | Fletcher_x | Fletcher_y | Fletcher Speed (m/s) | Fletcher Speed (km/h) | Louza_x | Louza_y | Louza Speed (m/s) | Louza Speed (km/h) | Gosling_x | Gosling_y | Gosling Speed (m/s) | Gosling Speed (km/h) | Femenia_x | Femenia_y | Femenia Speed (m/s) | Femenia Speed (km/h) | Samir_x | Samir_y | Samir Speed (m/s) | Samir Speed (km/h) | Joao Pedro_x | Joao Pedro_y | Joao Pedro Speed (m/s) | Joao Pedro Speed (km/h) | Sissoko_x | Sissoko_y | Sissoko Speed (m/s) | Sissoko Speed (km/h) | Kabasele_x | Kabasele_y | Kabasele Speed (m/s) | Kabasele Speed (km/h) | Masina_x | Masina_y | Masina Speed (m/s) | Masina Speed (km/h) | Kalu_x | Kalu_y | Kalu Speed (m/s) | Kalu Speed (km/h) | Sema_x | Sema_y | Sema Speed (m/s) | Sema Speed (km/h) | Period | Ngakia_vx | Ngakia_vy | Ngakia_speed | Ngakia Acceleration (m/s/s) | Cathcart_vx | Cathcart_vy | Cathcart_speed | Cathcart Acceleration (m/s/s) | Etebo_vx | Etebo_vy | Etebo_speed | Etebo Acceleration (m/s/s) | Dennis_vx | Dennis_vy | Dennis_speed | Dennis Acceleration (m/s/s) | Kayembe_vx | Kayembe_vy | Kayembe_speed | Kayembe Acceleration (m/s/s) | Baah_vx | Baah_vy | Baah_speed | Baah Acceleration (m/s/s) | Sierralta_vx | Sierralta_vy | Sierralta_speed | Sierralta Acceleration (m/s/s) | 
Ekong_vx | Ekong_vy | Ekong_speed | Ekong Acceleration (m/s/s) | King_vx | King_vy | King_speed | King Acceleration (m/s/s) | Kucka_vx | Kucka_vy | Kucka_speed | Kucka Acceleration (m/s/s) | Kamara_vx | Kamara_vy | Kamara_speed | Kamara Acceleration (m/s/s) | Fletcher_vx | Fletcher_vy | Fletcher_speed | Fletcher Acceleration (m/s/s) | Louza_vx | Louza_vy | Louza_speed | Louza Acceleration (m/s/s) | Gosling_vx | Gosling_vy | Gosling_speed | Gosling Acceleration (m/s/s) | Femenia_vx | Femenia_vy | Femenia_speed | Femenia Acceleration (m/s/s) | Samir_vx | Samir_vy | Samir_speed | Samir Acceleration (m/s/s) | Joao Pedro_vx | Joao Pedro_vy | Joao Pedro_speed | Joao Pedro Acceleration (m/s/s) | Sissoko_vx | Sissoko_vy | Sissoko_speed | Sissoko Acceleration (m/s/s) | Kabasele_vx | Kabasele_vy | Kabasele_speed | Kabasele Acceleration (m/s/s) | Masina_vx | Masina_vy | Masina_speed | Masina Acceleration (m/s/s) | Kalu_vx | Kalu_vy | Kalu_speed | Kalu Acceleration (m/s/s) | Sema_vx | Sema_vy | Sema_speed | Sema Acceleration (m/s/s) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 12:08:24.2 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711385 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710896 | 1.055556 | 3.800003 | -0.281581 | 51.711263 | 2.361113 | 8.500007 | -0.281273 | 51.710967 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.71135 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281370 | 51.711195 | 1.927779 | 6.940006 | -0.281566 | 51.711257 | 2.025002 | 7.290006 | -0.281005 | 51.711463 | 0.558334 | 2.010002 | -0.281284 | 51.711088 | 1.305557 | 4.700004 | -0.281582 | 51.711126 | 1.575001 | 5.670005 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281478 | 51.711305 | 1.105556 | 3.980003 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281566 | 51.711311 | 1.950002 | 7.020006 | -0.281447 | 51.711196 | 1.438890 | 5.180004 | -0.281382 | 51.711224 | 0.741667 | 2.670002 | -0.281316 | 51.711136 | 0.0 | 0.0 | -0.28141 | 51.711047 | 1.594446 | 5.740005 | -0.280945 | 51.711247 | 0.0 | 0.0 | 1 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 2.023811 | NaN | NaN | NaN | 21.269858 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 2.222224 | NaN | NaN | NaN | 19.285730 | NaN | NaN | NaN | -6.547624 | NaN | NaN | NaN | -2.539685 | NaN | NaN | NaN | 12.777788 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 1.825398 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | -18.333348 | NaN | NaN | NaN | 7.420641 | NaN | NaN | NaN | -9.484135 | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | -7.103180 | NaN | NaN | NaN | 0.0 |
1 | 1 | 12:08:24.3 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711385 | 0.0 | 0.0 | -0.281125 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710895 | 1.022223 | 3.680003 | -0.281580 | 51.711263 | 1.075001 | 3.870003 | -0.281273 | 51.710967 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.71135 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281372 | 51.711195 | 1.863890 | 6.710005 | -0.281564 | 51.711257 | 1.175001 | 4.230003 | -0.281006 | 51.711462 | 0.852778 | 3.070002 | -0.281285 | 51.711089 | 1.261112 | 4.540004 | -0.281581 | 51.711126 | 1.377779 | 4.960004 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281477 | 51.711306 | 0.961112 | 3.460003 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281564 | 51.711310 | 2.183335 | 7.860006 | -0.281448 | 51.711197 | 1.466668 | 5.280004 | -0.281381 | 51.711224 | 0.633334 | 2.280002 | -0.281316 | 51.711136 | 0.0 | 0.0 | -0.28141 | 51.711048 | 1.577779 | 5.680005 | -0.280945 | 51.711247 | 0.0 | 0.0 | 1 | 0.000000 | 0.000000 | 0.000000 | 0.0 | -0.000017 | 0.0 | 0.000017 | 0.0 | 0.000050 | -0.000083 | 0.000097 | 2.023811 | 0.000050 | 0.000033 | 0.000060 | 15.158742 | 0.000000 | -0.000033 | 0.000033 | 0.0 | -0.000017 | 0.000017 | 0.000024 | 0.0 | -0.000017 | 0.000000 | 0.000017 | 0.0 | 0.000017 | 0.000000 | 0.000017 | 0.0 | -0.000217 | 0.000067 | 0.000227 | -1.190477 | 0.000150 | 0.000000 | 0.000150 | 16.984141 | -0.000117 | -0.000033 | 0.000121 | -0.714286 | -0.000133 | 0.000083 | 0.000157 | -2.460319 | 0.000183 | -0.000017 | 0.000184 | 9.246039 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000050 | 0.000083 | 0.000097 | 1.349207 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000250 | -0.000133 | 0.000283 | -27.301609 | -0.000133 | 0.000100 | 0.000167 | 8.214292 | 0.000050 | 0.000033 | 0.000060 | -1.666668 | 0.000000 | 0.000000 | 0.000000 | -8.531753 | -0.000050 | 0.000150 | 0.000158 | -7.539689 | 0.000000 | 0.0 | 0.000000 | 0.0 |
2 | 2 | 12:08:24.4 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711385 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710894 | 0.950001 | 3.420003 | -0.281579 | 51.711263 | 0.733334 | 2.640002 | -0.281273 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.71135 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281374 | 51.711196 | 1.863890 | 6.710005 | -0.281564 | 51.711258 | 0.841667 | 3.030002 | -0.281008 | 51.711462 | 1.019445 | 3.670003 | -0.281287 | 51.711089 | 1.352779 | 4.870004 | -0.281579 | 51.711126 | 1.047223 | 3.770003 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281476 | 51.711307 | 1.566668 | 5.640005 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281560 | 51.711309 | 2.572224 | 9.260007 | -0.281449 | 51.711198 | 0.975001 | 3.510003 | -0.281379 | 51.711225 | 1.480557 | 5.330004 | -0.281316 | 51.711136 | 0.0 | 0.0 | -0.28141 | 51.711050 | 1.472223 | 5.300004 | -0.280945 | 51.711247 | 0.0 | 0.0 | 1 | -0.000017 | 0.000000 | 0.000017 | 0.0 | 0.000017 | 0.0 | 0.000017 | 0.0 | 0.000017 | -0.000083 | 0.000085 | 2.857145 | 0.000117 | -0.000067 | 0.000134 | 9.722230 | 0.000000 | -0.000033 | 0.000033 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000017 | 0.000017 | 0.0 | 0.000017 | -0.000017 | 0.000024 | 0.0 | -0.000217 | 0.000117 | 0.000246 | 6.904767 | 0.000083 | 0.000033 | 0.000090 | 18.412713 | -0.000167 | -0.000017 | 0.000167 | 1.865081 | -0.000167 | 0.000083 | 0.000186 | -3.412701 | 0.000117 | 0.000000 | 0.000117 | 5.039687 | 0.0 | -0.000017 | 0.000017 | 0.0 | 0.000117 | 0.000133 | 0.000177 | 0.714286 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000383 | -0.000050 | 0.000387 | -24.325416 | -0.000067 | 0.000067 | 0.000094 | 6.111116 | 0.000167 | 0.000117 | 0.000203 | -2.539685 | -0.000017 | 0.000017 | 0.000024 | -10.674612 | 0.000000 | 0.000150 | 0.000150 | -9.404769 | 0.000000 | 0.0 | 0.000000 | 0.0 |
3 | 3 | 12:08:24.5 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711385 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710893 | 0.975001 | 3.510003 | -0.281578 | 51.711262 | 0.847223 | 3.050002 | -0.281273 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.71135 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281376 | 51.711197 | 1.491668 | 5.370004 | -0.281562 | 51.711257 | 0.786112 | 2.830002 | -0.281010 | 51.711462 | 1.294445 | 4.660004 | -0.281288 | 51.711090 | 1.433334 | 5.160004 | -0.281578 | 51.711126 | 0.891667 | 3.210003 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281474 | 51.711308 | 1.252779 | 4.510004 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281555 | 51.711308 | 3.047225 | 10.970009 | -0.281449 | 51.711199 | 0.950001 | 3.420003 | -0.281378 | 51.711226 | 1.597224 | 5.750005 | -0.281316 | 51.711136 | 0.0 | 0.0 | -0.28141 | 51.711052 | 2.119446 | 7.630006 | -0.280945 | 51.711247 | 0.0 | 0.0 | 1 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | -0.000083 | 0.000083 | 2.777780 | 0.000117 | -0.000050 | 0.000127 | 5.000004 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | -0.000017 | 0.000000 | 0.000017 | 0.0 | 0.000017 | -0.000017 | 0.000024 | 0.0 | -0.000150 | 0.000100 | 0.000180 | 11.507946 | 0.000117 | -0.000017 | 0.000118 | 21.150811 | -0.000200 | 0.000000 | 0.000200 | 7.976197 | -0.000200 | 0.000067 | 0.000211 | -2.698415 | 0.000117 | 0.000017 | 0.000118 | 3.412701 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000167 | 0.000033 | 0.000170 | -0.555556 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000433 | -0.000100 | 0.000445 | -18.571443 | -0.000033 | 0.000067 | 0.000075 | 20.555572 | 0.000117 | 0.000117 | 0.000165 | 1.428573 | 0.000017 | -0.000017 | 0.000024 | -11.150803 | 0.000017 | 0.000217 | 0.000217 | -6.269846 | 0.000017 | 0.0 | 0.000017 | 0.0 |
4 | 4 | 12:08:24.6 | 2022-02-02 | ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY | -0.281091 | 51.711386 | 0.0 | 0.0 | -0.281124 | 51.711393 | 0.0 | 0.0 | -0.281362 | 51.710892 | 0.913890 | 3.290003 | -0.281577 | 51.711261 | 0.872223 | 3.140003 | -0.281274 | 51.710966 | 0.0 | 0.0 | -0.281057 | 51.711373 | 0.0 | 0.0 | -0.281083 | 51.71135 | 0.0 | 0.0 | -0.280962 | 51.711257 | 0.0 | 0.0 | -0.281377 | 51.711199 | 1.772224 | 6.380005 | -0.281562 | 51.711258 | 0.675001 | 2.430002 | -0.281011 | 51.711462 | 1.016667 | 3.660003 | -0.281290 | 51.711091 | 1.483335 | 5.340004 | -0.281577 | 51.711126 | 0.680556 | 2.450002 | -0.280963 | 51.711248 | 0.0 | 0.0 | -0.281473 | 51.711308 | 0.977779 | 3.520003 | -0.280984 | 51.711295 | 0.0 | 0.0 | -0.281551 | 51.711308 | 3.233336 | 11.640009 | -0.281449 | 51.711200 | 0.919445 | 3.310003 | -0.281377 | 51.711227 | 1.405557 | 5.060004 | -0.281315 | 51.711136 | 0.0 | 0.0 | -0.28141 | 51.711054 | 2.091668 | 7.530006 | -0.280945 | 51.711247 | 0.0 | 0.0 | 1 | 0.000017 | 0.000017 | 0.000024 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | -0.000017 | -0.000067 | 0.000069 | 4.007940 | 0.000067 | -0.000083 | 0.000107 | -14.325408 | -0.000017 | 0.000000 | 0.000017 | 0.0 | 0.000000 | 0.000017 | 0.000017 | 0.0 | 0.000000 | 0.000017 | 0.000017 | 0.0 | 0.000000 | 0.000017 | 0.000017 | 0.0 | -0.000133 | 0.000133 | 0.000189 | 7.261911 | 0.000083 | 0.000017 | 0.000085 | 11.626993 | -0.000133 | 0.000000 | 0.000133 | 12.182549 | -0.000167 | 0.000083 | 0.000186 | -4.087305 | 0.000083 | -0.000017 | 0.000085 | 0.238095 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000100 | 0.000050 | 0.000112 | -0.793651 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000450 | -0.000050 | 0.000453 | -14.285726 | 0.000000 | 0.000083 | 0.000083 | 11.111120 | 0.000100 | 0.000100 | 0.000141 | -7.817467 | 0.000017 | -0.000017 | 0.000024 | -10.158738 | 0.000017 | 0.000167 | 0.000167 | -11.626993 | 0.000000 | 0.0 | 0.000000 | 0.0 |
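The speed zones, using the thresholds applied in the code below, are defined as:
Low-Speed Activities (LSA): <14.4 km/h (<4 m/s)
Moderate-Speed Running (MSR): 14.4–19.8 km/h (4–5.5 m/s)
High-Speed Running (HSR): 19.8–25.1 km/h (5.5–6.972 m/s)
Sprinting: ≥25.1 km/h (≥6.972 m/s)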
For further information, see: Application of Individualized Speed Zones to Quantify External Training Load in Professional Soccer by Vincenzo Rago, João Brito, Pedro Figueiredo, Peter Krustrup, and António Rebelo.
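The zone thresholds used in this notebook (4, 5.5 and 6.972222 m/s) can also be applied in a single step with `pd.cut`. The sketch below is an alternative formulation rather than the notebook's own implementation: the helper `distance_per_zone`, the toy speeds, and the default frame duration `dt=0.1` seconds are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Zone boundaries in m/s, matching the thresholds used in this notebook
ZONE_EDGES = [0.0, 4.0, 5.5, 6.972222, np.inf]
ZONE_LABELS = ['LSA', 'MSR', 'HSR', 'Sprinting']

def distance_per_zone(speed_ms, dt=0.1):
    """Hypothetical helper: distance in metres covered in each speed zone.

    `dt` is the assumed time between frames in seconds.
    """
    speed_ms = pd.Series(speed_ms, dtype=float)
    # right=False makes each bin closed on the left: [lower, upper)
    zones = pd.cut(speed_ms, bins=ZONE_EDGES, labels=ZONE_LABELS, right=False)
    # Distance per frame = speed * dt, summed within each zone
    return (speed_ms * dt).groupby(zones, observed=False).sum()

# Toy speeds in m/s, one value per frame
speeds = [1.0, 3.9, 4.0, 5.4, 5.5, 7.0, 7.5]
zone_distances = distance_per_zone(speeds)
print(zone_distances)
```

With `right=False`, a speed of exactly 4.0 m/s falls into MSR and exactly 5.5 m/s into HSR, which mirrors the `>=` and `<` comparisons used when filtering each zone by hand.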
# Define a function to generate a bespoke physical summary of all the players for an individual training session
def create_physical_report_per_training_session(df, date='2022-02-02', training_drill='NOT-DEFINED'):
    """
    Generate a bespoke physical summary of all the players for an individual training session.
    """
    ## Read in the exported CSV file if it exists; if not, create the physical report
    if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}-{training_drill}-PHYSICAL-REPORT-ALL-PLAYERS.csv')):
        ### Start timer
        tic = datetime.datetime.now()
        ### Print the time the creation of the physical report started
        print(f'Creation of the physical report for the {training_drill} training drill started at: {tic}')
        ## Data Engineering
        ### Derive the list of players from the x-position columns (e.g. 'Ngakia_x' -> 'Ngakia')
        lst_players = [col[:-len('_x')] for col in df.columns if col.endswith('_x')]
        ### Create DataFrame where each row is a player
        df_summary = pd.DataFrame(lst_players, columns=['Player'])
        ### Add the date and training drill
        df_summary['Date'] = date
        df_summary['Training Drill'] = training_drill
        ## Calculate minutes trained for each player
        ### Create empty list for minutes
        lst_minutes = []
        ### Cycle through each player and look for their first and last valid frames
        for player in lst_players:
            #### Search for the first and last frames with a position observation (positions are NaN when a player is not on the pitch)
            column = f'{player}_x'    # use player x-position coordinate
            try:
                player_minutes = (df[column].last_valid_index() - df[column].first_valid_index() + 1) / 10 / 60    # frames at 10 Hz, converted to minutes
            except TypeError:    # both indices are None when the player has no valid positions
                player_minutes = 0
            lst_minutes.append(player_minutes)
        ### Create column for the minutes trained
        df_summary['Minutes Trained'] = lst_minutes
        ## Calculate total distance covered for each player
        ### Create empty list for distance
        lst_distance = []
        ### Cycle through each player, multiply their speed at each frame by the frame duration to get the total distance, and divide by 1,000 to convert to km
        ### NOTE: the divisor of 100 assumes 10 ms frames, whereas 'Minutes Trained' above (and dt=0.1) assume 10 Hz frames, which would imply a divisor of 10
        for player in lst_players:
            column = f'{player} Speed (m/s)'
            df_player_distance = df[column].sum()/100./1000    # speed * time, converted to km (original logic)
            #df_player_distance = (df[column].sum() * 0.01) / 1000    # Distance = Speed * Time
            lst_distance.append(df_player_distance)
        ### Create column for the distance in km
        df_summary['Distance [km]'] = lst_distance
        ## Calculate total distance covered for each player for different types of movement
        ### Create empty lists for distances of different movements
        lst_lsa = []
        lst_msr = []
        lst_hsr = []
        lst_sprinting = []
        ### Cycle through each player and sum the distance covered in each speed zone
        for player in lst_players:
            column = f'{player} Speed (m/s)'
            ### Low-Speed Activities (LSA) (<14.4 km/h or <4 m/s)
            player_distance = df.loc[df[column] < 4, column].sum()/100./1000
            #player_distance = df.loc[df[column] < 14.4, column].sum()/100./1000
            lst_lsa.append(player_distance)
            ### Moderate-Speed Running (MSR) (14.4–19.8 km/h or 4–5.5 m/s)
            player_distance = df.loc[(df[column] >= 4) & (df[column] < 5.5), column].sum()/100./1000
            #player_distance = df.loc[(df[column] >= 14.4) & (df[column] < 19.8), column].sum()/100./1000
            lst_msr.append(player_distance)
            ### High-Speed Running (HSR) (19.8–25.1 km/h or 5.5–6.972 m/s)
            player_distance = df.loc[(df[column] >= 5.5) & (df[column] < 6.972222), column].sum()/100./1000
            #player_distance = df.loc[(df[column] >= 19.8) & (df[column] < 25.1), column].sum()/100./1000
            lst_hsr.append(player_distance)
            ### Sprinting (≥25.1 km/h or ≥6.972 m/s)
            player_distance = df.loc[df[column] >= 6.972222, column].sum()/100./1000
            #player_distance = df.loc[df[column] >= 25.1, column].sum()/100./1000
            lst_sprinting.append(player_distance)
        ### Assign each movement list to a column in the Summary DataFrame
        df_summary['Low-Speed Activities (LSA) [km]'] = lst_lsa
        df_summary['Moderate-Speed Running (MSR) [km]'] = lst_msr
        df_summary['High-Speed Running (HSR) [km]'] = lst_hsr
        df_summary['Sprinting [km]'] = lst_sprinting
        ## Determine the number of sustained sprints per training session
        ### Create an empty list for the number of sprints
        nsprints = []
        ### Define the sprint threshold and the minimum duration of a sustained sprint
        #sprint_threshold = 25.1    # minimum speed to be defined as a sprint (km/h)
        sprint_threshold = 6.972222    # minimum speed to be defined as a sprint (m/s)
        sprint_window = 1 * 10    # 1 second at 10 frames per second
        ### Cycle through each player and count their sustained sprints
        for player in lst_players:
            column = f'{player} Speed (m/s)'
            # the trick here is to convolve speed with a window of size 'sprint_window', and find the number of occasions that a sprint was sustained for at least one window length
            # diff helps us to identify when the window starts
            player_sprints = np.diff(1 * (np.convolve(1 * (df[column] >= sprint_threshold), np.ones(sprint_window), mode='same') >= sprint_window))
            nsprints.append(np.sum(player_sprints == 1))
        ### Add column for the number of sprints
        df_summary['No. Sprints'] = nsprints
        ## Estimate the top speed of each player
        ### Create empty dictionary to append maximum speeds
        dict_top_speeds = {}
        ### List the speed columns (m/s) of the training DataFrame
        player_speed_columns = [i for i in df.columns if ' Speed (m/s)' in i]
        ### Iterate through the speed columns to determine the maximum speed for each player
        for player in player_speed_columns:
            dict_top_speeds[player] = df[player].max()
        ### Convert the dictionary to a DataFrame with one row per player
        df_top_speeds = pd.DataFrame.from_dict(dict_top_speeds, orient='index', columns=['Top Speed [m/s]'])
        df_top_speeds = df_top_speeds.reset_index(drop=False)
        df_top_speeds = df_top_speeds.rename(columns={'index': 'Player'})
        ### Strip the column suffix to leave the player name (literal match, not a regular expression)
        df_top_speeds['Player'] = df_top_speeds['Player'].str.replace(' Speed (m/s)', '', regex=False)
        ### Merge Top Speeds DataFrame to Summary DataFrame
        df_summary = pd.merge(df_summary, df_top_speeds, on='Player', how='left')
        ## Estimate the top acceleration of each player
        ### Create empty dictionary to append maximum accelerations
        dict_top_accelerations = {}
        ### List the acceleration columns of the training DataFrame
        player_acceleration_columns = [i for i in df.columns if ' Acceleration (m/s/s)' in i]
        ### Iterate through the acceleration columns to determine the maximum acceleration for each player
        for player in player_acceleration_columns:
            dict_top_accelerations[player] = df[player].max()
        ### Convert the dictionary to a DataFrame with one row per player
        df_top_accelerations = pd.DataFrame.from_dict(dict_top_accelerations, orient='index', columns=['Top Acceleration [m/s/s]'])
        df_top_accelerations = df_top_accelerations.reset_index(drop=False)
        df_top_accelerations = df_top_accelerations.rename(columns={'index': 'Player'})
        ### Strip the column suffix to leave the player name (literal match, not a regular expression)
        df_top_accelerations['Player'] = df_top_accelerations['Player'].str.replace(' Acceleration (m/s/s)', '', regex=False)
        ### Merge Top Accelerations DataFrame to Summary DataFrame
        df_summary = pd.merge(df_summary, df_top_accelerations, on='Player', how='left')
        ## Sort by minutes trained descending and reset the index
        ### Sorting is done last because the lists above are assigned by row position, in the order of lst_players
        df_summary = df_summary.sort_values(['Minutes Trained'], ascending=False).reset_index(drop=True)
        ### Save DataFrame
        #### Define filename for each combined file to be saved
        save_filename = f'{date}-{training_drill}-PHYSICAL-REPORT-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
        #### Define the filepath to save each combined file
        path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports')
        #### Save the combined file as a CSV
        df_summary.to_csv(os.path.join(path, f'{save_filename}.csv'), index=False, header=True)
        ### End timer
        toc = datetime.datetime.now()
        ### Print the time the creation of the physical report ended
        print(f'Creation of the physical report for the {training_drill} training drill ended at: {toc}')
        ### Calculate time taken
        total_time = (toc-tic).total_seconds()
        print(f'Time taken to create the physical report for the {training_drill} training data is: {total_time:0.2f} seconds.')
    ## If CSV file already exists, read in previously saved DataFrame
    else:
        ### Print that the saved report is being read in
        print('Physical report already saved to local storage. Reading in file as a pandas DataFrame.')
        ### Read in the saved DataFrame
        df_summary = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}-{training_drill}-PHYSICAL-REPORT-ALL-PLAYERS.csv'))
    ## Return DataFrame
    return df_summary
# Create physical reports for each player in each of the six training sessions
df_training_match_msg_physical_report = create_physical_report_per_training_session(df_training_match_msg_vel, date='2022-02-02', training_drill='MATCH-MSG')
df_training_crossing_and_finishing_hsr_spr_physical_report = create_physical_report_per_training_session(df_training_crossing_and_finishing_hsr_spr_vel, date='2022-02-02', training_drill='CROSSING-AND-FINISHING-HSR-SPR')
df_training_attack_vs_defence_attack_superiority_physical_report = create_physical_report_per_training_session(df_training_attack_vs_defence_attack_superiority_vel, date='2022-02-02', training_drill='ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY')
df_training_full_session_modified_physical_report = create_physical_report_per_training_session(df_training_full_session_modified_vel, date='2022-02-02', training_drill='FULL-SESSION-MODIFIED')
df_training_passing_drill_physical_physical_report = create_physical_report_per_training_session(df_training_passing_drill_physical_vel, date='2022-02-02', training_drill='PASSING-DRILL-PHYSICAL')
df_training_warm_up_coordination_agility_physical_report = create_physical_report_per_training_session(df_training_warm_up_coordination_agility_vel, date='2022-02-02', training_drill='WARM-UP-COORDINATION-AGILITY')
Creation of the physical report for the MATCH-MSG training drill started at: 2022-02-15 23:03:43.902718
Creation of the physical report for the MATCH-MSG training drill ended at: 2022-02-15 23:03:44.007419
Time taken to create the physical report for the MATCH-MSG training data is: 0.10 seconds.
Creation of the physical report for the CROSSING-AND-FINISHING-HSR-SPR training drill started at: 2022-02-15 23:03:44.008334
Creation of the physical report for the CROSSING-AND-FINISHING-HSR-SPR training drill ended at: 2022-02-15 23:03:44.080399
Time taken to create the physical report for the CROSSING-AND-FINISHING-HSR-SPR training data is: 0.07 seconds.
Creation of the physical report for the ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY training drill started at: 2022-02-15 23:03:44.080718
Creation of the physical report for the ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY training drill ended at: 2022-02-15 23:03:44.179410
Time taken to create the physical report for the ATTACK-VS-DEFENCE-ATTACK-SUPERIORITY training data is: 0.10 seconds.
Creation of the physical report for the FULL-SESSION-MODIFIED training drill started at: 2022-02-15 23:03:44.179681
Creation of the physical report for the FULL-SESSION-MODIFIED training drill ended at: 2022-02-15 23:03:44.307543
Time taken to create the physical report for the FULL-SESSION-MODIFIED training data is: 0.13 seconds.
Creation of the physical report for the PASSING-DRILL-PHYSICAL training drill started at: 2022-02-15 23:03:44.307838
Creation of the physical report for the PASSING-DRILL-PHYSICAL training drill ended at: 2022-02-15 23:03:44.385170
Time taken to create the physical report for the PASSING-DRILL-PHYSICAL training data is: 0.08 seconds.
Creation of the physical report for the WARM-UP-COORDINATION-AGILITY training drill started at: 2022-02-15 23:03:44.385754
Creation of the physical report for the WARM-UP-COORDINATION-AGILITY training drill ended at: 2022-02-15 23:03:44.460877
Time taken to create the physical report for the WARM-UP-COORDINATION-AGILITY training data is: 0.08 seconds.
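The sprint count in `create_physical_report_per_training_session` relies on a convolution trick that is easier to follow on a toy signal. The sketch below isolates that step: the helper name `count_sustained_sprints` and the toy speed trace are hypothetical, while the threshold (6.972222 m/s) and window (10 frames) are the values used in the function above.

```python
import numpy as np

def count_sustained_sprints(speed, threshold=6.972222, window=10):
    """Hypothetical helper: count runs of at least `window` consecutive
    frames at or above `threshold` (m/s)."""
    # 1 where the player is at sprint speed, 0 otherwise
    above = (np.asarray(speed, dtype=float) >= threshold).astype(int)
    # Number of sprint-speed frames inside each centred window
    frames_in_window = np.convolve(above, np.ones(window), mode='same')
    # 1 only where every frame in the window is at sprint speed
    sustained = (frames_in_window >= window).astype(int)
    # A 0 -> 1 step marks the start of one sustained sprint
    return int(np.sum(np.diff(sustained) == 1))

# Toy trace: a 12-frame sprint (counted) and a 4-frame burst (too short)
speed = [0.0] * 5 + [8.0] * 12 + [0.0] * 5 + [8.0] * 4 + [0.0] * 5
n_sprints = count_sustained_sprints(speed)
print(n_sprints)
```

Only the 12-frame run fills a complete 10-frame window, so the short burst is ignored. This is why the warm-up drill can show a non-zero sprinting distance for a player and still report zero sprints.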
# Define a function to combine the physical reports of all the training sessions on a given day
def create_physical_report_per_day(date):
    """
    Combine the physical reports of all the individual training sessions on a given date into a single report.
    """
    ## Read in the exported CSV file if it exists; if not, create the combined report
    if not os.path.exists(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}-ALL-TRAINING-SESSIONS-PHYSICAL-REPORT-ALL-PLAYERS.csv')):
        ### Start timer
        tic = datetime.datetime.now()
        ### Print the time the creation of the combined report started
        print(f'Creation a single training report for {date} started at: {tic}')
        ### List all the individual physical report files available for the date
        lst_all_files = glob.glob(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}*-PHYSICAL-REPORT-ALL-PLAYERS.csv'))
        ### Create an empty list to append individual DataFrames
        lst_files_to_append = []
        ### Iterate through each file in list of all files
        for file in lst_all_files:
            ### Create temporary DataFrame with each file
            df_temp = pd.read_csv(file, index_col=None, header=0)
            ### Append each individual DataFrame to the list (to be concatenated)
            lst_files_to_append.append(df_temp)
        ### Concatenate all the files
        df_day_training_report = pd.concat(lst_files_to_append, axis=0, ignore_index=True)
        ### Engineer the data
        #### Set the date column before saving, so the saved file and the returned DataFrame match
        df_day_training_report['Date'] = date
        ### Save DataFrame
        #### Define filename for the combined file to be saved
        save_filename = f'{date}-ALL-TRAINING-SESSIONS-PHYSICAL-REPORT-ALL-PLAYERS'.replace(' ', '-').replace('(', '').replace(')', '').replace(':', '').replace('.', '').replace('__', '_').upper()
        #### Define the filepath to save the combined file
        path = os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports')
        #### Save the combined file as a CSV
        df_day_training_report.to_csv(os.path.join(path, f'{save_filename}.csv'), index=False, header=True)
        ### End timer
        toc = datetime.datetime.now()
        ### Print the time the creation of the combined report ended
        print(f'Creation a single training report for {date} ended at: {toc}')
        ### Calculate time taken
        total_time = (toc-tic).total_seconds()
        print(f'Time taken create a single training report for {date} is: {total_time:0.2f} seconds.')
    ## If CSV file already exists, read in previously saved DataFrame
    else:
        ### Print that the saved report is being read in
        print('CSV file already saved to local storage. Reading in file as a pandas DataFrame.')
        ### Read in the saved DataFrame
        df_day_training_report = pd.read_csv(os.path.join(data_dir_physical, 'engineered', 'Set 2', '5_physical_reports', f'{date}-ALL-TRAINING-SESSIONS-PHYSICAL-REPORT-ALL-PLAYERS.csv'))
    ## Return DataFrame
    return df_day_training_report
df_training_report_02022022 = create_physical_report_per_day(date='2022-02-02')
Creation a single training report for 2022-02-02 started at: 2022-02-15 23:03:50.916918
Creation a single training report for 2022-02-02 ended at: 2022-02-15 23:03:50.939939
Time taken create a single training report for 2022-02-02 is: 0.02 seconds.
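The combine step used above (find every per-drill report for a date, read each one, and concatenate them) can be sketched in isolation as follows. This is a minimal illustration rather than the notebook's own function: the helper name `combine_reports`, the temporary directory, and the two made-up drill files are assumptions for the example. One design difference is flagged in the comments: `pd.concat` raises a `ValueError` when given an empty list, so the sketch checks for missing files first and raises a clearer error.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_reports(report_dir, date):
    """Concatenate every per-drill report for `date` into one DataFrame.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    pattern = f'{date}*-PHYSICAL-REPORT-ALL-PLAYERS.csv'
    # Sort so the row order of the combined report is reproducible
    files = sorted(Path(report_dir).glob(pattern))
    # pd.concat fails on an empty list, so fail early with a clearer message
    if not files:
        raise FileNotFoundError(f'No reports matching {pattern} in {report_dir}')
    frames = [pd.read_csv(f, header=0) for f in files]
    return pd.concat(frames, axis=0, ignore_index=True)


# Made-up data: two drill files for a single date, written to a temp directory
with tempfile.TemporaryDirectory() as tmp:
    for drill, player in [('DRILL-A', 'Player 1'), ('DRILL-B', 'Player 2')]:
        df_drill = pd.DataFrame({
            'Player': [player],
            'Training Drill': [drill],
            'Distance [km]': [0.5],
        })
        filename = f'2022-02-02-{drill}-PHYSICAL-REPORT-ALL-PLAYERS.csv'
        df_drill.to_csv(Path(tmp) / filename, index=False)

    df_combined = combine_reports(tmp, '2022-02-02')

print(df_combined)
```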
df_training_report_02022022.head(10)
| | Player | Date | Training Drill | Minutes Trained | Distance [km] | Low-Speed Activities (LSA) [km] | Moderate-Speed Running (MSR) [km] | High-Speed Running (HSR) [km] | Sprinting [km] | No. Sprints | Top Speed [m/s] | Top Acceleration [m/s/s] |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Ngakia | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.047011 | 0.041150 | 0.003905 | 0.001956 | 0.0 | 0 | 6.797228 | 6.011910 |
1 | Masina | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.043365 | 0.038872 | 0.002948 | 0.001545 | 0.0 | 0 | 6.480561 | 5.611116 |
2 | Etebo | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.044917 | 0.040510 | 0.002968 | 0.001439 | 0.0 | 0 | 6.141672 | 5.416671 |
3 | Samir | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.044136 | 0.040443 | 0.002799 | 0.000894 | 0.0 | 0 | 5.597227 | 6.210322 |
4 | Kayembe | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.042606 | 0.039897 | 0.002596 | 0.000114 | 0.0 | 0 | 6.291672 | 4.992067 |
5 | Kalu | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.043508 | 0.039836 | 0.002241 | 0.001432 | 0.0 | 0 | 5.888894 | 4.976194 |
6 | Joao Pedro | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.046932 | 0.042358 | 0.003036 | 0.001538 | 0.0 | 0 | 6.913894 | 5.519846 |
7 | Kabasele | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.046264 | 0.042153 | 0.002750 | 0.001361 | 0.0 | 0 | 6.444450 | 5.480163 |
8 | Sierralta | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.044885 | 0.040096 | 0.003365 | 0.001424 | 0.0 | 0 | 5.866671 | 6.813498 |
9 | King | 2022-02-02 | WARM-UP-COORDINATION-AGILITY | 5.306667 | 0.046201 | 0.042388 | 0.002612 | 0.001202 | 0.0 | 0 | 6.011116 | 4.309527 |
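In the rows shown, the four speed-zone distances appear to add up to the total distance. A quick sanity check along those lines could look like the sketch below. It assumes (this is not stated in the notebook) that the LSA, MSR, HSR and Sprinting columns partition `Distance [km]`; the values are copied from the first three rows of the table, and the tolerance allows for the six-decimal rounding of the display.

```python
import numpy as np
import pandas as pd

# First three rows of the report above, distance columns only
df_check = pd.DataFrame({
    'Player': ['Ngakia', 'Masina', 'Etebo'],
    'Distance [km]': [0.047011, 0.043365, 0.044917],
    'Low-Speed Activities (LSA) [km]': [0.041150, 0.038872, 0.040510],
    'Moderate-Speed Running (MSR) [km]': [0.003905, 0.002948, 0.002968],
    'High-Speed Running (HSR) [km]': [0.001956, 0.001545, 0.001439],
    'Sprinting [km]': [0.0, 0.0, 0.0],
})

zone_cols = [
    'Low-Speed Activities (LSA) [km]',
    'Moderate-Speed Running (MSR) [km]',
    'High-Speed Running (HSR) [km]',
    'Sprinting [km]',
]

# Sum the speed-zone distances row by row
df_check['Zone Total [km]'] = df_check[zone_cols].sum(axis=1)

# Compare with the reported total, allowing for display rounding
zones_match = np.isclose(df_check['Zone Total [km]'], df_check['Distance [km]'], atol=1e-5)

print(df_check[['Player', 'Distance [km]', 'Zone Total [km]']])
print('All rows consistent:', bool(zones_match.all()))
```

Rows that fail a check like this would be worth inspecting before the data is visualised.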
df_training_report_02022022.shape
(133, 12)
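Because the combined report holds one row per player per drill, a per-player daily summary needs a further aggregation. The sketch below is one possible way to do that with `groupby`, using made-up values and placeholder player and drill names; the choice to sum minutes, distance and sprint counts while taking the maximum of top speed is an assumption for illustration, not the notebook's method.

```python
import pandas as pd

# Made-up report in the same shape as the combined daily report
df_example = pd.DataFrame({
    'Player': ['Player A', 'Player B', 'Player A', 'Player B'],
    'Training Drill': ['DRILL-1', 'DRILL-1', 'DRILL-2', 'DRILL-2'],
    'Minutes Trained': [5.0, 5.0, 10.0, 8.0],
    'Distance [km]': [0.05, 0.04, 1.20, 0.90],
    'No. Sprints': [0, 0, 3, 1],
    'Top Speed [m/s]': [6.8, 6.5, 8.9, 9.1],
})

# Volume metrics are summed across drills; top speed is the best value of the day
df_player_totals = (
    df_example
    .groupby('Player', as_index=False)
    .agg({
        'Minutes Trained': 'sum',
        'Distance [km]': 'sum',
        'No. Sprints': 'sum',
        'Top Speed [m/s]': 'max',
    })
)

print(df_player_totals)
```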
This notebook engineers physical data using pandas to create a series of training reports for players. The metrics determined include distance covered, total sprints and top speeds, amongst other breakdowns.
The next stage is to visualise this data in Tableau and analyse the findings, to be presented in a deck.
LaurieOnTracking
*Visit my website eddwebster.com or my GitHub Repository for more projects. If you'd like to get in contact, my Twitter handle is @eddwebster and my email is: edd.j.webster@gmail.com.*