Notebook first written: 03/08/2021
Notebook last updated: 03/11/2021
Click here to jump straight to the Exploratory Data Analysis section and skip the Task Brief, Data Sources, and Data Engineering sections. Or click here to jump straight to the Conclusion.
This notebook joins three sources through the record-linkage library: player performance data scraped from FBref (provided by StatsBomb), estimated player values and recorded transfers from TransferMarkt, and player salaries from Capology. The result is one unified source of information that can be used for further analysis of players' performance statistics and financial valuations.
For more information about this notebook and the author, I'm available through all the following channels:
The accompanying GitHub repository for this notebook can be found here and a static version of this notebook can be found here.
This notebook was written using Python 3 and requires the following libraries:
Jupyter notebooks, for the notebook environment in which this project is presented;
NumPy, for multidimensional array computing;
pandas, for data analysis and manipulation; and
record-linkage, for joining of fuzzy datasets.
All packages used for this notebook except for BeautifulSoup can be obtained by downloading and installing the Conda distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows here and Mac here, as well as in the Anaconda documentation itself here.
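Before running the import cell, it can help to check which of the required packages are importable in the current environment. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the import names listed (`numpy`, `pandas`, `recordlinkage`, `bs4`) are the ones used in this notebook's import cell.

```python
from importlib.util import find_spec

def missing_packages(import_names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]

# Import names as they appear in this notebook's import cell
required = ['numpy', 'pandas', 'recordlinkage', 'bs4']
print('Missing packages:', missing_packages(required) or 'none')
```

Any name reported as missing would need to be installed before the rest of the notebook can run.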
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv
# Import Dependencies
%matplotlib inline
# Math Operations
import numpy as np
from math import pi
# Datetime
import datetime
from datetime import date
import time
# Data Preprocessing
import pandas as pd
#import pandas_profiling as pp
import os
import re
import chardet
import random
from io import BytesIO
from pathlib import Path
# Reading Directories
import glob
# Working with JSON
import json
from pandas import json_normalize
# Web Scraping
import requests
from bs4 import BeautifulSoup
# Fuzzy Matching - Record Linkage
import recordlinkage
import jellyfish
import numexpr as ne
# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-whitegrid')
import missingno as msno
# Progress Bar
from tqdm import tqdm
# Display in Jupyter
from IPython.display import Image, YouTubeVideo
from IPython.core.display import HTML
# Ignore Warnings
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")
print("Setup Complete")
Setup Complete
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))
Python: 3.7.6
NumPy: 1.20.3
pandas: 1.3.2
matplotlib: 3.4.2
# Define today's date
today = datetime.datetime.now().strftime('%d%m%Y')
# Define seasons
dict_seasons = {'2016-2017': '2016/2017',
                '2017-2018': '2017/2018',
                '2018-2019': '2018/2019',
                '2019-2020': '2019/2020',
                '2020-2021': '2020/2021',
                '2021-2022': '2021/2022'
               }
# Set up initial paths to subfolders
base_dir = os.path.join('..', '..')
data_dir = os.path.join(base_dir, 'data')
data_dir_fbref = os.path.join(base_dir, 'data', 'fbref')
data_dir_tm = os.path.join(base_dir, 'data', 'tm')
data_dir_capology = os.path.join(base_dir, 'data', 'capology')
data_dir_guardian = os.path.join(base_dir, 'data', 'guardian')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')
video_dir = os.path.join(base_dir, 'video')
pd.set_option('display.max_columns', None)
This Jupyter notebook is part of a series of notebooks that scrape, parse, engineer, and unify datasets for use in modeling.
This particular notebook is the data unification notebook: it joins data from FBref (provided by StatsBomb), TransferMarkt, and Capology using RecordLinkage.
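To illustrate the idea behind record linkage, the sketch below pairs player names from two sources on a shared blocking key (the first initial) and keeps the pairs whose string similarity passes a threshold. It uses `difflib` from the standard library as a stand-in and is not the RecordLinkage API. The function name `link_records`, the sample names, and the 0.85 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def link_records(left, right, threshold=0.85):
    """Pair names that share a first initial and have similarity >= threshold."""
    matches = []
    for a in left:
        for b in right:
            # Blocking: only compare candidates with the same first initial
            if a[:1] != b[:1]:
                continue
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches

# Hypothetical lowercase names from two sources (note the misspelt surname)
fbref_names = ['aaron cresswell', 'aaron mooy', 'james milner']
tm_names = ['aaron creswell', 'james milner', 'zlatan ibrahimovic']

for pair in link_records(fbref_names, tm_names):
    print(pair)
```

Blocking keeps the number of comparisons manageable on large tables, while the threshold controls how much spelling variation is tolerated before two records are treated as the same player.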
This notebook, along with the other notebooks in this project workflow, is shown in the following diagram:
Links to these notebooks in the football_analytics GitHub repository can be found at the following:
Notebook Conventions:
Variables that refer to a DataFrame object are prefixed with df_.
Variables that refer to a collection of DataFrame objects (e.g., a list, a set or a dict) are prefixed with dfs_.
The following cells read in the CSV files as pandas DataFrames.
# Import data as pandas DataFrames
## FBref-TransferMarkt Player Mapping (including page URLs and positions) by Jason Ziv and Rahul Iyer
### Define file location
file = data_dir + '/reference/player_mapping/fbref_to_tm_mapping_latest.csv'
### Detect the likely file encoding from the first 100,000 bytes
with open(file, 'rb') as rawdata:
    result = chardet.detect(rawdata.read(100000))
### Read in dataset (the encoding is set explicitly; `result` is kept for reference only)
df_fbref_tm_urls = pd.read_csv(file, encoding='ISO-8859-1')
## FBref Player Performance data
df_fbref_players = pd.read_csv(data_dir_fbref + '/engineered/outfield-goalkeeper-combined/fbref_outfield_player_goalkeeper_stats_combined_latest.csv')
## TransferMarkt Player Bio-Status data
df_tm_bio_status = pd.read_csv(data_dir_tm + '/engineered/bio-status/tm_player_bio_status_all_1617-2122_latest.csv')
## TransferMarkt Player Historical Market Values data
df_tm_valuations = pd.read_csv(data_dir_tm + '/engineered/historical_market_values/tm_player_valuations_all_1617-2122_latest.csv')
## TransferMarkt Player Recorded Transfer History data
df_tm_transfers = pd.read_csv(data_dir_tm + '/engineered/transfer_history/tm_player_transfer_history_latest.csv')
## Capology Player Salary data
df_capology = pd.read_csv(data_dir_capology + '/engineered/capology_big5_mls_latest.csv')
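In the cell above, the mapping file is read with a hard-coded `ISO-8859-1` encoding, while the `chardet` result is computed but not used. One hedged alternative is to try UTF-8 first and fall back to ISO-8859-1 only when decoding fails. The sketch below uses only the standard library, and the helper name `pick_encoding` is hypothetical.

```python
def pick_encoding(raw_bytes, candidates=('utf-8', 'ISO-8859-1')):
    """Return the first candidate encoding that decodes raw_bytes without error."""
    for encoding in candidates:
        try:
            raw_bytes.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('None of the candidate encodings could decode the data')

# The same accented name encoded in two different ways
print(pick_encoding('Aarón'.encode('utf-8')))
print(pick_encoding('Aarón'.encode('ISO-8859-1')))
```

The returned name could then be passed to the `encoding` argument of `pd.read_csv`. ISO-8859-1 accepts any byte sequence, so it only makes sense as the last candidate.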
df_fbref_tm_urls.head()
| | PlayerFBref | UrlFBref | UrlTmarkt | TmPos |
---|---|---|---|---|
0 | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward |
1 | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... | Left-Back |
2 | Aarón Escandell | https://fbref.com/en/players/67669ce7/Aaron-Es... | https://www.transfermarkt.com/aaron-escandell/... | Goalkeeper |
3 | Aaron Herzog | https://fbref.com/en/players/565c3fe4/Aaron-He... | https://www.transfermarkt.com/aaron-herzog/pro... | Attacking Midfield |
4 | Aaron Hickey | https://fbref.com/en/players/1780bb4a/Aaron-Hi... | https://www.transfermarkt.com/aaron-hickey/pro... | Left-Back |
df_fbref_tm_urls.shape
(6300, 4)
df_fbref_players.head()
Player | Nation | Pos | Squad | Comp | Age | Born | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | League Name | League ID | Season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | Outfielder Goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aaron Cresswell | eng ENG | DF | West Ham | Premier League | 27 | 1989.0 | 36 | 35 | 3069.0 | 34.1 | 1 | 3 | 1 | 0 | 0 | 7 | 0 | 0.03 | 0.09 | 0.12 | 0.03 | 0.12 | 0.8 | 0.8 | 2.8 | 3.6 | 0.02 | 0.08 | 0.10 | 0.02 | 0.10 | Matches | 21.0 | 6.0 | 28.6 | 0.62 | 0.18 | 0.05 | 0.17 | 28.1 | 8.0 | 0.04 | 0.2 | 0.2 | 1224.0 | 1708.0 | 71.7 | 23519.0 | 10212.0 | 560.0 | 623.0 | 89.9 | 472.0 | 587.0 | 80.4 | 183.0 | 449.0 | 40.8 | 0.2 | 35.0 | 117.0 | 21.0 | 14.0 | 96.0 | 1343.0 | 365.0 | 1.0 | 222.0 | 83.0 | 93.0 | 67.0 | 35.0 | 15.0 | 9.0 | 893.0 | 293.0 | 522.0 | 1329.0 | 78.0 | 59.0 | 210.0 | 5.0 | 15.0 | 44.0 | 39.0 | 52.0 | 62.0 | 1.82 | 35.0 | 21.0 | 1.0 | 3.0 | 0.0 | 9.0 | 0.26 | 6.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.0 | 18.0 | 15.0 | 18.0 | 5.0 | 17.0 | 53.1 | 15.0 | 115.0 | 32.1 | 181.0 | 123.0 | 54.0 | 0.0 | 38.0 | 90.0 | 133.0 | 0.0 | 2050.0 | 125.0 | 17.0 | 33.3 | 7.0 | 0.0 | 1071.0 | 2.0 | 18.0 | 19.0 | 1171.0 | 1094.0 | 93.4 | 31.0 | 85 | 89.7 | NaN | 30.0 | 1 | NaN | 1 | 1.14 | 45.0 | 60.0 | -15.0 | -0.44 | 0.84 | 38.0 | 51.5 | -13.5 | -0.40 | 1.09 | 0.0 | 20 | 0.0 | 0.0 | 0.0 | 277.0 | 70.0 | 57.0 | 55.1 | Big-5-European-Leagues | Big5 | 2017-2018 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | Aaron Hunt | de GER | MF,FW | Hamburger SV | Bundeliga | 30 | 1986.0 | 28 | 26 | 2081.0 | 23.1 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 0.13 | 0.09 | 0.22 | 0.09 | 0.17 | 2.8 | 2.1 | 5.6 | 7.6 | 0.12 | 0.23 | 0.35 | 0.09 | 0.32 | Matches | 27.0 | 6.0 | 22.2 | 1.17 | 0.26 | 0.07 | 0.33 | 23.4 | 10.0 | 0.08 | 0.2 | -0.1 | 883.0 | 1229.0 | 71.8 | 16889.0 | 5315.0 | 406.0 | 480.0 | 84.6 | 292.0 | 376.0 | 77.7 | 165.0 | 303.0 | 54.5 | -3.6 | 65.0 | 83.0 | 31.0 | 5.0 | 97.0 | 977.0 | 252.0 | 11.0 | 245.0 | 67.0 | 66.0 | 123.0 | 35.0 | 41.0 | 14.0 | 672.0 | 236.0 | 321.0 | 999.0 | 137.0 | 42.0 | 23.0 | 9.0 | 5.0 | 29.0 | 29.0 | 49.0 | 102.0 | 4.25 | 54.0 | 43.0 | 1.0 | 2.0 | 1.0 | 6.0 | 0.25 | 5.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 22.0 | 12.0 | 16.0 | 2.0 | 5.0 | 13.5 | 32.0 | 135.0 | 27.9 | 102.0 | 261.0 | 121.0 | 0.0 | 28.0 | 44.0 | 21.0 | 0.0 | 1475.0 | 28.0 | 68.0 | 58.3 | 23.0 | 4.0 | 892.0 | 7.0 | 45.0 | 42.0 | 1176.0 | 893.0 | 75.9 | 178.0 | 74 | 68.0 | NaN | 14.0 | 2 | NaN | 0 | 1.07 | 22.0 | 34.0 | -12.0 | -0.52 | 0.58 | 27.0 | 31.3 | -4.3 | -0.18 | 0.94 | 0.0 | 27 | 0.0 | 0.0 | 0.0 | 213.0 | 22.0 | 37.0 | 37.3 | Big-5-European-Leagues | Big5 | 2017-2018 | Hamburger SV | Germany | aaron hunt | aaron | hunt | a | germany | GER | Germany | MF | Midfielder | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | Aaron Lennon | eng ENG | MF | Burnley | Premier League | 30 | 1987.0 | 14 | 13 | 1118.0 | 12.4 | 0 | 2 | 0 | 0 | 0 | 2 | 0 | 0.00 | 0.16 | 0.16 | 0.00 | 0.16 | 0.6 | 0.6 | 1.4 | 2.0 | 0.05 | 0.11 | 0.16 | 0.05 | 0.16 | Matches | 10.0 | 4.0 | 40.0 | 0.81 | 0.32 | 0.00 | 0.00 | 16.6 | 0.0 | 0.06 | -0.6 | -0.6 | 204.0 | 294.0 | 69.4 | 3223.0 | 887.0 | 116.0 | 142.0 | 81.7 | 68.0 | 92.0 | 73.9 | 17.0 | 34.0 | 50.0 | 0.6 | 8.0 | 11.0 | 13.0 | 5.0 | 22.0 | 289.0 | 5.0 | 0.0 | 61.0 | 5.0 | 19.0 | 0.0 | 0.0 | 0.0 | 0.0 | 193.0 | 51.0 | 50.0 | 27.0 | 250.0 | 7.0 | 4.0 | 3.0 | 0.0 | 9.0 | 8.0 | 30.0 | 18.0 | 1.45 | 12.0 | 0.0 | 1.0 | 1.0 | 0.0 | 3.0 | 0.24 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 18.0 | 10.0 | 6.0 | 11.0 | 1.0 | 4.0 | 19.0 | 17.0 | 61.0 | 26.3 | 74.0 | 102.0 | 56.0 | 0.0 | 24.0 | 31.0 | 9.0 | 0.0 | 424.0 | 19.0 | 36.0 | 48.0 | 12.0 | 2.0 | 290.0 | 12.0 | 9.0 | 25.0 | 353.0 | 259.0 | 73.4 | 41.0 | 80 | 32.7 | NaN | 6.0 | 1 | NaN | 0 | 1.43 | 17.0 | 15.0 | 2.0 | 0.16 | 0.36 | 13.8 | 15.4 | -1.5 | -0.12 | 0.49 | 0.0 | 12 | 0.0 | 0.0 | 0.0 | 80.0 | 7.0 | 15.0 | 31.8 | Big-5-European-Leagues | Big5 | 2017-2018 | Burnley | England | aaron lennon | aaron | lennon | a | england | ENG | England | MF | Midfielder | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | Aaron Lennon | eng ENG | FW,MF | Everton | Premier League | 30 | 1987.0 | 15 | 9 | 793.0 | 8.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.3 | 0.3 | 0.5 | 0.8 | 0.04 | 0.05 | 0.09 | 0.04 | 0.09 | Matches | 4.0 | 1.0 | 25.0 | 0.45 | 0.11 | 0.00 | 0.00 | 14.8 | 0.0 | 0.08 | -0.3 | -0.3 | 152.0 | 214.0 | 71.0 | 2286.0 | 672.0 | 92.0 | 115.0 | 80.0 | 53.0 | 69.0 | 76.8 | 5.0 | 13.0 | 38.5 | -0.5 | 5.0 | 9.0 | 3.0 | 2.0 | 17.0 | 199.0 | 15.0 | 0.0 | 49.0 | 2.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 129.0 | 47.0 | 38.0 | 29.0 | 159.0 | 10.0 | 14.0 | 0.0 | 1.0 | 3.0 | 7.0 | 15.0 | 16.0 | 1.82 | 11.0 | 0.0 | 1.0 | 2.0 | 0.0 | 4.0 | 0.45 | 2.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 18.0 | 10.0 | 9.0 | 7.0 | 2.0 | 5.0 | 25.0 | 15.0 | 38.0 | 19.3 | 49.0 | 102.0 | 46.0 | 0.0 | 18.0 | 25.0 | 9.0 | 0.0 | 322.0 | 7.0 | 22.0 | 35.0 | 8.0 | 1.0 | 186.0 | 8.0 | 9.0 | 17.0 | 288.0 | 195.0 | 67.7 | 33.0 | 53 | 23.2 | NaN | 2.0 | 6 | NaN | 0 | 1.27 | 15.0 | 14.0 | 1.0 | 0.11 | 0.63 | 12.0 | 13.7 | -1.6 | -0.19 | 0.13 | 0.0 | 9 | 2.0 | 0.0 | 0.0 | 50.0 | 6.0 | 12.0 | 33.3 | Big-5-European-Leagues | Big5 | 2017-2018 | Everton | England | aaron lennon | aaron | lennon | a | england | ENG | England | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | Aaron Mooy | au AUS | MF | Huddersfield | Premier League | 26 | 1990.0 | 36 | 34 | 3067.0 | 34.1 | 4 | 3 | 3 | 1 | 1 | 4 | 0 | 0.12 | 0.09 | 0.21 | 0.09 | 0.18 | 2.6 | 1.8 | 3.1 | 4.9 | 0.08 | 0.09 | 0.17 | 0.05 | 0.14 | Matches | 28.0 | 6.0 | 21.4 | 0.82 | 0.18 | 0.11 | 0.50 | 22.0 | 3.0 | 0.06 | 1.4 | 1.2 | 1561.0 | 2067.0 | 75.5 | 27911.0 | 7921.0 | 783.0 | 876.0 | 89.4 | 540.0 | 678.0 | 79.6 | 196.0 | 397.0 | 49.4 | -0.1 | 48.0 | 167.0 | 27.0 | 9.0 | 163.0 | 1897.0 | 170.0 | 1.0 | 422.0 | 100.0 | 85.0 | 77.0 | 35.0 | 21.0 | 5.0 | 1293.0 | 283.0 | 491.0 | 507.0 | 1444.0 | 77.0 | 5.0 | 4.0 | 6.0 | 38.0 | 60.0 | 60.0 | 73.0 | 2.14 | 54.0 | 16.0 | 0.0 | 1.0 | 2.0 | 5.0 | 0.15 | 4.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 105.0 | 55.0 | 38.0 | 54.0 | 13.0 | 32.0 | 44.4 | 40.0 | 193.0 | 29.5 | 192.0 | 355.0 | 107.0 | 2.0 | 52.0 | 151.0 | 70.0 | 0.0 | 2496.0 | 65.0 | 32.0 | 53.2 | 26.0 | 0.0 | 1543.0 | 6.0 | 33.0 | 60.0 | 1710.0 | 1540.0 | 90.1 | 85.0 | 85 | 89.7 | NaN | 29.0 | 2 | NaN | 0 | 0.94 | 25.0 | 52.0 | -27.0 | -0.79 | -0.03 | 28.7 | 49.8 | -21.1 | -0.62 | -0.01 | 0.0 | 26 | 0.0 | 0.0 | 0.0 | 455.0 | 35.0 | 42.0 | 45.5 | Big-5-European-Leagues | Big5 | 2017-2018 | Huddersfield | England | aaron mooy | aaron | mooy | a | england | AUS | Australia | MF | Midfielder | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
df_fbref_players.shape
(13680, 205)
Player bio and status data
df_tm_bio_status.head()
| | tm_id | player_name | birth_day | birth_month | birth_year | pob | cob | dob | position | height | foot | citizenship | second_citizenship | league_code | season | current_club | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | player_agent | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | position_code | position_grouped | outfielder_goalkeeper | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2857 | eldin jakupovic | 2.0 | 10.0 | 1984.0 | Kozarac | Jugoslawien (SFR) | 1984-10-02 | Goalkeeper | 191.0 | right | NaN | Bosnia-Herzegovina | GB1 | 2021 | leicester city | england | 300000.0 | 2017-07-19 | 2021-06-30 | NaN | NaN | NaN | NaN | HSD | eldin jakupovic | eldin | jakupovic | e | england | GK | Goalkeeper | Goalkeeper | 36.0 | 32.0 | 4.0 | -1.0 | 270000.0 |
1 | 3333 | james milner | 4.0 | 1.0 | 1986.0 | Leeds | England | 1986-01-04 | midfield - Central Midfield | 175.0 | right | England | NaN | GB1 | 2021 | liverpool fc | england | 3000000.0 | 2015-07-01 | 2022-06-30 | NaN | NaN | NaN | NaN | Samii Sport-Marketing Agentur | james milner | james | milner | j | england | CM | Midfielder | Outfielder | 35.0 | 29.0 | 6.0 | 0.0 | 2700000.0 |
2 | 3455 | zlatan ibrahimovic | 3.0 | 10.0 | 1981.0 | Malmö | Sweden | 1981-10-03 | attack - Centre-Forward | 195.0 | both | NaN | Bosnia-Herzegovina | IT1 | 2021 | ac milan | italy | 4000000.0 | 2020-01-02 | 2022-06-30 | NaN | NaN | NaN | NaN | Mino Raiola | zlatan ibrahimovic | zlatan | ibrahimovic | z | italy | ST | Forward | Outfielder | 39.0 | 38.0 | 1.0 | 0.0 | 3600000.0 |
3 | 5578 | nicolas penneteau | 28.0 | 2.0 | 1981.0 | Marseille | France | 1981-02-28 | Goalkeeper | 185.0 | left | France | NaN | FR1 | 2021 | stade reims | france | 200000.0 | 2021-07-01 | 2023-06-30 | NaN | NaN | NaN | NaN | USM GROUP | nicolas penneteau | nicolas | penneteau | n | france | GK | Goalkeeper | Goalkeeper | 40.0 | 40.0 | 0.0 | 1.0 | 180000.0 |
4 | 6442 | antonio rosati | 26.0 | 6.0 | 1983.0 | Tivoli | Italy | 1983-06-26 | Goalkeeper | 195.0 | right | Italy | NaN | IT1 | 2021 | acf fiorentina | italy | 100000.0 | 2021-02-01 | NaN | NaN | NaN | NaN | NaN | Alessandro Lucci - WSA | antonio rosati | antonio | rosati | a | italy | GK | Goalkeeper | Goalkeeper | 38.0 | 37.0 | 0.0 | NaN | 90000.0 |
df_tm_bio_status.shape
(9429, 38)
Historical player valuation data
df_tm_valuations.head()
| | tm_id | season | player_name | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | birth_year | position | position_code | position_grouped | outfielder_goalkeeper | height | foot | citizenship | second_citizenship | player_agent |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26 | 2004/2005 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 1800000.0 | 2000000 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer |
1 | 26 | 2005/2006 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 6075000.0 | 6750000 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer |
2 | 26 | 2006/2007 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 6750000.0 | 7500000 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer |
3 | 26 | 2007/2008 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 7200000.0 | 8000000 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer |
4 | 26 | 2008/2009 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 4500000.0 | 5000000 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer |
df_tm_valuations.shape
(67236, 21)
Player recorded transfer data
df_tm_transfers.head()
| | club_name | player_name | age | position | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | year | season | league_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | VfB Stuttgart | Adrian Knup | 23.0 | Centre-Forward | FC Luzern | ? | in | Summer | NaN | 1 Bundesliga | 1992 | 1992/1993 | L1 |
1 | 1. FC Köln | Adrian Spyrka | 24.0 | Central Midfield | Stuttg. Kickers | End of loanJun 30, 1992 | in | Summer | 0.0 | 1 Bundesliga | 1992 | 1992/1993 | L1 |
2 | Karlsruher SC | Alexander Famulla | 31.0 | Goalkeeper | FC 08 Homburg | ? | out | Summer | NaN | 1 Bundesliga | 1992 | 1992/1993 | L1 |
3 | SV Werder Bremen | Alexander Malchow | 22.0 | Centre-Back | VfB Oldenburg | Free transfer | out | Summer | 0.0 | 1 Bundesliga | 1992 | 1992/1993 | L1 |
4 | SG Dynamo Dresden | Alexander Zickler | 18.0 | Centre-Forward | D. Dresden U19 | - | in | Summer | 0.0 | 1 Bundesliga | 1992 | 1992/1993 | L1 |
df_tm_transfers.shape
(169208, 13)
Player salaries
df_capology.head()
| | player | season | league | team | position | outfielder_goalkeeper | age | country | weekly_gross_base_salary_gbp | annual_gross_base_salary_gbp | adj_current_gross_base_salary_gbp | estimated_gross_total_gbp | current_contract_status | current_contract_expiration | current_contract_length |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Albian Ajeti | 2016-2017 | Bundesliga | Augsburg | Forward | Outfielder | 19 | Switzerland | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
1 | Alexander Esswein | 2016-2017 | Bundesliga | Augsburg | Forward | Outfielder | 26 | Germany | 12919.0 | 671795.0 | 696824.0 | NaN | NaN | NaN | NaN |
2 | Alfred Finnbogason | 2016-2017 | Bundesliga | Augsburg | Forward | Outfielder | 27 | Iceland | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
3 | Andreas Luthe | 2016-2017 | Bundesliga | Augsburg | Goalkeeper | Goalkeeper | 29 | Germany | 5939.0 | 308881.0 | 320389.0 | NaN | NaN | NaN | NaN |
4 | Caiuby | 2016-2017 | Bundesliga | Augsburg | Forward | Outfielder | 27 | Brazil | 12919.0 | 671795.0 | 696824.0 | NaN | NaN | NaN | NaN |
df_capology.shape
(21281, 15)
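The TransferMarkt ID is the final path segment of the player's profile URL, e.g. 203460 in https://www.transfermarkt.com/jack-grealish/profil/spieler/203460. The ID is not always six digits long, so it is extracted by splitting on the last slash rather than by position.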
df_fbref_tm_urls['tm_id'] = df_fbref_tm_urls['UrlTmarkt'].str.rsplit('/', n=1).str.get(-1)
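For a single URL, the same extraction can be sketched in plain Python. The helper name `extract_tm_id` is hypothetical. Unlike the pandas one-liner, it also tolerates a trailing slash and checks that the result is numeric.

```python
def extract_tm_id(url):
    """Return the numeric ID at the end of a TransferMarkt profile URL."""
    tm_id = url.rstrip('/').rsplit('/', 1)[-1]
    if not tm_id.isdigit():
        raise ValueError(f'No numeric ID found at the end of: {url}')
    return tm_id

print(extract_tm_id('https://www.transfermarkt.com/jack-grealish/profil/spieler/203460'))
```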
The FBref ID is the eight-character penultimate path segment of the player's URL, e.g. b0b4fd3e in https://fbref.com/en/players/b0b4fd3e/Jack-Grealish
df_fbref_tm_urls['fbref_id'] = df_fbref_tm_urls['UrlFBref'].str.split('/').str.get(-2)
## Rename columns
df_fbref_tm_urls = (df_fbref_tm_urls
                        .rename(columns={'PlayerFBref': 'player_name_fbref',
                                         'UrlFBref': 'url_fbref',
                                         'UrlTmarkt': 'url_tm'
                                        }
                               )
                   )
df_fbref_tm_urls.head()
| | player_name_fbref | url_fbref | url_tm | TmPos | tm_id | fbref_id |
---|---|---|---|---|---|---|
0 | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 |
1 | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... | Left-Back | 92571 | 4f974391 |
2 | Aarón Escandell | https://fbref.com/en/players/67669ce7/Aaron-Es... | https://www.transfermarkt.com/aaron-escandell/... | Goalkeeper | 284430 | 67669ce7 |
3 | Aaron Herzog | https://fbref.com/en/players/565c3fe4/Aaron-He... | https://www.transfermarkt.com/aaron-herzog/pro... | Attacking Midfield | 276566 | 565c3fe4 |
4 | Aaron Hickey | https://fbref.com/en/players/1780bb4a/Aaron-Hi... | https://www.transfermarkt.com/aaron-hickey/pro... | Left-Back | 591949 | 1780bb4a |
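The `fbref_id` values shown above are eight-character hexadecimal strings. A minimal sketch of extracting and validating such an ID with a regular expression follows. The pattern and the helper name `extract_fbref_id` are assumptions based on the example URLs in this notebook, not a documented FBref format.

```python
import re

# Assumed pattern: '/players/' followed by eight hex characters and a slash
FBREF_ID_PATTERN = re.compile(r'/players/([0-9a-f]{8})/')

def extract_fbref_id(url):
    """Return the eight-character player ID from an FBref URL, or None."""
    match = FBREF_ID_PATTERN.search(url)
    return match.group(1) if match else None

print(extract_fbref_id('https://fbref.com/en/players/b0b4fd3e/Jack-Grealish'))
print(extract_fbref_id('https://fbref.com/en/squads/'))
```

Returning `None` for URLs that do not fit the pattern makes malformed rows easy to find before the IDs are used as join keys.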
lst_leagues_fbref_players = list(df_fbref_players['Team Country'].unique())
lst_leagues_fbref_players
['England', 'Germany', 'Spain', 'France', 'Italy']
lst_leagues_fbref_players_big5 = ['England', 'Germany', 'Spain', 'France', 'Italy']
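# Take an explicit copy so that later column assignments do not act on a view of the original
df_fbref_players = df_fbref_players[df_fbref_players['Team Country'].isin(lst_leagues_fbref_players_big5)].copy()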
# Map season to DataFrame
df_fbref_players['Season'] = df_fbref_players['Season'].map(dict_seasons)
lst_seasons_fbref_players = list(df_fbref_players['Season'].unique())
lst_seasons_fbref_players
['2017/2018', '2018/2019', '2019/2020', '2020/2021', '2021/2022']
lst_seasons_fbref_players = ['2017/2018', '2018/2019', '2019/2020', '2020/2021', '2021/2022']
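# Take an explicit copy so that later column assignments do not act on a view of the original
df_fbref_players = df_fbref_players[df_fbref_players['Season'].isin(lst_seasons_fbref_players)].copy()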
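Because `Series.map` with a dictionary returns `NaN` for any value that is not a key, a season label missing from `dict_seasons` would be silently dropped by the subsequent `isin` filter. A plain-Python sketch of the same conversion that fails loudly instead is shown here. The helper name `to_slash_season` is hypothetical.

```python
import re

def to_slash_season(season):
    """Convert 'YYYY-YYYY' to 'YYYY/YYYY', checking that the years are consecutive."""
    match = re.fullmatch(r'(\d{4})-(\d{4})', season)
    if match is None or int(match.group(2)) != int(match.group(1)) + 1:
        raise ValueError(f'Unexpected season label: {season!r}')
    return f'{match.group(1)}/{match.group(2)}'

for label in ['2016-2017', '2021-2022']:
    print(label, '->', to_slash_season(label))
```

This produces the slash-separated form used in the TransferMarkt valuation data from the hyphen-separated form used in the FBref and Capology data.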
# Remove accents and create lowercase name
df_fbref_players['player_name_lower'] = (df_fbref_players['Player']
                                             .str.normalize('NFKD')
                                             .str.encode('ascii', errors='ignore')
                                             .str.decode('utf-8')
                                             .str.lower()
                                        )
# First Name Lower (first token of the name)
df_fbref_players['first_name_lower'] = df_fbref_players['player_name_lower'].str.split(' ').str[0]
# Last Name Lower (last token of the name)
df_fbref_players['last_name_lower'] = df_fbref_players['player_name_lower'].str.rsplit(' ', n=1).str[-1]
# First Initial Lower
df_fbref_players['first_initial_lower'] = df_fbref_players['player_name_lower'].astype(str).str[0]
# Remove accents and create lowercase nationality
df_fbref_players['country_lower'] = (df_fbref_players['Nationality Cleaned']
                                         .str.normalize('NFKD')
                                         .str.encode('ascii', errors='ignore')
                                         .str.decode('utf-8')
                                         .str.lower()
                                    )
# Display DataFrame
df_fbref_players.head()
Player | Nation | Pos | Squad | Comp | Age | Born | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | League Name | League ID | Season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | Outfielder Goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aaron Cresswell | eng ENG | DF | West Ham | Premier League | 27 | 1989.0 | 36 | 35 | 3069.0 | 34.1 | 1 | 3 | 1 | 0 | 0 | 7 | 0 | 0.03 | 0.09 | 0.12 | 0.03 | 0.12 | 0.8 | 0.8 | 2.8 | 3.6 | 0.02 | 0.08 | 0.10 | 0.02 | 0.10 | Matches | 21.0 | 6.0 | 28.6 | 0.62 | 0.18 | 0.05 | 0.17 | 28.1 | 8.0 | 0.04 | 0.2 | 0.2 | 1224.0 | 1708.0 | 71.7 | 23519.0 | 10212.0 | 560.0 | 623.0 | 89.9 | 472.0 | 587.0 | 80.4 | 183.0 | 449.0 | 40.8 | 0.2 | 35.0 | 117.0 | 21.0 | 14.0 | 96.0 | 1343.0 | 365.0 | 1.0 | 222.0 | 83.0 | 93.0 | 67.0 | 35.0 | 15.0 | 9.0 | 893.0 | 293.0 | 522.0 | 1329.0 | 78.0 | 59.0 | 210.0 | 5.0 | 15.0 | 44.0 | 39.0 | 52.0 | 62.0 | 1.82 | 35.0 | 21.0 | 1.0 | 3.0 | 0.0 | 9.0 | 0.26 | 6.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.0 | 18.0 | 15.0 | 18.0 | 5.0 | 17.0 | 53.1 | 15.0 | 115.0 | 32.1 | 181.0 | 123.0 | 54.0 | 0.0 | 38.0 | 90.0 | 133.0 | 0.0 | 2050.0 | 125.0 | 17.0 | 33.3 | 7.0 | 0.0 | 1071.0 | 2.0 | 18.0 | 19.0 | 1171.0 | 1094.0 | 93.4 | 31.0 | 85 | 89.7 | NaN | 30.0 | 1 | NaN | 1 | 1.14 | 45.0 | 60.0 | -15.0 | -0.44 | 0.84 | 38.0 | 51.5 | -13.5 | -0.40 | 1.09 | 0.0 | 20 | 0.0 | 0.0 | 0.0 | 277.0 | 70.0 | 57.0 | 55.1 | Big-5-European-Leagues | Big5 | 2017/2018 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england |
1 | Aaron Hunt | de GER | MF,FW | Hamburger SV | Bundeliga | 30 | 1986.0 | 28 | 26 | 2081.0 | 23.1 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 0.13 | 0.09 | 0.22 | 0.09 | 0.17 | 2.8 | 2.1 | 5.6 | 7.6 | 0.12 | 0.23 | 0.35 | 0.09 | 0.32 | Matches | 27.0 | 6.0 | 22.2 | 1.17 | 0.26 | 0.07 | 0.33 | 23.4 | 10.0 | 0.08 | 0.2 | -0.1 | 883.0 | 1229.0 | 71.8 | 16889.0 | 5315.0 | 406.0 | 480.0 | 84.6 | 292.0 | 376.0 | 77.7 | 165.0 | 303.0 | 54.5 | -3.6 | 65.0 | 83.0 | 31.0 | 5.0 | 97.0 | 977.0 | 252.0 | 11.0 | 245.0 | 67.0 | 66.0 | 123.0 | 35.0 | 41.0 | 14.0 | 672.0 | 236.0 | 321.0 | 999.0 | 137.0 | 42.0 | 23.0 | 9.0 | 5.0 | 29.0 | 29.0 | 49.0 | 102.0 | 4.25 | 54.0 | 43.0 | 1.0 | 2.0 | 1.0 | 6.0 | 0.25 | 5.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 22.0 | 12.0 | 16.0 | 2.0 | 5.0 | 13.5 | 32.0 | 135.0 | 27.9 | 102.0 | 261.0 | 121.0 | 0.0 | 28.0 | 44.0 | 21.0 | 0.0 | 1475.0 | 28.0 | 68.0 | 58.3 | 23.0 | 4.0 | 892.0 | 7.0 | 45.0 | 42.0 | 1176.0 | 893.0 | 75.9 | 178.0 | 74 | 68.0 | NaN | 14.0 | 2 | NaN | 0 | 1.07 | 22.0 | 34.0 | -12.0 | -0.52 | 0.58 | 27.0 | 31.3 | -4.3 | -0.18 | 0.94 | 0.0 | 27 | 0.0 | 0.0 | 0.0 | 213.0 | 22.0 | 37.0 | 37.3 | Big-5-European-Leagues | Big5 | 2017/2018 | Hamburger SV | Germany | aaron hunt | aaron | hunt | a | germany | GER | Germany | MF | Midfielder | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron hunt | aaron | hunt | a | germany |
2 | Aaron Lennon | eng ENG | MF | Burnley | Premier League | 30 | 1987.0 | 14 | 13 | 1118.0 | 12.4 | 0 | 2 | 0 | 0 | 0 | 2 | 0 | 0.00 | 0.16 | 0.16 | 0.00 | 0.16 | 0.6 | 0.6 | 1.4 | 2.0 | 0.05 | 0.11 | 0.16 | 0.05 | 0.16 | Matches | 10.0 | 4.0 | 40.0 | 0.81 | 0.32 | 0.00 | 0.00 | 16.6 | 0.0 | 0.06 | -0.6 | -0.6 | 204.0 | 294.0 | 69.4 | 3223.0 | 887.0 | 116.0 | 142.0 | 81.7 | 68.0 | 92.0 | 73.9 | 17.0 | 34.0 | 50.0 | 0.6 | 8.0 | 11.0 | 13.0 | 5.0 | 22.0 | 289.0 | 5.0 | 0.0 | 61.0 | 5.0 | 19.0 | 0.0 | 0.0 | 0.0 | 0.0 | 193.0 | 51.0 | 50.0 | 27.0 | 250.0 | 7.0 | 4.0 | 3.0 | 0.0 | 9.0 | 8.0 | 30.0 | 18.0 | 1.45 | 12.0 | 0.0 | 1.0 | 1.0 | 0.0 | 3.0 | 0.24 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 18.0 | 10.0 | 6.0 | 11.0 | 1.0 | 4.0 | 19.0 | 17.0 | 61.0 | 26.3 | 74.0 | 102.0 | 56.0 | 0.0 | 24.0 | 31.0 | 9.0 | 0.0 | 424.0 | 19.0 | 36.0 | 48.0 | 12.0 | 2.0 | 290.0 | 12.0 | 9.0 | 25.0 | 353.0 | 259.0 | 73.4 | 41.0 | 80 | 32.7 | NaN | 6.0 | 1 | NaN | 0 | 1.43 | 17.0 | 15.0 | 2.0 | 0.16 | 0.36 | 13.8 | 15.4 | -1.5 | -0.12 | 0.49 | 0.0 | 12 | 0.0 | 0.0 | 0.0 | 80.0 | 7.0 | 15.0 | 31.8 | Big-5-European-Leagues | Big5 | 2017/2018 | Burnley | England | aaron lennon | aaron | lennon | a | england | ENG | England | MF | Midfielder | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron lennon | aaron | lennon | a | england |
3 | Aaron Lennon | eng ENG | FW,MF | Everton | Premier League | 30 | 1987.0 | 15 | 9 | 793.0 | 8.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.3 | 0.3 | 0.5 | 0.8 | 0.04 | 0.05 | 0.09 | 0.04 | 0.09 | Matches | 4.0 | 1.0 | 25.0 | 0.45 | 0.11 | 0.00 | 0.00 | 14.8 | 0.0 | 0.08 | -0.3 | -0.3 | 152.0 | 214.0 | 71.0 | 2286.0 | 672.0 | 92.0 | 115.0 | 80.0 | 53.0 | 69.0 | 76.8 | 5.0 | 13.0 | 38.5 | -0.5 | 5.0 | 9.0 | 3.0 | 2.0 | 17.0 | 199.0 | 15.0 | 0.0 | 49.0 | 2.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 129.0 | 47.0 | 38.0 | 29.0 | 159.0 | 10.0 | 14.0 | 0.0 | 1.0 | 3.0 | 7.0 | 15.0 | 16.0 | 1.82 | 11.0 | 0.0 | 1.0 | 2.0 | 0.0 | 4.0 | 0.45 | 2.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 18.0 | 10.0 | 9.0 | 7.0 | 2.0 | 5.0 | 25.0 | 15.0 | 38.0 | 19.3 | 49.0 | 102.0 | 46.0 | 0.0 | 18.0 | 25.0 | 9.0 | 0.0 | 322.0 | 7.0 | 22.0 | 35.0 | 8.0 | 1.0 | 186.0 | 8.0 | 9.0 | 17.0 | 288.0 | 195.0 | 67.7 | 33.0 | 53 | 23.2 | NaN | 2.0 | 6 | NaN | 0 | 1.27 | 15.0 | 14.0 | 1.0 | 0.11 | 0.63 | 12.0 | 13.7 | -1.6 | -0.19 | 0.13 | 0.0 | 9 | 2.0 | 0.0 | 0.0 | 50.0 | 6.0 | 12.0 | 33.3 | Big-5-European-Leagues | Big5 | 2017/2018 | Everton | England | aaron lennon | aaron | lennon | a | england | ENG | England | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron lennon | aaron | lennon | a | england |
4 | Aaron Mooy | au AUS | MF | Huddersfield | Premier League | 26 | 1990.0 | 36 | 34 | 3067.0 | 34.1 | 4 | 3 | 3 | 1 | 1 | 4 | 0 | 0.12 | 0.09 | 0.21 | 0.09 | 0.18 | 2.6 | 1.8 | 3.1 | 4.9 | 0.08 | 0.09 | 0.17 | 0.05 | 0.14 | Matches | 28.0 | 6.0 | 21.4 | 0.82 | 0.18 | 0.11 | 0.50 | 22.0 | 3.0 | 0.06 | 1.4 | 1.2 | 1561.0 | 2067.0 | 75.5 | 27911.0 | 7921.0 | 783.0 | 876.0 | 89.4 | 540.0 | 678.0 | 79.6 | 196.0 | 397.0 | 49.4 | -0.1 | 48.0 | 167.0 | 27.0 | 9.0 | 163.0 | 1897.0 | 170.0 | 1.0 | 422.0 | 100.0 | 85.0 | 77.0 | 35.0 | 21.0 | 5.0 | 1293.0 | 283.0 | 491.0 | 507.0 | 1444.0 | 77.0 | 5.0 | 4.0 | 6.0 | 38.0 | 60.0 | 60.0 | 73.0 | 2.14 | 54.0 | 16.0 | 0.0 | 1.0 | 2.0 | 5.0 | 0.15 | 4.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 105.0 | 55.0 | 38.0 | 54.0 | 13.0 | 32.0 | 44.4 | 40.0 | 193.0 | 29.5 | 192.0 | 355.0 | 107.0 | 2.0 | 52.0 | 151.0 | 70.0 | 0.0 | 2496.0 | 65.0 | 32.0 | 53.2 | 26.0 | 0.0 | 1543.0 | 6.0 | 33.0 | 60.0 | 1710.0 | 1540.0 | 90.1 | 85.0 | 85 | 89.7 | NaN | 29.0 | 2 | NaN | 0 | 0.94 | 25.0 | 52.0 | -27.0 | -0.79 | -0.03 | 28.7 | 49.8 | -21.1 | -0.62 | -0.01 | 0.0 | 26 | 0.0 | 0.0 | 0.0 | 455.0 | 35.0 | 42.0 | 45.5 | Big-5-European-Leagues | Big5 | 2017/2018 | Huddersfield | England | aaron mooy | aaron | mooy | a | england | AUS | Australia | MF | Midfielder | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron mooy | aaron | mooy | a | australia |
## Rename columns
df_fbref_players = (df_fbref_players
.rename(columns={'Born': 'birth_year',
'Outfielder Goalkeeper': 'outfielder_goalkeeper',
'Season': 'season',
}
)
)
## Define columns
cols_fbref_players = ['Player',
'first_initial_lower',
'first_name_lower',
'last_name_lower',
#'age',
'birth_year',
'country_lower',
'outfielder_goalkeeper',
'season'
]
## Select columns of interest
df_fbref_players_select = df_fbref_players[cols_fbref_players]
# Drop duplicates
df_fbref_players_select = df_fbref_players_select.drop_duplicates()
# Display DataFrame
df_fbref_players_select.head()
Player | first_initial_lower | first_name_lower | last_name_lower | birth_year | country_lower | outfielder_goalkeeper | season | |
---|---|---|---|---|---|---|---|---|
0 | Aaron Cresswell | a | aaron | cresswell | 1989.0 | england | Outfielder | 2017/2018 |
1 | Aaron Hunt | a | aaron | hunt | 1986.0 | germany | Outfielder | 2017/2018 |
2 | Aaron Lennon | a | aaron | lennon | 1987.0 | england | Outfielder | 2017/2018 |
4 | Aaron Mooy | a | aaron | mooy | 1990.0 | australia | Outfielder | 2017/2018 |
5 | Aaron Ramsey | a | aaron | ramsey | 1990.0 | wales | Outfielder | 2017/2018 |
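Index 3 is absent from the output above. Aaron Lennon appears twice in the 2017/2018 FBref data (once for Burnley, once for Everton), and the two rows become identical once the club-specific columns are dropped. A minimal sketch of that behaviour, using three rows copied from the FBref output shown earlier (the names `df_demo` and `df_demo_select` are illustrative only):

```python
import pandas as pd

# Three rows copied from the FBref output above; df_demo is an illustrative name
df_demo = pd.DataFrame(
    {
        "Player": ["Aaron Hunt", "Aaron Lennon", "Aaron Lennon"],
        "Squad": ["Hamburger SV", "Burnley", "Everton"],
        "birth_year": [1986.0, 1987.0, 1987.0],
        "season": ["2017/2018", "2017/2018", "2017/2018"],
    },
    index=[1, 2, 3],
)

# With the club column present, every row is unique
n_full = len(df_demo.drop_duplicates())

# After selecting only the identifier columns, the two Lennon rows collapse into one
df_demo_select = df_demo[["Player", "birth_year", "season"]].drop_duplicates()
n_select = len(df_demo_select)

print(n_full, n_select)
print(list(df_demo_select.index))
```

Because `drop_duplicates()` keeps the first occurrence by default, the surviving row keeps the original index label, which is why the index in the selected DataFrame has gaps.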
lst_leagues_tm_bio_status = list(df_tm_bio_status['league_code'].unique())
lst_leagues_tm_bio_status
['GB1', 'IT1', 'FR1', 'L1', 'ES1', 'MLS1']
lst_leagues_tm_bio_status_big5 = ['GB1', 'IT1', 'FR1', 'L1', 'ES1']
df_tm_bio_status = df_tm_bio_status[df_tm_bio_status['league_code'].isin(lst_leagues_tm_bio_status_big5)].copy()
# Remove accents and create lowercase name
df_tm_bio_status['player_name_lower'] = (df_tm_bio_status['player_name']
.str.normalize('NFKD')
.str.encode('ascii', errors='ignore')
.str.decode('utf-8')
.str.lower()
)
# First Name Lower (first space-separated token)
df_tm_bio_status['first_name_lower'] = df_tm_bio_status['player_name_lower'].str.split(' ').str[0]
# Last Name Lower (last space-separated token)
df_tm_bio_status['last_name_lower'] = df_tm_bio_status['player_name_lower'].str.rsplit(' ', n=1).str[-1]
# First Initial Lower
df_tm_bio_status['first_initial_lower'] = df_tm_bio_status['player_name_lower'].astype(str).str[0]
# Remove accents and create lowercase country (from country of birth)
df_tm_bio_status['country_lower'] = (df_tm_bio_status['cob']
.str.normalize('NFKD')
.str.encode('ascii', errors='ignore')
.str.decode('utf-8')
.str.lower()
)
# Display DataFrame
df_tm_bio_status.head()
tm_id | player_name | birth_day | birth_month | birth_year | pob | cob | dob | position | height | foot | citizenship | second_citizenship | league_code | season | current_club | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | player_agent | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | position_code | position_grouped | outfielder_goalkeeper | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2857 | eldin jakupovic | 2.0 | 10.0 | 1984.0 | Kozarac | Jugoslawien (SFR) | 1984-10-02 | Goalkeeper | 191.0 | right | NaN | Bosnia-Herzegovina | GB1 | 2021 | leicester city | england | 300000.0 | 2017-07-19 | 2021-06-30 | NaN | NaN | NaN | NaN | HSD | eldin jakupovic | eldin | jakupovic | e | england | GK | Goalkeeper | Goalkeeper | 36.0 | 32.0 | 4.0 | -1.0 | 270000.0 | eldin jakupovic | eldin | jakupovic | e | jugoslawien (sfr) |
1 | 3333 | james milner | 4.0 | 1.0 | 1986.0 | Leeds | England | 1986-01-04 | midfield - Central Midfield | 175.0 | right | England | NaN | GB1 | 2021 | liverpool fc | england | 3000000.0 | 2015-07-01 | 2022-06-30 | NaN | NaN | NaN | NaN | Samii Sport-Marketing Agentur | james milner | james | milner | j | england | CM | Midfielder | Outfielder | 35.0 | 29.0 | 6.0 | 0.0 | 2700000.0 | james milner | james | milner | j | england |
2 | 3455 | zlatan ibrahimovic | 3.0 | 10.0 | 1981.0 | Malmö | Sweden | 1981-10-03 | attack - Centre-Forward | 195.0 | both | NaN | Bosnia-Herzegovina | IT1 | 2021 | ac milan | italy | 4000000.0 | 2020-01-02 | 2022-06-30 | NaN | NaN | NaN | NaN | Mino Raiola | zlatan ibrahimovic | zlatan | ibrahimovic | z | italy | ST | Forward | Outfielder | 39.0 | 38.0 | 1.0 | 0.0 | 3600000.0 | zlatan ibrahimovic | zlatan | ibrahimovic | z | sweden |
3 | 5578 | nicolas penneteau | 28.0 | 2.0 | 1981.0 | Marseille | France | 1981-02-28 | Goalkeeper | 185.0 | left | France | NaN | FR1 | 2021 | stade reims | france | 200000.0 | 2021-07-01 | 2023-06-30 | NaN | NaN | NaN | NaN | USM GROUP | nicolas penneteau | nicolas | penneteau | n | france | GK | Goalkeeper | Goalkeeper | 40.0 | 40.0 | 0.0 | 1.0 | 180000.0 | nicolas penneteau | nicolas | penneteau | n | france |
4 | 6442 | antonio rosati | 26.0 | 6.0 | 1983.0 | Tivoli | Italy | 1983-06-26 | Goalkeeper | 195.0 | right | Italy | NaN | IT1 | 2021 | acf fiorentina | italy | 100000.0 | 2021-02-01 | NaN | NaN | NaN | NaN | NaN | Alessandro Lucci - WSA | antonio rosati | antonio | rosati | a | italy | GK | Goalkeeper | Goalkeeper | 38.0 | 37.0 | 0.0 | NaN | 90000.0 | antonio rosati | antonio | rosati | a | italy |
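The accent-stripping and name-splitting steps applied above can be sketched on plain strings with the standard library. This is an illustration of the same NFKD, ASCII-encode, lowercase chain rather than the notebook's own code; the helper names `normalise_name` and `split_name`, and the example names 'Højbjerg' and 'Neymar', are assumptions introduced here. It also highlights two edge cases worth keeping in mind when matching: characters with no Unicode decomposition are dropped entirely, and single-word names produce identical first and last names.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Strip accents via NFKD decomposition, drop non-ASCII characters, lowercase."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", errors="ignore").decode("utf-8").lower()


def split_name(name_lower: str) -> tuple:
    """Return (first_name, last_name, first_initial) from a non-empty normalised name."""
    parts = name_lower.split(" ")
    return parts[0], parts[-1], name_lower[0]


# 'ö' decomposes to 'o' plus a combining diaeresis, so the base letter survives
malmo = normalise_name("Malmö")

# 'ø' has no decomposition, so it is dropped rather than mapped to 'o'
hojbjerg = normalise_name("Højbjerg")

# A single-word name yields the same token for first and last name
single = split_name(normalise_name("Neymar"))

# A two-word name splits as expected
double = split_name(normalise_name("Zlatan Ibrahimović"))

print(malmo, hojbjerg, single, double)
```

Names affected by the dropped-character case will not match exactly across sources, which is one reason fuzzy string comparison is useful in the linkage step.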
# Define columns
cols_tm_bio_status = ['player_name',
'first_initial_lower',
'first_name_lower',
'last_name_lower',
#'age',
'birth_year',
'country_lower',
'outfielder_goalkeeper',
'tm_id'
]
# Select columns of interest
df_tm_bio_status_select = df_tm_bio_status[cols_tm_bio_status]
# Drop duplicates
df_tm_bio_status_select = df_tm_bio_status_select.drop_duplicates()
# Display DataFrame
df_tm_bio_status_select.head()
player_name | first_initial_lower | first_name_lower | last_name_lower | birth_year | country_lower | outfielder_goalkeeper | tm_id | |
---|---|---|---|---|---|---|---|---|
0 | eldin jakupovic | e | eldin | jakupovic | 1984.0 | jugoslawien (sfr) | Goalkeeper | 2857 |
1 | james milner | j | james | milner | 1986.0 | england | Outfielder | 3333 |
2 | zlatan ibrahimovic | z | zlatan | ibrahimovic | 1981.0 | sweden | Outfielder | 3455 |
3 | nicolas penneteau | n | nicolas | penneteau | 1981.0 | france | Goalkeeper | 5578 |
4 | antonio rosati | a | antonio | rosati | 1983.0 | italy | Goalkeeper | 6442 |
lst_leagues_tm_valuations = list(df_tm_valuations['league_code'].unique())
lst_leagues_tm_valuations
['L1', 'GB1', 'MLS1', 'FR1', 'ES1', 'IT1']
lst_leagues_tm_valuations_big5 = ['GB1', 'IT1', 'FR1', 'L1', 'ES1']
df_tm_valuations = df_tm_valuations[df_tm_valuations['league_code'].isin(lst_leagues_tm_valuations_big5)]
lst_seasons_tm_valuations = list(df_tm_valuations['season'].unique())
lst_seasons_tm_valuations
['2004/2005', '2005/2006', '2006/2007', '2007/2008', '2008/2009', '2010/2011', '2011/2012', '2012/2013', '2013/2014', '2014/2015', '2015/2016', '2016/2017', '2017/2018', '2018/2019', '2009/2010', '2019/2020', '2020/2021', '2021/2022']
lst_seasons_tm_valuations = ['2017/2018', '2018/2019', '2019/2020', '2020/2021', '2021/2022']
df_tm_valuations = df_tm_valuations[df_tm_valuations['season'].isin(lst_seasons_tm_valuations)].copy()
# Remove accents and create lowercase name
df_tm_valuations['player_name_lower'] = (df_tm_valuations['player_name']
.str.normalize('NFKD')
.str.encode('ascii', errors='ignore')
.str.decode('utf-8')
.str.lower()
)
# First Name Lower (first space-separated token)
df_tm_valuations['first_name_lower'] = df_tm_valuations['player_name_lower'].str.split(' ').str[0]
# Last Name Lower (last space-separated token)
df_tm_valuations['last_name_lower'] = df_tm_valuations['player_name_lower'].str.rsplit(' ', n=1).str[-1]
# First Initial Lower
df_tm_valuations['first_initial_lower'] = df_tm_valuations['player_name_lower'].astype(str).str[0]
# Remove accents and create lowercase country (from citizenship)
df_tm_valuations['country_lower'] = (df_tm_valuations['citizenship']
.str.normalize('NFKD')
.str.encode('ascii', errors='ignore')
.str.decode('utf-8')
.str.lower()
)
# Display DataFrame
df_tm_valuations.head()
tm_id | season | player_name | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | birth_year | position | position_code | position_grouped | outfielder_goalkeeper | height | foot | citizenship | second_citizenship | player_agent | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12 | 26 | 2017/2018 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 675000.0 | 750000 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer | roman weidenfeller | roman | weidenfeller | r | germany |
13 | 26 | 2018/2019 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 0.0 | 0 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer | roman weidenfeller | roman | weidenfeller | r | germany |
27 | 80 | 2017/2018 | tom starke | Bayern Munich | retired | L1 | 40.0 | 90000.0 | 100000 | 1981-03-18 | Freital | 1981.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 194.0 | right | Germany | NaN | IFM | tom starke | tom | starke | t | germany |
28 | 80 | 2018/2019 | tom starke | Bayern Munich | retired | L1 | 40.0 | 90000.0 | 100000 | 1981-03-18 | Freital | 1981.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 194.0 | right | Germany | NaN | IFM | tom starke | tom | starke | t | germany |
41 | 488 | 2017/2018 | gerhard tremmel | Swansea City | retired | GB1 | 42.0 | 225000.0 | 250000 | 1978-11-16 | München | 1978.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | NaN | NaN | Germany | NaN | NaN | gerhard tremmel | gerhard | tremmel | g | germany |
# Define columns
cols_tm_valuations = ['player_name',
'first_initial_lower',
'first_name_lower',
'last_name_lower',
#'age',
'birth_year',
'country_lower',
'outfielder_goalkeeper',
'season',
'tm_id'
]
# Select columns of interest
df_tm_valuations_select = df_tm_valuations[cols_tm_valuations]
# Drop duplicates
df_tm_valuations_select = df_tm_valuations_select.drop_duplicates()
# Display DataFrame
df_tm_valuations_select.head()
player_name | first_initial_lower | first_name_lower | last_name_lower | birth_year | country_lower | outfielder_goalkeeper | season | tm_id | |
---|---|---|---|---|---|---|---|---|---|
12 | roman weidenfeller | r | roman | weidenfeller | 1980.0 | germany | Goalkeeper | 2017/2018 | 26 |
13 | roman weidenfeller | r | roman | weidenfeller | 1980.0 | germany | Goalkeeper | 2018/2019 | 26 |
27 | tom starke | t | tom | starke | 1981.0 | germany | Goalkeeper | 2017/2018 | 80 |
28 | tom starke | t | tom | starke | 1981.0 | germany | Goalkeeper | 2018/2019 | 80 |
41 | gerhard tremmel | g | gerhard | tremmel | 1978.0 | germany | Goalkeeper | 2017/2018 | 488 |
lst_leagues_tm_transfers = list(df_tm_transfers['league_code'].unique())
lst_leagues_tm_transfers
['L1', 'GB2', 'NL1', 'PO1', 'FR1', 'GB1', 'RU1', 'ES1', 'IT1']
lst_leagues_tm_transfers_big5 = ['GB1', 'IT1', 'FR1', 'L1', 'ES1']
df_tm_transfers = df_tm_transfers[df_tm_transfers['league_code'].isin(lst_leagues_tm_transfers_big5)]
lst_seasons_tm_transfers = list(df_tm_transfers['season'].unique())
lst_seasons_tm_transfers
['1992/1993', '1993/1994', '1994/1995', '1995/1996', '1996/1997', '1997/1998', '1998/1999', '1999/2000', '2000/2001', '2001/2002', '2002/2003', '2003/2004', '2004/2005', '2005/2006', '2006/2007', '2007/2008', '2008/2009', '2009/2010', '2010/2011', '2011/2012', '2012/2013', '2013/2014', '2014/2015', '2015/2016', '2016/2017', '2017/2018', '2018/2019', '2019/2020', '2020/2021']
lst_seasons_tm_transfers = ['2017/2018', '2018/2019', '2019/2020', '2020/2021', '2021/2022']
df_tm_transfers = df_tm_transfers[df_tm_transfers['season'].isin(lst_seasons_tm_transfers)].copy()
# Remove accents and create lowercase name
df_tm_transfers['player_name_lower'] = (df_tm_transfers['player_name']
.str.normalize('NFKD')
.str.encode('ascii', errors='ignore')
.str.decode('utf-8')
.str.lower()
)
# First Name Lower (first space-separated token)
df_tm_transfers['first_name_lower'] = df_tm_transfers['player_name_lower'].str.split(' ').str[0]
# Last Name Lower (last space-separated token)
df_tm_transfers['last_name_lower'] = df_tm_transfers['player_name_lower'].str.rsplit(' ', n=1).str[-1]
# First Initial Lower
df_tm_transfers['first_initial_lower'] = df_tm_transfers['player_name_lower'].astype(str).str[0]
# Display DataFrame
df_tm_transfers.head()
club_name | player_name | age | position | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | year | season | league_code | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | VfB Stuttgart | Adrian Knup | 23.0 | Centre-Forward | FC Luzern | ? | in | Summer | NaN | 1 Bundesliga | 1992 | 1992/1993 | L1 | adrian knup | adrian | knup | a |
1 | 1. FC Köln | Adrian Spyrka | 24.0 | Central Midfield | Stuttg. Kickers | End of loanJun 30, 1992 | in | Summer | 0.0 | 1 Bundesliga | 1992 | 1992/1993 | L1 | adrian spyrka | adrian | spyrka | a |
2 | Karlsruher SC | Alexander Famulla | 31.0 | Goalkeeper | FC 08 Homburg | ? | out | Summer | NaN | 1 Bundesliga | 1992 | 1992/1993 | L1 | alexander famulla | alexander | famulla | a |
3 | SV Werder Bremen | Alexander Malchow | 22.0 | Centre-Back | VfB Oldenburg | Free transfer | out | Summer | 0.0 | 1 Bundesliga | 1992 | 1992/1993 | L1 | alexander malchow | alexander | malchow | a |
4 | SG Dynamo Dresden | Alexander Zickler | 18.0 | Centre-Forward | D. Dresden U19 | - | in | Summer | 0.0 | 1 Bundesliga | 1992 | 1992/1993 | L1 | alexander zickler | alexander | zickler | a |
# Define columns
cols_tm_transfers = ['player_name',
'first_initial_lower',
'first_name_lower',
'last_name_lower',
'season'
]
# Select columns of interest
df_tm_transfers_select = df_tm_transfers[cols_tm_transfers]
# Drop duplicates
df_tm_transfers_select = df_tm_transfers_select.drop_duplicates()
# Display DataFrame
df_tm_transfers_select.head()
club_name | player_name | age | position | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | year | season | league_code | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | VfB Stuttgart | Adrian Knup | 23.0 | Centre-Forward | FC Luzern | ? | in | Summer | NaN | 1 Bundesliga | 1992 | 1992/1993 | L1 | adrian knup | adrian | knup | a |
1 | 1. FC Köln | Adrian Spyrka | 24.0 | Central Midfield | Stuttg. Kickers | End of loanJun 30, 1992 | in | Summer | 0.0 | 1 Bundesliga | 1992 | 1992/1993 | L1 | adrian spyrka | adrian | spyrka | a |
2 | Karlsruher SC | Alexander Famulla | 31.0 | Goalkeeper | FC 08 Homburg | ? | out | Summer | NaN | 1 Bundesliga | 1992 | 1992/1993 | L1 | alexander famulla | alexander | famulla | a |
3 | SV Werder Bremen | Alexander Malchow | 22.0 | Centre-Back | VfB Oldenburg | Free transfer | out | Summer | 0.0 | 1 Bundesliga | 1992 | 1992/1993 | L1 | alexander malchow | alexander | malchow | a |
4 | SG Dynamo Dresden | Alexander Zickler | 18.0 | Centre-Forward | D. Dresden U19 | - | in | Summer | 0.0 | 1 Bundesliga | 1992 | 1992/1993 | L1 | alexander zickler | alexander | zickler | a |
lst_leagues_capology = list(df_capology['country'].unique())
lst_leagues_capology
['Switzerland', 'Germany', 'Iceland', 'Brazil', 'Ghana', 'South Korea', 'Austria', 'Serbia', 'Turkey', 'Greece', 'Czech Republic', 'Netherlands', 'France', 'Paraguay', 'Japan', 'Slovenia', 'Latvia', 'Chile', 'Mexico', 'Finland', 'Jamaica', 'Croatia', 'Australia', 'Ukraine', 'Spain', 'United States', 'Morocco', 'Portugal', 'Poland', 'Colombia', 'Sweden', 'Denmark', 'Gabon', 'Costa Rica', 'Hungary', 'Bosnia-Herzegovina', 'Nigeria', 'Democratic Republic of Congo', 'Tunisia', 'Ecuador', 'Argentina', 'Kosovo', 'Uruguay', 'Israel', 'Albania', 'Palästina', 'Norway', 'Slovakia', 'Georgia', 'Italy', 'The Gambia', "Cote d'Ivoire", 'Cameroon', 'Russia', 'Philippines', 'Belgium', 'Guinea', 'Scotland', 'Bulgaria', 'Algeria', 'Peru', 'Senegal', 'Uganda', 'Mali', 'Iran', 'Venezuela', 'England', 'Azerbaijan', 'Curacao', 'Togo', 'Montenegro', 'Romania', 'China', 'Canada', 'Luxembourg', 'Wales', 'Burkina Faso', 'New Zealand', 'Armenia', 'North Macedonia', 'Guadeloupe', 'Angola', 'Dominican Republic', 'Qatar', 'Martinique', 'Saudi Arabia', 'Congo', 'Honduras', 'Kenya', 'Mauritania', 'Central African Republic', 'Guinea-Bissau', 'Equatorial Guinea', 'Cape Verde', 'Mauritius', 'Comoros', 'Benin', 'Haiti', 'Monaco', 'French Guiana', 'South Africa', 'Zambia', 'Madagascar', 'Chad', 'Belarus', 'Lithuania', 'Mozambique', 'Niger', 'Neukaledonien', 'Trinidad and Tobago', 'Zimbabwe', 'Democratic Republic of the Congo', 'Panama', 'Ireland', 'Iraq', 'Sierra Leone', 'St. Kitts & Nevis', 'Guatemala', 'Guam', 'El Salvador', 'Puerto Rico', 'Guyana', 'Cuba', 'Belize', 'Eritrea', 'St. Vincent & Grenadinen', 'Bolivia', 'Bermuda', 'Afghanistan', 'Lebanon', 'Liechtenstein', 'Egypt', 'Syria', 'Libya', 'Rwanda', 'Tanzania', 'Somalia', 'Liberia', 'Northern Ireland', 'Suriname', 'Malaysia', nan, 'Estonia', 'Burundi', 'Cyprus', 'Moldova', 'North Korea', 'San Marino', 'Uzbekistan']
lst_leagues_capology_big5 = ['England', 'Germany', 'Spain', 'France', 'Italy']
df_capology = df_capology[df_capology['country'].isin(lst_leagues_capology_big5)].copy()
# Map season to DataFrame
df_capology['season'] = df_capology['season'].map(dict_seasons)
lst_seasons_capology = list(df_capology['season'].unique())
lst_seasons_capology
['2016/2017', '2017/2018', '2018/2019', '2019/2020', nan, '2020/2021']
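The `nan` in the list above is consistent with how `Series.map` behaves when given a dict: any value without a matching key becomes `NaN` instead of raising an error. Whether that is the actual cause here cannot be confirmed from this section, since `dict_seasons` is defined elsewhere. A minimal sketch of the behaviour, where the lookup `dict_seasons_demo` and the raw season labels are hypothetical:

```python
import pandas as pd

# Hypothetical lookup; the real dict_seasons is defined elsewhere in the notebook
dict_seasons_demo = {"2017-2018": "2017/2018", "2018-2019": "2018/2019"}

# Hypothetical raw labels; the last one has no key in the lookup
seasons_raw = pd.Series(["2017-2018", "2018-2019", "2021-2022"])

# Values missing from the dict are mapped to NaN rather than raising an error
seasons_mapped = seasons_raw.map(dict_seasons_demo)

# NaN is not in the list of wanted seasons, so unmapped rows are dropped by the filter
kept = seasons_mapped[seasons_mapped.isin(["2017/2018", "2018/2019"])]

print(seasons_mapped.tolist())
print(len(kept))
```

If rows disappear unexpectedly after the season filter, checking `seasons_mapped.isna().sum()` before filtering is a quick way to see how many labels failed to map.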
lst_seasons_capology = ['2017/2018', '2018/2019', '2019/2020', '2020/2021', '2021/2022']
df_capology = df_capology[df_capology['season'].isin(lst_seasons_capology)].copy()
# Remove accents and create lowercase name
df_capology['player_name_lower'] = (df_capology['player']
.str.normalize('NFKD')
.str.encode('ascii', errors='ignore')
.str.decode('utf-8')
.str.lower()
)
# First Name Lower (first space-separated token)
df_capology['first_name_lower'] = df_capology['player_name_lower'].str.split(' ').str[0]
# Last Name Lower (last space-separated token)
df_capology['last_name_lower'] = df_capology['player_name_lower'].str.rsplit(' ', n=1).str[-1]
# First Initial Lower
df_capology['first_initial_lower'] = df_capology['player_name_lower'].astype(str).str[0]
# Remove accents and create lowercase country
df_capology['country_lower'] = (df_capology['country']
.str.normalize('NFKD')
.str.encode('ascii', errors='ignore')
.str.decode('utf-8')
.str.lower()
)
# Display DataFrame
df_capology.head()
player | season | league | team | position | outfielder_goalkeeper | age | country | weekly_gross_base_salary_gbp | annual_gross_base_salary_gbp | adj_current_gross_base_salary_gbp | estimated_gross_total_gbp | current_contract_status | current_contract_expiration | current_contract_length | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
645 | Andreas Luthe | 2017/2018 | Bundesliga | Augsburg | Goalkeeper | Goalkeeper | 30 | Germany | 6147.0 | 319688.0 | 325953.0 | NaN | NaN | NaN | NaN | andreas luthe | andreas | luthe | a | germany |
647 | Christoph Janker | 2017/2018 | Bundesliga | Augsburg | Defender | Outfielder | 32 | Germany | 7585.0 | 394432.0 | 402161.0 | NaN | NaN | NaN | NaN | christoph janker | christoph | janker | c | germany |
648 | Daniel Baier | 2017/2018 | Bundesliga | Augsburg | Midfielder | Outfielder | 33 | Germany | 34185.0 | 1777646.0 | 1812482.0 | NaN | NaN | NaN | NaN | daniel baier | daniel | baier | d | germany |
651 | Efkan Bekiroglu | 2017/2018 | Bundesliga | Augsburg | Midfielder | Outfielder | 21 | Germany | 761.0 | 39623.0 | 40399.0 | NaN | NaN | NaN | NaN | efkan bekiroglu | efkan | bekiroglu | e | germany |
652 | Erik Thommy | 2017/2018 | Bundesliga | Augsburg | Midfielder | Outfielder | 22 | Germany | 7585.0 | 394432.0 | 402161.0 | NaN | NaN | NaN | NaN | erik thommy | erik | thommy | e | germany |
# Define columns
cols_capology = ['player',
'first_initial_lower',
'first_name_lower',
'last_name_lower',
'country_lower',
'outfielder_goalkeeper',
'season'
]
# Select columns of interest
df_capology_select = df_capology[cols_capology]
# Drop duplicates
df_capology_select = df_capology_select.drop_duplicates()
# Display DataFrame
df_capology_select.head()
player | season | league | team | position | outfielder_goalkeeper | age | country | weekly_gross_base_salary_gbp | annual_gross_base_salary_gbp | adj_current_gross_base_salary_gbp | estimated_gross_total_gbp | current_contract_status | current_contract_expiration | current_contract_length | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
645 | Andreas Luthe | 2017/2018 | Bundesliga | Augsburg | Goalkeeper | Goalkeeper | 30 | Germany | 6147.0 | 319688.0 | 325953.0 | NaN | NaN | NaN | NaN | andreas luthe | andreas | luthe | a | germany |
647 | Christoph Janker | 2017/2018 | Bundesliga | Augsburg | Defender | Outfielder | 32 | Germany | 7585.0 | 394432.0 | 402161.0 | NaN | NaN | NaN | NaN | christoph janker | christoph | janker | c | germany |
648 | Daniel Baier | 2017/2018 | Bundesliga | Augsburg | Midfielder | Outfielder | 33 | Germany | 34185.0 | 1777646.0 | 1812482.0 | NaN | NaN | NaN | NaN | daniel baier | daniel | baier | d | germany |
651 | Efkan Bekiroglu | 2017/2018 | Bundesliga | Augsburg | Midfielder | Outfielder | 21 | Germany | 761.0 | 39623.0 | 40399.0 | NaN | NaN | NaN | NaN | efkan bekiroglu | efkan | bekiroglu | e | germany |
652 | Erik Thommy | 2017/2018 | Bundesliga | Augsburg | Midfielder | Outfielder | 22 | Germany | 7585.0 | 394432.0 | 402161.0 | NaN | NaN | NaN | NaN | erik thommy | erik | thommy | e | germany |
Now that we have the player-level datasets, we are ready to merge them into a single dataset of identifiers, which can then be used to join any of the datasets together in future analysis.
The datasets do not share a common unique identifier, so joining them requires a third-party Python library: recordlinkage, installed using pip install recordlinkage. recordlinkage provides a simple interface to link records in or between data sources.
The FBref dataset is used as the base, to which the other datasets are subsequently joined.
'Record linkage' is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with another that describe the same entity (source).
Record linkage is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, National identification number), which may be due to differences in record shape, storage location, or curator style or preference. A data set that has undergone RL-oriented reconciliation may be referred to as being cross-linked. Record linkage is referred to as data linkage in many jurisdictions, but the two are the same process.
The recordlinkage toolkit provides most of the tools needed for record linkage and deduplication. The package contains indexing methods, functions to compare records, and classifiers. It is developed for research and for linking small or medium-sized files.
For a full guide on how to use recordlinkage, see the official documentation and the worked example by Chris Moffitt.
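The three stages described above (indexing, comparing, classifying) can be illustrated without the library itself. The sketch below is a standard-library stand-in for the idea, not the recordlinkage API: it blocks candidate pairs on first initial, scores name similarity with `difflib.SequenceMatcher`, and accepts a pair when the score clears a threshold and the birth years agree. The records, the `tm_id` placeholder values, and the threshold of 0.7 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Made-up miniature versions of two sources without a shared identifier
fbref = [
    {"name": "aaron hunt", "initial": "a", "birth_year": 1986},
    {"name": "james milner", "initial": "j", "birth_year": 1986},
]
tm = [
    {"name": "aaron hunt", "initial": "a", "birth_year": 1986, "tm_id": 1},
    {"name": "james philip milner", "initial": "j", "birth_year": 1986, "tm_id": 2},
    {"name": "antonio rosati", "initial": "a", "birth_year": 1983, "tm_id": 3},
]

# 1. Indexing (blocking): only pair records that share a first initial
candidates = [
    (i, j)
    for i, a in enumerate(fbref)
    for j, b in enumerate(tm)
    if a["initial"] == b["initial"]
]


# 2. Comparing: fuzzy similarity on the name, exact agreement on birth year
def compare(a, b):
    name_sim = SequenceMatcher(None, a["name"], b["name"]).ratio()
    same_year = a["birth_year"] == b["birth_year"]
    return name_sim, same_year


# 3. Classifying: a simple rule that turns comparison scores into match decisions
NAME_THRESHOLD = 0.7  # assumed cut-off, would need tuning on real data
matches = {}
for i, j in candidates:
    name_sim, same_year = compare(fbref[i], tm[j])
    if name_sim >= NAME_THRESHOLD and same_year:
        matches[fbref[i]["name"]] = tm[j]["tm_id"]

print(len(candidates), matches)
```

Blocking reduces the six possible pairs to three, which is the main reason indexing matters on datasets with thousands of players. In the library, these stages correspond to its `Index` and `Compare` classes and its classifiers.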
# Join the FBref players dataset to the FBref-TransferMarkt URL lookup
## Left join keeps every FBref row and adds the TransferMarkt and FBref identifiers where a match exists
df_fbref_merge = pd.merge(df_fbref_players, df_fbref_tm_urls, left_on='Player', right_on='player_name_fbref', how='left')
## Rename columns - required otherwise 'birth_year' gets dropped
df_fbref_merge = df_fbref_merge.rename(columns={'birth_year': 'born'})
## Remove duplicates
### Remove duplicate columns after join (ending in '_y') and remove the '_x' suffix from kept columns
df_fbref_merge = df_fbref_merge[df_fbref_merge.columns.drop(list(df_fbref_merge.filter(regex='_y$')))]
df_fbref_merge.columns = df_fbref_merge.columns.str.replace('_x$', '', regex=True)
### Remove duplicate rows
df_fbref_merge = df_fbref_merge.drop_duplicates(subset=['season', 'player_name_fbref', 'Team Name', 'Team Country', 'Comp'], keep='first')
### Drop unnecessary columns
df_fbref_merge = df_fbref_merge.drop(['Player'], axis=1)
## Rename columns
df_fbref_merge = df_fbref_merge.rename(columns={'born': 'birth_year'})
## Sort columns
df_fbref_merge = df_fbref_merge.sort_values(by=['player_name_fbref'], ascending=[True])
## Display DataFrame
df_fbref_merge.head()
Nation | Pos | Squad | Comp | Age | birth_year | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | League Name | League ID | season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | outfielder_goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | player_name_fbref | url_fbref | url_tm | TmPos | tm_id | fbref_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5430 | ie IRL | FW | Brighton | Premier League | 19 | 2000.0 | 24 | 14 | 1258.0 | 14.0 | 3 | 1 | 3 | 0 | 0 | 0 | 0 | 0.21 | 0.07 | 0.29 | 0.21 | 0.29 | 3.2 | 3.2 | 0.3 | 3.5 | 0.23 | 0.02 | 0.25 | 0.23 | 0.25 | Matches | 38.0 | 13.0 | 34.2 | 2.72 | 0.93 | 0.08 | 0.23 | 15.9 | 0.0 | 0.08 | -0.2 | -0.2 | 126.0 | 163.0 | 77.3 | 1739.0 | 242.0 | 76.0 | 92.0 | 82.6 | 31.0 | 39.0 | 79.5 | 7.0 | 11.0 | 63.6 | 0.7 | 6.0 | 6.0 | 2.0 | 0.0 | 10.0 | 148.0 | 15.0 | 1.0 | 50.0 | 0.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 90.0 | 52.0 | 21.0 | 27.0 | 107.0 | 13.0 | 1.0 | 6.0 | 0.0 | 1.0 | 4.0 | 10.0 | 25.0 | 1.79 | 7.0 | 0.0 | 3.0 | 9.0 | 3.0 | 5.0 | 0.36 | 1.0 | 0.0 | 1.0 | 1.0 | 2.0 | 0.0 | 12.0 | 8.0 | 1.0 | 5.0 | 6.0 | 3.0 | 25.0 | 9.0 | 69.0 | 29.5 | 14.0 | 94.0 | 126.0 | 0.0 | 7.0 | 17.0 | 1.0 | 0.0 | 349.0 | 2.0 | 61.0 | 37.5 | 6.0 | 1.0 | 228.0 | 12.0 | 42.0 | 34.0 | 535.0 | 235.0 | 43.9 | 99.0 | 52 | 36.8 | 72.0 | 0.0 | 10 | 26.0 | 4 | 1.13 | 18.0 | 22.0 | -4.0 | -0.29 | 0.17 | 15.6 | 19.8 | -4.2 | -0.30 | 0.08 | 0.0 | 16 | 2.0 | 0.0 | 0.0 | 54.0 | 14.0 | 48.0 | 22.6 | Big-5-European-Leagues | Big5 | 2019/2020 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 |
8211 | ie IRL | FW | Brighton | Premier League | 20 | 2000.0 | 17 | 9 | 791.0 | 8.8 | 2 | 1 | 2 | 0 | 0 | 0 | 0 | 0.23 | 0.11 | 0.34 | 0.23 | 0.34 | 3.5 | 3.5 | 0.2 | 3.7 | 0.40 | 0.02 | 0.42 | 0.40 | 0.42 | Matches | 23.0 | 8.0 | 34.8 | 2.62 | 0.91 | 0.09 | 0.25 | 13.7 | 0.0 | 0.15 | -1.5 | -1.5 | 79.0 | 101.0 | 78.2 | 1147.0 | 165.0 | 45.0 | 56.0 | 80.4 | 26.0 | 30.0 | 86.7 | 4.0 | 5.0 | 80.0 | 0.8 | 5.0 | 2.0 | 1.0 | 0.0 | 3.0 | 91.0 | 10.0 | 0.0 | 22.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 64.0 | 26.0 | 11.0 | 11.0 | 74.0 | 5.0 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 4.0 | 12.0 | 1.37 | 7.0 | 0.0 | 3.0 | 2.0 | 0.0 | 1.0 | 0.11 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 5.0 | 2.0 | 4.0 | 1.0 | 1.0 | 20.0 | 4.0 | 40.0 | 32.3 | 7.0 | 58.0 | 59.0 | 0.0 | 8.0 | 7.0 | 1.0 | 0.0 | 201.0 | 1.0 | 38.0 | 80.0 | 8.0 | 0.0 | 124.0 | 4.0 | 29.0 | 15.0 | 357.0 | 143.0 | 40.1 | 64.0 | 47 | 23.1 | 68.0 | NaN | 8 | 23.0 | 11 | 0.88 | 12.0 | 17.0 | -5.0 | -0.57 | -0.53 | 13.8 | 7.8 | 6.0 | 0.69 | 0.42 | 0.0 | 5 | 1.0 | 0.0 | 0.0 | 28.0 | 11.0 | 30.0 | 26.8 | Big-5-European-Leagues | Big5 | 2020/2021 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 |
11071 | ie IRL | FW | Brighton | Premier League | 21 | 2000.0 | 1 | 0 | 45.0 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.4 | 0.4 | 0.0 | 0.4 | 0.85 | 0.00 | 0.85 | 0.85 | 0.85 | Matches | 1.0 | 0.0 | 0.0 | 2.00 | 0.00 | 0.00 | NaN | 9.0 | 0.0 | 0.42 | -0.4 | -0.4 | 2.0 | 3.0 | 66.7 | 14.0 | 0.0 | 2.0 | 2.0 | 100.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.00 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 1.0 | 8.3 | 1.0 | 6.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 0.0 | 2.0 | NaN | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 1.0 | 20.0 | 4.0 | 20.0 | 2.0 | 45 | 16.7 | NaN | 0.0 | 1 | 45.0 | 1 | 3.00 | 0.0 | 0.0 | 0.0 | 0.00 | -0.40 | 1.0 | 0.7 | 0.3 | 0.68 | 0.74 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 2.0 | 0.0 | Big-5-European-Leagues | Big5 | 2021/2022 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 |
8212 | eng ENG | DF | West Ham | Premier League | 30 | 1989.0 | 36 | 36 | 3170.0 | 35.2 | 0 | 8 | 0 | 0 | 0 | 3 | 0 | 0.00 | 0.23 | 0.23 | 0.00 | 0.23 | 0.9 | 0.9 | 5.9 | 6.9 | 0.03 | 0.17 | 0.19 | 0.03 | 0.19 | Matches | 19.0 | 4.0 | 21.1 | 0.54 | 0.11 | 0.00 | 0.00 | 24.6 | 11.0 | 0.05 | -0.9 | -0.9 | 1541.0 | 2061.0 | 74.8 | 30884.0 | 14764.0 | 655.0 | 736.0 | 89.0 | 616.0 | 736.0 | 83.7 | 252.0 | 518.0 | 48.6 | 2.1 | 60.0 | 149.0 | 24.0 | 17.0 | 120.0 | 1617.0 | 444.0 | 2.0 | 158.0 | 84.0 | 91.0 | 82.0 | 22.0 | 51.0 | 6.0 | 1104.0 | 365.0 | 592.0 | 1607.0 | 118.0 | 49.0 | 259.0 | 4.0 | 23.0 | 35.0 | 48.0 | 66.0 | 96.0 | 2.73 | 48.0 | 41.0 | 1.0 | 2.0 | 0.0 | 14.0 | 0.40 | 5.0 | 6.0 | 0.0 | 2.0 | 1.0 | 0.0 | 31.0 | 13.0 | 15.0 | 14.0 | 2.0 | 14.0 | 41.2 | 20.0 | 67.0 | 29.1 | 122.0 | 90.0 | 18.0 | 0.0 | 34.0 | 67.0 | 98.0 | 0.0 | 2307.0 | 182.0 | 18.0 | 40.0 | 8.0 | 2.0 | 1266.0 | 3.0 | 7.0 | 9.0 | 1366.0 | 1318.0 | 96.5 | 30.0 | 88 | 92.7 | 88.0 | NaN | 0 | NaN | 0 | 1.81 | 60.0 | 41.0 | 19.0 | 0.54 | 1.98 | 51.3 | 42.9 | 8.4 | 0.24 | 1.26 | 0.0 | 13 | 0.0 | 0.0 | 0.0 | 307.0 | 38.0 | 30.0 | 55.9 | Big-5-European-Leagues | Big5 | 2020/2021 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... | Left-Back | 92571 | 4f974391 |
2728 | eng ENG | DF | West Ham | Premier League | 28 | 1989.0 | 20 | 18 | 1589.0 | 17.7 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0.00 | 0.06 | 0.06 | 0.00 | 0.06 | 0.5 | 0.5 | 0.9 | 1.4 | 0.03 | 0.05 | 0.08 | 0.03 | 0.08 | Matches | 11.0 | 0.0 | 0.0 | 0.62 | 0.00 | 0.00 | NaN | 23.5 | 2.0 | 0.04 | -0.5 | -0.5 | 842.0 | 1070.0 | 78.7 | 13627.0 | 5572.0 | 453.0 | 501.0 | 90.4 | 307.0 | 371.0 | 82.7 | 64.0 | 140.0 | 45.7 | 0.1 | 16.0 | 55.0 | 15.0 | 5.0 | 65.0 | 854.0 | 216.0 | 0.0 | 168.0 | 18.0 | 46.0 | 10.0 | 0.0 | 2.0 | 0.0 | 642.0 | 235.0 | 193.0 | 787.0 | 51.0 | 27.0 | 190.0 | 4.0 | 2.0 | 21.0 | 27.0 | 44.0 | 29.0 | 1.64 | 19.0 | 6.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0.11 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 19.0 | 14.0 | 12.0 | 4.0 | 12.0 | 42.9 | 16.0 | 68.0 | 31.5 | 129.0 | 59.0 | 28.0 | 0.0 | 39.0 | 49.0 | 60.0 | 1.0 | 1266.0 | 78.0 | 36.0 | 63.6 | 7.0 | 1.0 | 723.0 | 8.0 | 11.0 | 13.0 | 797.0 | 715.0 | 89.7 | 43.0 | 79 | 46.5 | 85.0 | 16.0 | 2 | 30.0 | 7 | 1.30 | 21.0 | 26.0 | -5.0 | -0.28 | -0.38 | 20.1 | 25.3 | -5.3 | -0.30 | 0.12 | 0.0 | 2 | 0.0 | 0.0 | 0.0 | 169.0 | 22.0 | 14.0 | 61.1 | Big-5-European-Leagues | Big5 | 2018/2019 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... | Left-Back | 92571 | 4f974391 |
print('No. rows in FBref Player DataFrame before join to FBref-TM Mapping data: {}'.format(len(df_fbref_players)))
print('No. rows in DataFrame AFTER join: {}\n'.format(len(df_fbref_merge)))
print('Variance in rows before and after join: {}\n'.format(len(df_fbref_merge) - len(df_fbref_players)))
print('-'*10)
No. rows in FBref Player DataFrame before join to FBref-TM Mapping data: 13680
No. rows in DataFrame AFTER join: 12753

Variance in rows before and after join: -927

----------
Some duplication is occurring here.
df_fbref_merge[df_fbref_merge['player_name_fbref'].str.contains('Gerard Piqu', na=False)]
Nation | Pos | Squad | Comp | Age | birth_year | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | League Name | League ID | season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | outfielder_goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | player_name_fbref | url_fbref | url_tm | TmPos | tm_id | fbref_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3599 | es ESP | DF | Barcelona | La Liga | 31 | 1987.0 | 35 | 35 | 3150.0 | 35.0 | 4 | 2 | 4 | 0 | 0 | 6 | 0 | 0.11 | 0.06 | 0.17 | 0.11 | 0.17 | 3.7 | 3.7 | 1.5 | 5.2 | 0.11 | 0.04 | 0.15 | 0.11 | 0.15 | Matches | 20.0 | 11.0 | 55.0 | 0.57 | 0.31 | 0.20 | 0.36 | 7.3 | 0.0 | 0.19 | 0.3 | 0.3 | 2230.0 | 2429.0 | 91.8 | 46352.0 | 14931.0 | 666.0 | 710.0 | 93.8 | 1209.0 | 1277.0 | 94.7 | 343.0 | 420.0 | 81.7 | 0.5 | 8.0 | 156.0 | 5.0 | 0.0 | 103.0 | 2347.0 | 82.0 | 3.0 | 296.0 | 58.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1937.0 | 190.0 | 302.0 | 97.0 | 2125.0 | 126.0 | 12.0 | 13.0 | 5.0 | 22.0 | 14.0 | 13.0 | 25.0 | 0.71 | 17.0 | 0.0 | 2.0 | 1.0 | 1.0 | 6.0 | 0.17 | 2.0 | 0.0 | 1.0 | 2.0 | 1.0 | 0.0 | 45.0 | 27.0 | 29.0 | 15.0 | 1.0 | 20.0 | 54.1 | 17.0 | 87.0 | 31.9 | 160.0 | 101.0 | 12.0 | 2.0 | 37.0 | 77.0 | 156.0 | 2.0 | 2760.0 | 385.0 | 32.0 | 76.2 | 16.0 | 0.0 | 1915.0 | 1.0 | 11.0 | 9.0 | 1963.0 | 1889.0 | 96.2 | 27.0 | 90 | 92.1 | 90.0 | 35.0 | 0 | NaN | 1 | 2.43 | 86.0 | 30.0 | 56.0 | 1.60 | 2.27 | 69.9 | 35.6 | 34.2 | 0.98 | 1.29 | 0.0 | 24 | 1.0 | 0.0 | 0.0 | 473.0 | 91.0 | 33.0 | 73.4 | Big-5-European-Leagues | Big5 | 2018/2019 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | Centre-Back | 18944 | adfc9123 |
6327 | es ESP | DF | Barcelona | La Liga | 32 | 1987.0 | 35 | 35 | 3092.0 | 34.4 | 1 | 0 | 1 | 0 | 0 | 15 | 0 | 0.03 | 0.00 | 0.03 | 0.03 | 0.03 | 2.3 | 2.3 | 0.6 | 2.9 | 0.07 | 0.02 | 0.08 | 0.07 | 0.08 | Matches | 15.0 | 6.0 | 40.0 | 0.44 | 0.17 | 0.07 | 0.17 | 8.4 | 1.0 | 0.15 | -1.3 | -1.3 | 2469.0 | 2659.0 | 92.9 | 53752.0 | 14795.0 | 645.0 | 682.0 | 94.6 | 1381.0 | 1437.0 | 96.1 | 427.0 | 506.0 | 84.4 | -0.6 | 5.0 | 192.0 | 3.0 | 0.0 | 116.0 | 2548.0 | 111.0 | 2.0 | 275.0 | 79.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2121.0 | 227.0 | 311.0 | 113.0 | 2364.0 | 99.0 | 9.0 | 13.0 | 1.0 | 25.0 | 14.0 | 14.0 | 12.0 | 0.35 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.03 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 37.0 | 22.0 | 21.0 | 16.0 | 0.0 | 14.0 | 48.3 | 15.0 | 83.0 | 34.0 | 128.0 | 109.0 | 7.0 | 2.0 | 40.0 | 73.0 | 182.0 | 1.0 | 2996.0 | 427.0 | 28.0 | 100.0 | 9.0 | 0.0 | 2084.0 | 0.0 | 9.0 | 7.0 | 2211.0 | 2171.0 | 98.2 | 21.0 | 88 | 90.4 | 88.0 | 31.0 | 0 | NaN | 0 | 2.09 | 71.0 | 36.0 | 35.0 | 1.02 | -2.55 | 56.2 | 33.8 | 22.4 | 0.65 | -1.49 | 0.0 | 32 | 0.0 | 2.0 | 0.0 | 391.0 | 128.0 | 40.0 | 76.2 | Big-5-European-Leagues | Big5 | 2019/2020 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | Centre-Back | 18944 | adfc9123 |
11649 | es ESP | DF | Barcelona | La Liga | 34 | 1987.0 | 2 | 2 | 120.0 | 1.3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0.75 | 0.00 | 0.75 | 0.75 | 0.75 | 0.2 | 0.2 | 0.0 | 0.2 | 0.15 | 0.00 | 0.15 | 0.15 | 0.15 | Matches | 1.0 | 1.0 | 100.0 | 0.75 | 0.75 | 1.00 | 1.00 | 8.0 | 0.0 | 0.19 | 0.8 | 0.8 | 94.0 | 97.0 | 96.9 | 2030.0 | 415.0 | 26.0 | 28.0 | 92.9 | 51.0 | 51.0 | 100.0 | 17.0 | 18.0 | 94.4 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 89.0 | 8.0 | 0.0 | 9.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 85.0 | 5.0 | 7.0 | 11.0 | 80.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.75 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | NaN | 0.0 | 2.0 | 25.0 | 4.0 | 4.0 | 0.0 | 0.0 | 2.0 | 4.0 | 10.0 | 0.0 | 112.0 | 24.0 | 1.0 | NaN | 0.0 | 0.0 | 69.0 | 0.0 | 0.0 | 0.0 | 82.0 | 79.0 | 96.3 | 0.0 | 60 | 44.4 | 60.0 | 1.0 | 0 | NaN | 0 | 2.00 | 4.0 | 2.0 | 2.0 | 1.50 | 0.90 | 3.3 | 1.5 | 1.8 | 1.35 | 1.00 | 0.0 | 1 | 0.0 | 0.0 | 0.0 | 9.0 | 5.0 | 1.0 | 83.3 | Big-5-European-Leagues | Big5 | 2021/2022 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | Centre-Back | 18944 | adfc9123 |
9109 | es ESP | DF | Barcelona | La Liga | 33 | 1987.0 | 18 | 18 | 1481.0 | 16.5 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.6 | 0.6 | 0.5 | 1.1 | 0.04 | 0.03 | 0.07 | 0.04 | 0.07 | Matches | 8.0 | 2.0 | 25.0 | 0.49 | 0.12 | 0.00 | 0.00 | 9.3 | 0.0 | 0.08 | -0.6 | -0.6 | 1173.0 | 1247.0 | 94.1 | 24743.0 | 6069.0 | 321.0 | 340.0 | 94.4 | 666.0 | 690.0 | 96.5 | 173.0 | 197.0 | 87.8 | -0.5 | 2.0 | 82.0 | 1.0 | 0.0 | 35.0 | 1204.0 | 43.0 | 0.0 | 91.0 | 32.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1033.0 | 93.0 | 121.0 | 72.0 | 1079.0 | 45.0 | 5.0 | 10.0 | 1.0 | 14.0 | 6.0 | 4.0 | 5.0 | 0.30 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.06 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 21.0 | 13.0 | 13.0 | 7.0 | 1.0 | 9.0 | 64.3 | 5.0 | 46.0 | 40.4 | 51.0 | 62.0 | 1.0 | 0.0 | 13.0 | 36.0 | 65.0 | 0.0 | 1368.0 | 161.0 | 14.0 | 100.0 | 1.0 | 0.0 | 961.0 | 0.0 | 2.0 | 3.0 | 1046.0 | 1015.0 | 97.0 | 7.0 | 82 | 43.3 | 82.0 | 13.0 | 0 | NaN | 1 | 1.53 | 34.0 | 22.0 | 12.0 | 0.73 | -0.90 | 32.9 | 16.1 | 16.8 | 1.02 | -0.02 | 0.0 | 13 | 0.0 | 0.0 | 0.0 | 182.0 | 66.0 | 21.0 | 75.9 | Big-5-European-Leagues | Big5 | 2020/2021 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | Centre-Back | 18944 | adfc9123 |
916 | es ESP | DF | Barcelona | La Liga | 30 | 1987.0 | 30 | 29 | 2631.0 | 29.2 | 2 | 0 | 2 | 0 | 0 | 8 | 0 | 0.07 | 0.00 | 0.07 | 0.07 | 0.07 | 3.0 | 3.0 | 0.9 | 3.8 | 0.10 | 0.03 | 0.13 | 0.10 | 0.13 | Matches | 18.0 | 5.0 | 27.8 | 0.62 | 0.17 | 0.11 | 0.40 | 9.2 | 0.0 | 0.16 | -1.0 | -1.0 | 1606.0 | 1810.0 | 88.7 | 34428.0 | 10291.0 | 460.0 | 499.0 | 92.2 | 841.0 | 904.0 | 93.0 | 293.0 | 380.0 | 77.1 | -0.9 | 6.0 | 123.0 | 4.0 | 0.0 | 90.0 | 1764.0 | 46.0 | 0.0 | 263.0 | 70.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1436.0 | 120.0 | 254.0 | 95.0 | 1552.0 | 106.0 | 10.0 | 8.0 | 1.0 | 28.0 | 19.0 | 11.0 | 17.0 | 0.58 | 13.0 | 0.0 | 1.0 | 2.0 | 0.0 | 1.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 30.0 | 22.0 | 22.0 | 8.0 | 0.0 | 12.0 | 34.3 | 23.0 | 78.0 | 32.5 | 144.0 | 80.0 | 16.0 | 2.0 | 26.0 | 62.0 | 103.0 | 5.0 | 2075.0 | 318.0 | 28.0 | 60.0 | 6.0 | 0.0 | 1419.0 | 0.0 | 4.0 | 7.0 | 1385.0 | 1341.0 | 96.8 | 27.0 | 88 | 76.9 | NaN | 27.0 | 1 | NaN | 6 | 2.57 | 81.0 | 23.0 | 58.0 | 1.98 | 0.62 | 65.0 | 30.0 | 35.1 | 1.20 | 0.96 | 0.0 | 23 | 1.0 | 0.0 | 0.0 | 353.0 | 54.0 | 19.0 | 74.0 | Big-5-European-Leagues | Big5 | 2017/2018 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | Centre-Back | 18944 | adfc9123 |
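A change in row count across a join usually traces back to repeated or missing keys in one of the inputs. As a minimal sketch, assuming toy stand-ins for the player and mapping tables (the `players` and `mapping` frames and their values are hypothetical; only the `fbref_id` and `tm_id` column names come from this notebook), the `indicator=True` option of `pd.merge` together with `duplicated` can show which keys are responsible:

```python
import pandas as pd

# Hypothetical toy data standing in for the FBref player and FBref-TM mapping DataFrames
players = pd.DataFrame({
    'fbref_id': ['a1', 'a1', 'b2', 'c3'],
    'season': ['2019/2020', '2020/2021', '2019/2020', '2019/2020'],
})
mapping = pd.DataFrame({
    'fbref_id': ['a1', 'b2', 'b2'],   # 'b2' is mapped twice, 'c3' is not mapped at all
    'tm_id': [101, 202, 203],
})

# indicator=True adds a '_merge' column recording where each row came from
merged = pd.merge(players, mapping, on='fbref_id', how='left', indicator=True)

# Rows from the left table that found no partner in the mapping table
unmatched = merged[merged['_merge'] == 'left_only']

# Keys that appear more than once in the mapping table and so multiply rows
dup_keys = mapping[mapping.duplicated(subset='fbref_id', keep=False)]

# Net change in row count caused by the join
row_diff = len(merged) - len(players)

print(unmatched[['fbref_id', 'season']])
print(dup_keys)
print('Row difference:', row_diff)
```

Inspecting `unmatched` and `dup_keys` before dropping duplicates makes it easier to tell whether a row difference comes from unmapped players or from repeated mapping entries.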
# Join the Bio-Status dataset to the Historical Player Valuation dataset
## Join the TransferMarkt Bio-Status and Player Valuation DataFrames
df_tm_merge = pd.merge(df_tm_valuations, df_tm_bio_status, on='player_name', how='left')
## Temporarily rename 'birth_year_x' to 'born' - the '_y' regex used below also matches the '_y' inside 'birth_year', so the column would otherwise be dropped
df_tm_merge = df_tm_merge.rename(columns={'birth_year_x': 'born'})
## Remove duplicates
### Remove duplicate columns after join (contain '_y') and remove '_x' suffix from kept columns
df_tm_merge = df_tm_merge[df_tm_merge.columns.drop(list(df_tm_merge.filter(regex='_y')))]
df_tm_merge.columns = df_tm_merge.columns.str.replace('_x', '')
### Remove duplicate rows
df_tm_merge = df_tm_merge.drop_duplicates(subset=['tm_id', 'season', 'player_name'], keep='first')
## Rename columns
df_tm_merge = df_tm_merge.rename(columns={'born': 'birth_year'})
## Display DataFrame
df_tm_merge.head()
tm_id | season | player_name | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | birth_year | position | position_code | position_grouped | outfielder_goalkeeper | height | foot | citizenship | second_citizenship | player_agent | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | birth_day | birth_month | cob | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26 | 2017/2018 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 675000.0 | 750000 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer | roman weidenfeller | roman | weidenfeller | r | germany | 6.0 | 8.0 | Germany | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | roman weidenfeller | roman | weidenfeller | r | NaN | 41.0 | 37.0 | 3.0 | NaN | 0.0 |
1 | 26 | 2018/2019 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 0.0 | 0 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer | roman weidenfeller | roman | weidenfeller | r | germany | 6.0 | 8.0 | Germany | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | roman weidenfeller | roman | weidenfeller | r | NaN | 41.0 | 37.0 | 3.0 | NaN | 0.0 |
2 | 80 | 2017/2018 | tom starke | Bayern Munich | retired | L1 | 40.0 | 90000.0 | 100000 | 1981-03-18 | Freital | 1981.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 194.0 | right | Germany | NaN | IFM | tom starke | tom | starke | t | germany | 18.0 | 3.0 | DDR | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | tom starke | tom | starke | t | NaN | 40.0 | 37.0 | 3.0 | NaN | 0.0 |
3 | 80 | 2018/2019 | tom starke | Bayern Munich | retired | L1 | 40.0 | 90000.0 | 100000 | 1981-03-18 | Freital | 1981.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 194.0 | right | Germany | NaN | IFM | tom starke | tom | starke | t | germany | 18.0 | 3.0 | DDR | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | tom starke | tom | starke | t | NaN | 40.0 | 37.0 | 3.0 | NaN | 0.0 |
4 | 488 | 2017/2018 | gerhard tremmel | Swansea City | retired | GB1 | 42.0 | 225000.0 | 250000 | 1978-11-16 | München | 1978.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | NaN | NaN | Germany | NaN | NaN | gerhard tremmel | gerhard | tremmel | g | germany | 16.0 | 11.0 | Germany | NaN | 0.0 | 2017-07-17 | NaN | NaN | NaN | NaN | NaN | gerhard tremmel | gerhard | tremmel | g | NaN | 42.0 | 38.0 | 4.0 | NaN | 0.0 |
print('No. rows in TM Player Valuation DataFrame before join to TransferMarkt Bio-Status data: {}'.format(len(df_tm_valuations)))
print('No. rows in DataFrame AFTER join: {}\n'.format(len(df_tm_merge)))
print('Variance in rows before and after join: {}\n'.format(len(df_tm_merge) - len(df_tm_valuations)))
print('-'*10)
No. rows in TM Player Valuation DataFrame before join to TransferMarkt Bio-Status data: 30241
No. rows in DataFrame AFTER join: 30241

Variance in rows before and after join: 0

----------
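The temporary rename of `birth_year` in the cell above is needed because an unanchored `'_y'` pattern matches any column name containing `_y`, including `birth_year`. A minimal sketch of an alternative, assuming hypothetical toy frames `left` and `right`, is to anchor the patterns to the end of the column name so that only the merge suffixes are affected. This is not the approach used in this notebook, only an illustration:

```python
import pandas as pd

# Hypothetical toy frames that share a 'birth_year' column
left = pd.DataFrame({'player_name': ['tom starke'], 'birth_year': [1981], 'club': ['Bayern Munich']})
right = pd.DataFrame({'player_name': ['tom starke'], 'birth_year': [1981], 'height': [194.0]})

# The shared non-key column receives the default '_x' and '_y' suffixes
merged = pd.merge(left, right, on='player_name', how='left')

# Unanchored: matches 'birth_year_x' too, because '_year' contains '_y'
unanchored = list(merged.filter(regex='_y').columns)

# Anchored with '$': matches only names that end in the '_y' suffix
anchored = list(merged.filter(regex='_y$').columns)

# Drop the right-hand duplicates and strip the trailing '_x' suffix only
cleaned = merged.drop(columns=anchored)
cleaned.columns = cleaned.columns.str.replace(r'_x$', '', regex=True)

print(unanchored)
print(anchored)
print(list(cleaned.columns))
```

With anchored patterns the rename to `born` and back would no longer be required, although the existing cells are left as they are.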
# Join Player Transfer dataset to Bio-Status-Valuation dataset
## Join the TransferMarkt Bio-Status and Transfer DataFrames
df_tm_merge_final = pd.merge(df_tm_merge, df_tm_transfers, left_on=['player_name', 'season'], right_on=['player_name_lower', 'season'], how='left')
## Temporarily rename 'birth_year' to 'born' - the '_y' regex used below also matches the '_y' inside 'birth_year', so the column would otherwise be dropped
df_tm_merge_final = df_tm_merge_final.rename(columns={'birth_year': 'born'})
## Remove duplicates
### Remove duplicate columns after join (contain '_y') and remove '_x' suffix from kept columns
df_tm_merge_final = df_tm_merge_final[df_tm_merge_final.columns.drop(list(df_tm_merge_final.filter(regex='_y')))]
df_tm_merge_final.columns = df_tm_merge_final.columns.str.replace('_x','')
### Remove duplicate rows
df_tm_merge_final = df_tm_merge_final.drop_duplicates(subset=['tm_id', 'season', 'player_name'], keep='first')
## Drop unnecessary columns
df_tm_merge_final = df_tm_merge_final.drop(['year'], axis=1)
## Rename columns
df_tm_merge_final = df_tm_merge_final.rename(columns={'born': 'birth_year'})
## Display DataFrame
df_tm_merge_final.head()
tm_id | season | player_name | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | birth_year | position | position_code | position_grouped | outfielder_goalkeeper | height | foot | citizenship | second_citizenship | player_agent | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | birth_day | birth_month | cob | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | club_name | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26 | 2017/2018 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 675000.0 | 750000 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer | roman weidenfeller | roman | weidenfeller | r | germany | 6.0 | 8.0 | Germany | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | roman weidenfeller | roman | weidenfeller | r | NaN | 41.0 | 37.0 | 3.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 26 | 2018/2019 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 0.0 | 0 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer | roman weidenfeller | roman | weidenfeller | r | germany | 6.0 | 8.0 | Germany | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | roman weidenfeller | roman | weidenfeller | r | NaN | 41.0 | 37.0 | 3.0 | NaN | 0.0 | Borussia Dortmund | Retired | - | out | Summer | 0.0 | 1 Bundesliga |
2 | 80 | 2017/2018 | tom starke | Bayern Munich | retired | L1 | 40.0 | 90000.0 | 100000 | 1981-03-18 | Freital | 1981.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 194.0 | right | Germany | NaN | IFM | tom starke | tom | starke | t | germany | 18.0 | 3.0 | DDR | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | tom starke | tom | starke | t | NaN | 40.0 | 37.0 | 3.0 | NaN | 0.0 | Bayern Munich | Career break | - | in | Summer | 0.0 | 1 Bundesliga |
4 | 80 | 2018/2019 | tom starke | Bayern Munich | retired | L1 | 40.0 | 90000.0 | 100000 | 1981-03-18 | Freital | 1981.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 194.0 | right | Germany | NaN | IFM | tom starke | tom | starke | t | germany | 18.0 | 3.0 | DDR | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | tom starke | tom | starke | t | NaN | 40.0 | 37.0 | 3.0 | NaN | 0.0 | Bayern Munich | Retired | - | out | Summer | 0.0 | 1 Bundesliga |
5 | 488 | 2017/2018 | gerhard tremmel | Swansea City | retired | GB1 | 42.0 | 225000.0 | 250000 | 1978-11-16 | München | 1978.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | NaN | NaN | Germany | NaN | NaN | gerhard tremmel | gerhard | tremmel | g | germany | 16.0 | 11.0 | Germany | NaN | 0.0 | 2017-07-17 | NaN | NaN | NaN | NaN | NaN | gerhard tremmel | gerhard | tremmel | g | NaN | 42.0 | 38.0 | 4.0 | NaN | 0.0 | Swansea City | Retired | - | out | Summer | 0.0 | Premier League |
print('No. rows in merged TM Bio-Status-Valuation DataFrame before join to TM Recorded Transfer data: {}'.format(len(df_tm_merge)))
print('No. rows in DataFrame AFTER join: {}\n'.format(len(df_tm_merge_final)))
print('Variance in rows before and after join: {}\n'.format(len(df_tm_merge_final) - len(df_tm_merge)))
print('-'*10)
No. rows in merged TM Bio-Status-Valuation DataFrame before join to TM Recorded Transfer data: 30241
No. rows in DataFrame AFTER join: 30241

Variance in rows before and after join: 0

----------
df_tm_merge_final[df_tm_merge_final['player_name'].str.contains('gerard piqu', na=False)]
tm_id | season | player_name | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | birth_year | position | position_code | position_grouped | outfielder_goalkeeper | height | foot | citizenship | second_citizenship | player_agent | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | birth_day | birth_month | cob | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | club_name | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1016 | 18944 | 2017/2018 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 36000000.0 | 40000000 | 1987-02-02 | Barcelona | 1987.0 | Defender - Centre-Back | CB | Defender | Outfielder | 194.0 | right | Spain | NaN | AC Talent | gerard pique | gerard | pique | g | spain | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1017 | 18944 | 2018/2019 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 36000000.0 | 40000000 | 1987-02-02 | Barcelona | 1987.0 | Defender - Centre-Back | CB | Defender | Outfielder | 194.0 | right | Spain | NaN | AC Talent | gerard pique | gerard | pique | g | spain | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1018 | 18944 | 2019/2020 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 22500000.0 | 25000000 | 1987-02-02 | Barcelona | 1987.0 | Defender - Centre-Back | CB | Defender | Outfielder | 194.0 | right | Spain | NaN | AC Talent | gerard pique | gerard | pique | g | spain | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1019 | 18944 | 2020/2021 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 13500000.0 | 15000000 | 1987-02-02 | Barcelona | 1987.0 | Defender - Centre-Back | CB | Defender | Outfielder | 194.0 | right | Spain | NaN | AC Talent | gerard pique | gerard | pique | g | spain | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1020 | 18944 | 2021/2022 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 9000000.0 | 10000000 | 1987-02-02 | Barcelona | 1987.0 | Defender - Centre-Back | CB | Defender | Outfielder | 194.0 | right | Spain | NaN | AC Talent | gerard pique | gerard | pique | g | spain | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
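The two Gerard Piqué outputs show why exact name joins are fragile: the FBref lower-cased name reads `gerard piqua`, while the TransferMarkt lower-cased name reads `gerard pique`. As a minimal sketch, assuming a hypothetical helper called `strip_accents` (it is not part of this notebook), accented names can be reduced to a consistent ASCII form with the standard library before joining:

```python
import unicodedata

def strip_accents(name):
    """Lower-case a name and remove combining accent marks (hypothetical helper)."""
    # NFKD splits an accented character into its base letter plus a combining mark
    decomposed = unicodedata.normalize('NFKD', name)
    # Keep only the characters that are not combining marks
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()

print(strip_accents('Gerard Piqué'))   # gerard pique
print(strip_accents('Jörg Neubauer'))  # jorg neubauer
```

Applying the same function to the name columns of both sources would give matching keys for names like this one. Letters without a Unicode decomposition (for example `ø`) pass through unchanged and would need separate handling.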
The first step is to create an indexer object.
The indexing module is used to make pairs of records. These pairs are called candidate links or candidate matches. Several indexing algorithms are available, such as blocking and sorted neighborhood indexing. See the recordlinkage documentation for background information about indexing.
One key concept is that blocking limits the number of comparisons. For instance, we only want to compare records that share the same last name, first initial and birth year. We can use this knowledge to set up a block on these columns in both DataFrames:
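To make the effect of blocking concrete, here is a minimal sketch in plain pandas. It does not use the recordlinkage indexer itself, and the `fbref` and `tm` frames with their values are hypothetical toy data; only the blocking column names follow the ones used in this notebook. Candidate pairs are generated only for rows that agree on every blocking key:

```python
import pandas as pd

# Hypothetical toy records from each source
fbref = pd.DataFrame({
    'last_name_lower': ['pique', 'connolly', 'cresswell'],
    'first_initial_lower': ['g', 'a', 'a'],
    'birth_year': [1987, 2000, 1989],
})
tm = pd.DataFrame({
    'last_name_lower': ['pique', 'cresswell', 'connolly', 'weidenfeller'],
    'first_initial_lower': ['g', 'a', 'a', 'r'],
    'birth_year': [1987, 1989, 1990, 1980],   # this 'connolly' has a different birth year
})

block_keys = ['last_name_lower', 'first_initial_lower', 'birth_year']

# An inner join on the blocking keys keeps only pairs that agree on all of them
pairs = pd.merge(
    fbref.reset_index().rename(columns={'index': 'fbref_idx'}),
    tm.reset_index().rename(columns={'index': 'tm_idx'}),
    on=block_keys,
    how='inner',
)[['fbref_idx', 'tm_idx']]

# Candidate links as (fbref row, tm row) index pairs
candidate_links = pd.MultiIndex.from_frame(pairs)

# Number of comparisons without any blocking
full_comparisons = len(fbref) * len(tm)

print(list(candidate_links))
print(len(candidate_links), 'candidate pairs instead of', full_comparisons)
```

The trade-off is that any record with an error in one of the blocking columns, such as a wrong birth year, never becomes a candidate pair, so the blocking columns should be the most reliable ones available.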
df_fbref_merge.head()
Nation | Pos | Squad | Comp | Age | birth_year | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | League Name | League ID | season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | outfielder_goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | player_name_fbref | url_fbref | url_tm | TmPos | tm_id | fbref_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5430 | ie IRL | FW | Brighton | Premier League | 19 | 2000.0 | 24 | 14 | 1258.0 | 14.0 | 3 | 1 | 3 | 0 | 0 | 0 | 0 | 0.21 | 0.07 | 0.29 | 0.21 | 0.29 | 3.2 | 3.2 | 0.3 | 3.5 | 0.23 | 0.02 | 0.25 | 0.23 | 0.25 | Matches | 38.0 | 13.0 | 34.2 | 2.72 | 0.93 | 0.08 | 0.23 | 15.9 | 0.0 | 0.08 | -0.2 | -0.2 | 126.0 | 163.0 | 77.3 | 1739.0 | 242.0 | 76.0 | 92.0 | 82.6 | 31.0 | 39.0 | 79.5 | 7.0 | 11.0 | 63.6 | 0.7 | 6.0 | 6.0 | 2.0 | 0.0 | 10.0 | 148.0 | 15.0 | 1.0 | 50.0 | 0.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 90.0 | 52.0 | 21.0 | 27.0 | 107.0 | 13.0 | 1.0 | 6.0 | 0.0 | 1.0 | 4.0 | 10.0 | 25.0 | 1.79 | 7.0 | 0.0 | 3.0 | 9.0 | 3.0 | 5.0 | 0.36 | 1.0 | 0.0 | 1.0 | 1.0 | 2.0 | 0.0 | 12.0 | 8.0 | 1.0 | 5.0 | 6.0 | 3.0 | 25.0 | 9.0 | 69.0 | 29.5 | 14.0 | 94.0 | 126.0 | 0.0 | 7.0 | 17.0 | 1.0 | 0.0 | 349.0 | 2.0 | 61.0 | 37.5 | 6.0 | 1.0 | 228.0 | 12.0 | 42.0 | 34.0 | 535.0 | 235.0 | 43.9 | 99.0 | 52 | 36.8 | 72.0 | 0.0 | 10 | 26.0 | 4 | 1.13 | 18.0 | 22.0 | -4.0 | -0.29 | 0.17 | 15.6 | 19.8 | -4.2 | -0.30 | 0.08 | 0.0 | 16 | 2.0 | 0.0 | 0.0 | 54.0 | 14.0 | 48.0 | 22.6 | Big-5-European-Leagues | Big5 | 2019/2020 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 |
8211 | ie IRL | FW | Brighton | Premier League | 20 | 2000.0 | 17 | 9 | 791.0 | 8.8 | 2 | 1 | 2 | 0 | 0 | 0 | 0 | 0.23 | 0.11 | 0.34 | 0.23 | 0.34 | 3.5 | 3.5 | 0.2 | 3.7 | 0.40 | 0.02 | 0.42 | 0.40 | 0.42 | Matches | 23.0 | 8.0 | 34.8 | 2.62 | 0.91 | 0.09 | 0.25 | 13.7 | 0.0 | 0.15 | -1.5 | -1.5 | 79.0 | 101.0 | 78.2 | 1147.0 | 165.0 | 45.0 | 56.0 | 80.4 | 26.0 | 30.0 | 86.7 | 4.0 | 5.0 | 80.0 | 0.8 | 5.0 | 2.0 | 1.0 | 0.0 | 3.0 | 91.0 | 10.0 | 0.0 | 22.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 64.0 | 26.0 | 11.0 | 11.0 | 74.0 | 5.0 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 4.0 | 12.0 | 1.37 | 7.0 | 0.0 | 3.0 | 2.0 | 0.0 | 1.0 | 0.11 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 5.0 | 2.0 | 4.0 | 1.0 | 1.0 | 20.0 | 4.0 | 40.0 | 32.3 | 7.0 | 58.0 | 59.0 | 0.0 | 8.0 | 7.0 | 1.0 | 0.0 | 201.0 | 1.0 | 38.0 | 80.0 | 8.0 | 0.0 | 124.0 | 4.0 | 29.0 | 15.0 | 357.0 | 143.0 | 40.1 | 64.0 | 47 | 23.1 | 68.0 | NaN | 8 | 23.0 | 11 | 0.88 | 12.0 | 17.0 | -5.0 | -0.57 | -0.53 | 13.8 | 7.8 | 6.0 | 0.69 | 0.42 | 0.0 | 5 | 1.0 | 0.0 | 0.0 | 28.0 | 11.0 | 30.0 | 26.8 | Big-5-European-Leagues | Big5 | 2020/2021 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 |
11071 | ie IRL | FW | Brighton | Premier League | 21 | 2000.0 | 1 | 0 | 45.0 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.4 | 0.4 | 0.0 | 0.4 | 0.85 | 0.00 | 0.85 | 0.85 | 0.85 | Matches | 1.0 | 0.0 | 0.0 | 2.00 | 0.00 | 0.00 | NaN | 9.0 | 0.0 | 0.42 | -0.4 | -0.4 | 2.0 | 3.0 | 66.7 | 14.0 | 0.0 | 2.0 | 2.0 | 100.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.00 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 1.0 | 8.3 | 1.0 | 6.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 0.0 | 2.0 | NaN | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 1.0 | 20.0 | 4.0 | 20.0 | 2.0 | 45 | 16.7 | NaN | 0.0 | 1 | 45.0 | 1 | 3.00 | 0.0 | 0.0 | 0.0 | 0.00 | -0.40 | 1.0 | 0.7 | 0.3 | 0.68 | 0.74 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 2.0 | 0.0 | Big-5-European-Leagues | Big5 | 2021/2022 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 |
8212 | eng ENG | DF | West Ham | Premier League | 30 | 1989.0 | 36 | 36 | 3170.0 | 35.2 | 0 | 8 | 0 | 0 | 0 | 3 | 0 | 0.00 | 0.23 | 0.23 | 0.00 | 0.23 | 0.9 | 0.9 | 5.9 | 6.9 | 0.03 | 0.17 | 0.19 | 0.03 | 0.19 | Matches | 19.0 | 4.0 | 21.1 | 0.54 | 0.11 | 0.00 | 0.00 | 24.6 | 11.0 | 0.05 | -0.9 | -0.9 | 1541.0 | 2061.0 | 74.8 | 30884.0 | 14764.0 | 655.0 | 736.0 | 89.0 | 616.0 | 736.0 | 83.7 | 252.0 | 518.0 | 48.6 | 2.1 | 60.0 | 149.0 | 24.0 | 17.0 | 120.0 | 1617.0 | 444.0 | 2.0 | 158.0 | 84.0 | 91.0 | 82.0 | 22.0 | 51.0 | 6.0 | 1104.0 | 365.0 | 592.0 | 1607.0 | 118.0 | 49.0 | 259.0 | 4.0 | 23.0 | 35.0 | 48.0 | 66.0 | 96.0 | 2.73 | 48.0 | 41.0 | 1.0 | 2.0 | 0.0 | 14.0 | 0.40 | 5.0 | 6.0 | 0.0 | 2.0 | 1.0 | 0.0 | 31.0 | 13.0 | 15.0 | 14.0 | 2.0 | 14.0 | 41.2 | 20.0 | 67.0 | 29.1 | 122.0 | 90.0 | 18.0 | 0.0 | 34.0 | 67.0 | 98.0 | 0.0 | 2307.0 | 182.0 | 18.0 | 40.0 | 8.0 | 2.0 | 1266.0 | 3.0 | 7.0 | 9.0 | 1366.0 | 1318.0 | 96.5 | 30.0 | 88 | 92.7 | 88.0 | NaN | 0 | NaN | 0 | 1.81 | 60.0 | 41.0 | 19.0 | 0.54 | 1.98 | 51.3 | 42.9 | 8.4 | 0.24 | 1.26 | 0.0 | 13 | 0.0 | 0.0 | 0.0 | 307.0 | 38.0 | 30.0 | 55.9 | Big-5-European-Leagues | Big5 | 2020/2021 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... | Left-Back | 92571 | 4f974391 |
2728 | eng ENG | DF | West Ham | Premier League | 28 | 1989.0 | 20 | 18 | 1589.0 | 17.7 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0.00 | 0.06 | 0.06 | 0.00 | 0.06 | 0.5 | 0.5 | 0.9 | 1.4 | 0.03 | 0.05 | 0.08 | 0.03 | 0.08 | Matches | 11.0 | 0.0 | 0.0 | 0.62 | 0.00 | 0.00 | NaN | 23.5 | 2.0 | 0.04 | -0.5 | -0.5 | 842.0 | 1070.0 | 78.7 | 13627.0 | 5572.0 | 453.0 | 501.0 | 90.4 | 307.0 | 371.0 | 82.7 | 64.0 | 140.0 | 45.7 | 0.1 | 16.0 | 55.0 | 15.0 | 5.0 | 65.0 | 854.0 | 216.0 | 0.0 | 168.0 | 18.0 | 46.0 | 10.0 | 0.0 | 2.0 | 0.0 | 642.0 | 235.0 | 193.0 | 787.0 | 51.0 | 27.0 | 190.0 | 4.0 | 2.0 | 21.0 | 27.0 | 44.0 | 29.0 | 1.64 | 19.0 | 6.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0.11 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 19.0 | 14.0 | 12.0 | 4.0 | 12.0 | 42.9 | 16.0 | 68.0 | 31.5 | 129.0 | 59.0 | 28.0 | 0.0 | 39.0 | 49.0 | 60.0 | 1.0 | 1266.0 | 78.0 | 36.0 | 63.6 | 7.0 | 1.0 | 723.0 | 8.0 | 11.0 | 13.0 | 797.0 | 715.0 | 89.7 | 43.0 | 79 | 46.5 | 85.0 | 16.0 | 2 | 30.0 | 7 | 1.30 | 21.0 | 26.0 | -5.0 | -0.28 | -0.38 | 20.1 | 25.3 | -5.3 | -0.30 | 0.12 | 0.0 | 2 | 0.0 | 0.0 | 0.0 | 169.0 | 22.0 | 14.0 | 61.1 | Big-5-European-Leagues | Big5 | 2018/2019 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... | Left-Back | 92571 | 4f974391 |
df_tm_merge_final.head()
tm_id | season | player_name | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | birth_year | position | position_code | position_grouped | outfielder_goalkeeper | height | foot | citizenship | second_citizenship | player_agent | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | birth_day | birth_month | cob | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | club_name | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26 | 2017/2018 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 675000.0 | 750000 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer | roman weidenfeller | roman | weidenfeller | r | germany | 6.0 | 8.0 | Germany | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | roman weidenfeller | roman | weidenfeller | r | NaN | 41.0 | 37.0 | 3.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 26 | 2018/2019 | roman weidenfeller | Borussia Dortmund | retired | L1 | 41.0 | 0.0 | 0 | 1980-08-06 | Diez | 1980.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 188.0 | left | Germany | NaN | Jörg Neubauer | roman weidenfeller | roman | weidenfeller | r | germany | 6.0 | 8.0 | Germany | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | roman weidenfeller | roman | weidenfeller | r | NaN | 41.0 | 37.0 | 3.0 | NaN | 0.0 | Borussia Dortmund | Retired | - | out | Summer | 0.0 | 1 Bundesliga |
2 | 80 | 2017/2018 | tom starke | Bayern Munich | retired | L1 | 40.0 | 90000.0 | 100000 | 1981-03-18 | Freital | 1981.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 194.0 | right | Germany | NaN | IFM | tom starke | tom | starke | t | germany | 18.0 | 3.0 | DDR | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | tom starke | tom | starke | t | NaN | 40.0 | 37.0 | 3.0 | NaN | 0.0 | Bayern Munich | Career break | - | in | Summer | 0.0 | 1 Bundesliga |
4 | 80 | 2018/2019 | tom starke | Bayern Munich | retired | L1 | 40.0 | 90000.0 | 100000 | 1981-03-18 | Freital | 1981.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | 194.0 | right | Germany | NaN | IFM | tom starke | tom | starke | t | germany | 18.0 | 3.0 | DDR | NaN | 0.0 | 2018-07-01 | NaN | NaN | NaN | NaN | NaN | tom starke | tom | starke | t | NaN | 40.0 | 37.0 | 3.0 | NaN | 0.0 | Bayern Munich | Retired | - | out | Summer | 0.0 | 1 Bundesliga |
5 | 488 | 2017/2018 | gerhard tremmel | Swansea City | retired | GB1 | 42.0 | 225000.0 | 250000 | 1978-11-16 | München | 1978.0 | Goalkeeper | GK | Goalkeeper | Goalkeeper | NaN | NaN | Germany | NaN | NaN | gerhard tremmel | gerhard | tremmel | g | germany | 16.0 | 11.0 | Germany | NaN | 0.0 | 2017-07-17 | NaN | NaN | NaN | NaN | NaN | gerhard tremmel | gerhard | tremmel | g | NaN | 42.0 | 38.0 | 4.0 | NaN | 0.0 | Swansea City | Retired | - | out | Summer | 0.0 | Premier League |
# Record Linkage Step 1 - Create an indexer object
indexer = recordlinkage.Index()
indexer.block(left_on=['first_initial_lower', 'birth_year', 'outfielder_goalkeeper', 'season'],
              right_on=['first_initial_lower', 'birth_year', 'outfielder_goalkeeper', 'season']
              )
<Index>
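To make the effect of blocking concrete, here is a minimal plain-Python sketch. It is not how the record-linkage library is implemented; the toy records and the `block_pairs` helper are hypothetical and only illustrate the idea that pairs are generated solely within groups sharing the same blocking key.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy records: (id, first_initial, birth_year, role, season)
left = [
    ("L0", "a", 2000, "Outfielder", "2019/2020"),
    ("L1", "a", 1989, "Outfielder", "2019/2020"),
    ("L2", "r", 1980, "Goalkeeper", "2017/2018"),
]
right = [
    ("R0", "a", 2000, "Outfielder", "2019/2020"),
    ("R1", "a", 2000, "Outfielder", "2020/2021"),
    ("R2", "r", 1980, "Goalkeeper", "2017/2018"),
    ("R3", "t", 1981, "Goalkeeper", "2017/2018"),
]


def block_pairs(left_rows, right_rows):
    """Return (left_id, right_id) pairs whose blocking keys are identical."""
    buckets = defaultdict(list)
    for rec_id, *key in right_rows:
        buckets[tuple(key)].append(rec_id)
    return [
        (rec_id, right_id)
        for rec_id, *key in left_rows
        for right_id in buckets.get(tuple(key), [])
    ]


# Without blocking, every left record is compared with every right record
full_pairs = list(product([row[0] for row in left], [row[0] for row in right]))
# With blocking, only records that agree on all four key fields are paired
blocked_pairs = block_pairs(left, right)

print(len(full_pairs), len(blocked_pairs))
```

The trade-off is that a true match is never compared if any blocking field differs between the two sources, for example a birth year recorded differently, so the blocking columns should be ones that are trusted in both datasets.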
The next step is to build up all the potential candidates to check:
# Record Linkage Step 2 - Build up all the potential candidates to check:
candidates = indexer.index(df_fbref_merge, df_tm_merge_final)
print(len(candidates))
258184
Now that we have defined the left and right datasets and all the candidates, we can define how we want to perform the comparison logic using Compare():
# Record Linkage Step 3 - Define how to perform the comparison logic
compare = recordlinkage.Compare()
compare.string('first_name_lower',
               'first_name_lower',
               method='levenshtein',
               threshold=0.60,
               label='first_name'
               )
compare.string('last_name_lower',
               'last_name_lower',
               method='levenshtein',
               threshold=0.60,
               label='last_name'
               )
compare.string('country_lower',
               'country_lower',
               method='levenshtein',
               threshold=0.60,
               label='country'
               )
features = compare.compute(candidates, df_fbref_merge, df_tm_merge_final)
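To give a feel for what a thresholded Levenshtein comparison does, the following is a small plain-Python sketch. It assumes the similarity is normalised as 1 minus the edit distance divided by the longer string's length, and that a pair scores 1 when the similarity is at or above the threshold; both are assumptions about the library's behaviour rather than something confirmed in this notebook, and the helper names are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / max(len(a), len(b))


def binary_match(a: str, b: str, threshold: float = 0.60) -> float:
    """Return 1.0 when the similarity reaches the threshold, else 0.0."""
    return 1.0 if similarity(a, b) >= threshold else 0.0


print(binary_match("aaron", "aron"))       # one edit out of five characters
print(binary_match("ireland", "england"))  # three edits out of seven characters
```

Under these assumptions a single missing letter in a short first name still counts as a match, while two different countries with a shared ending, such as "ireland" and "england", fall just below the 0.60 threshold.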
View the potential candidates
# Record Linkage Step 4 - view the potential candidates
features
first_name | last_name | country | ||
---|---|---|---|---|
5430 | 28740 | 0.0 | 0.0 | 0.0 |
5430 | 28758 | 0.0 | 0.0 | 0.0 |
5430 | 29919 | 0.0 | 0.0 | 0.0 |
5430 | 29967 | 0.0 | 0.0 | 0.0 |
5430 | 30184 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... |
11066 | 5389 | 0.0 | 0.0 | 0.0 |
8204 | 149 | 0.0 | 0.0 | 0.0 |
8204 | 5388 | 0.0 | 0.0 | 0.0 |
5330 | 17228 | 1.0 | 1.0 | 1.0 |
8109 | 17229 | 1.0 | 1.0 | 1.0 |
258184 rows × 3 columns
This DataFrame shows the results of all of the comparisons. There is one row for each candidate pair, indexed by the row labels of the FBref and TransferMarkt DataFrames. The columns correspond to the comparisons we defined: 1 is a match and 0 is not.
Given the large number of pairs with no matches, it is hard to see how many matches we have. Summing the individual scores for each pair gives a sense of the quality of the matches.
# Sum up the individual scores to see the quality of the matches
features.sum(axis=1).value_counts().sort_index(ascending=False)
3.0      7358
2.0      7442
1.0     29599
0.0    213785
dtype: int64
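As a quick sanity check on these counts, the short sketch below re-adds them in plain Python. The numbers are copied from the output above; the variable names are illustrative only.

```python
from collections import Counter

# Score distribution copied from the value_counts() output above
score_counts = Counter({3.0: 7358, 2.0: 7442, 1.0: 29599, 0.0: 213785})

# Total should equal the number of candidate pairs produced by the indexer
total_pairs = sum(score_counts.values())

# Pairs that would survive a "score >= 2" filter
kept_pairs = sum(n for score, n in score_counts.items() if score >= 2)

print(total_pairs, kept_pairs)
```

The total agrees with the 258184 candidate pairs reported earlier, and a cut-off of 2 keeps 14800 of them, about 5.7% of all candidates.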
To keep only high-quality matches, take all candidate pairs where at least 2 of the 3 comparisons matched, then create a total score column:
# Keep candidate pairs with at least 2 matching fields and move the index pair into columns
potential_matches = features[features.sum(axis=1) >= 2].reset_index()
potential_matches
level_0 | level_1 | first_name | last_name | country | |
---|---|---|---|---|---|
0 | 5430 | 32278 | 1.0 | 1.0 | 1.0 |
1 | 5442 | 35813 | 1.0 | 1.0 | 1.0 |
2 | 5495 | 32743 | 1.0 | 1.0 | 0.0 |
3 | 5579 | 31981 | 1.0 | 1.0 | 0.0 |
4 | 5586 | 34594 | 1.0 | 1.0 | 0.0 |
... | ... | ... | ... | ... | ... |
14795 | 8167 | 11485 | 1.0 | 1.0 | 0.0 |
14796 | 2689 | 6333 | 1.0 | 1.0 | 1.0 |
14797 | 11046 | 36305 | 1.0 | 0.0 | 1.0 |
14798 | 5330 | 17228 | 1.0 | 1.0 | 1.0 |
14799 | 8109 | 17229 | 1.0 | 1.0 | 1.0 |
14800 rows × 5 columns
# Create 'Score' attribute, that sums the three columns defined in record-linkage
potential_matches['Score'] = potential_matches.loc[:, 'first_name': 'country'].sum(axis=1)
# Display DataFrame of potential matches, potential_matches
potential_matches
level_0 | level_1 | first_name | last_name | country | Score | |
---|---|---|---|---|---|---|
0 | 5430 | 32278 | 1.0 | 1.0 | 1.0 | 3.0 |
1 | 5442 | 35813 | 1.0 | 1.0 | 1.0 | 3.0 |
2 | 5495 | 32743 | 1.0 | 1.0 | 0.0 | 2.0 |
3 | 5579 | 31981 | 1.0 | 1.0 | 0.0 | 2.0 |
4 | 5586 | 34594 | 1.0 | 1.0 | 0.0 | 2.0 |
... | ... | ... | ... | ... | ... | ... |
14795 | 8167 | 11485 | 1.0 | 1.0 | 0.0 | 2.0 |
14796 | 2689 | 6333 | 1.0 | 1.0 | 1.0 | 3.0 |
14797 | 11046 | 36305 | 1.0 | 0.0 | 1.0 | 2.0 |
14798 | 5330 | 17228 | 1.0 | 1.0 | 1.0 | 3.0 |
14799 | 8109 | 17229 | 1.0 | 1.0 | 1.0 | 3.0 |
14800 rows × 6 columns
# Select only the top match per left index (FBref data)
## Order the potential matches by left index (FBref data) ascending and score descending
potential_matches = potential_matches.sort_values(by=['level_0', 'Score'], ascending=[True, False])
## Dedupe DataFrame, keeping only the top row
potential_matches = potential_matches.drop_duplicates(subset=['level_0'], keep='first')
# Display DataFrame
potential_matches.head()
level_0 | level_1 | first_name | last_name | country | Score | |
---|---|---|---|---|---|---|
117 | 0 | 8742 | 1.0 | 1.0 | 1.0 | 3.0 |
161 | 1 | 176 | 1.0 | 1.0 | 0.0 | 2.0 |
191 | 2 | 696 | 1.0 | 1.0 | 0.0 | 2.0 |
190 | 3 | 696 | 1.0 | 1.0 | 0.0 | 2.0 |
369 | 4 | 10802 | 1.0 | 1.0 | 0.0 | 2.0 |
# Shape of potential matches DataFrame
potential_matches.shape
(11761, 6)
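The sort-then-deduplicate step above can be sketched in plain Python as follows. The scored pairs are hypothetical, and `best_match_per_left` is an illustrative helper rather than part of the notebook. The sketch also counts how often each right-hand record is chosen, because keeping one row per left index does not stop several left rows from pointing at the same right row (rows 2 and 3 in the output above both appear to map to 696).

```python
from collections import Counter

# Hypothetical scored pairs: (left_index, right_index, score)
pairs = [
    (2, 696, 2.0),
    (0, 8742, 3.0),
    (0, 17, 2.0),
    (3, 696, 2.0),
    (1, 176, 2.0),
    (1, 54, 2.0),
]


def best_match_per_left(scored_pairs):
    """Keep the highest-scoring right record for each left record.

    Ties are resolved by input order here, because sorted() is stable.
    """
    ordered = sorted(scored_pairs, key=lambda p: (p[0], -p[2]))
    best = {}
    for left, right, score in ordered:
        best.setdefault(left, (right, score))  # first row per left index wins
    return best


best = best_match_per_left(pairs)

# Right-hand records claimed by more than one left-hand record
claims = Counter(right for right, _ in best.values())
shared = {right: n for right, n in claims.items() if n > 1}

print(best)
print(shared)
```

Whether a shared right-hand record is a problem depends on the data: here both sides have one row per player per season, so two seasons of one FBref player mapping to a single TransferMarkt row may deserve a manual look.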
The following code puts the two datasets back together, using the output record-linkage dataset, potential_matches.
# Join Datasets
## Join Datasets
### Join the FBref Outfielder DataFrame to the potential matches DataFrame
#df_merge_fbref_tm = pd.merge(potential_matches, df_fbref_merge, left_on='level_0', right_index=True, how='left')
df_merge_fbref_tm = pd.merge(df_fbref_merge, potential_matches, left_index=True, right_on='level_0', how='left')
### Join the TransferMarkt Outfielder DataFrame to the potential matches DataFrame
df_merge_fbref_tm = pd.merge(df_merge_fbref_tm, df_tm_merge_final, left_on='level_1', right_index=True, how='left')
## Data cleanup
### Rename 'birth_year_x' first - the '_y' filter below also matches the '_y' inside 'birth_year', so the column would otherwise be dropped
df_merge_fbref_tm = df_merge_fbref_tm.rename(columns={'birth_year_x': 'born'})
### Sort rows
df_merge_fbref_tm = df_merge_fbref_tm.sort_values(by=['season_x', 'player_name_fbref', 'tm_id_x', 'Score'], ascending=[True, True, True, False])
### Remove duplicates
#### Remove duplicate columns after join (contain '_y') and remove '_x' suffix from kept columns
df_merge_fbref_tm = df_merge_fbref_tm[df_merge_fbref_tm.columns.drop(list(df_merge_fbref_tm.filter(regex='_y')))]
df_merge_fbref_tm.columns = df_merge_fbref_tm.columns.str.replace('_x','')
#### Remove duplicate rows
#df_merge_fbref_tm = df_merge_fbref_tm.drop_duplicates(subset=['player_name_fbref', 'season', 'Team Name', 'Comp'], keep='first')
df_merge_fbref_tm = df_merge_fbref_tm.drop_duplicates(subset=['player_name_fbref', 'season', 'Team Name', 'Comp'], keep='first')
### Rename columns
df_merge_fbref_tm = df_merge_fbref_tm.rename(columns={'born': 'birth_year',
'player_name': 'player_name_tm'
}
)
### Sort rows
df_merge_fbref_tm = df_merge_fbref_tm.sort_values(by=['player_name_fbref', 'season'], ascending=[True, True])
### Reset index
df_merge_fbref_tm = df_merge_fbref_tm.reset_index(drop=True)
## Determine columns to keep and remove
### Drop unnecessary columns
df_merge_fbref_tm = df_merge_fbref_tm.drop(['League Name', 'League ID', 'Score', 'level_0', 'level_1', 'first_name' , 'last_name', 'country'], axis=1)
### Define columns of interest
#### FBref players
lst_cols_fbref_players = ['player_name_fbref',
'fbref_id',
'url_fbref',
'first_initial_lower',
'first_name_lower',
'last_name_lower',
'birth_year',
'country_lower',
'outfielder_goalkeeper',
'season'
]
#### TM Bio-Status, Valuations, and Transfers
lst_cols_tm = ['player_name_tm',
'url_tm',
'tm_id',
'first_initial_lower',
'first_name_lower',
'last_name_lower',
'birth_year',
'country_lower',
'outfielder_goalkeeper',
]
### Combine all columns of interest into a single list
lst_fbref_tm_select = list(lst_cols_fbref_players)
lst_fbref_tm_select.extend(x for x in lst_cols_tm if x not in lst_fbref_tm_select)
### Determine columns not of interest as separate list
lst_fbref_tm_non_select = list(set(list(df_merge_fbref_tm.columns)) - set(lst_fbref_tm_select))
### Define all columns
lst_fbref_tm_all = list(df_merge_fbref_tm.columns)
## Select columns of interest
df_merge_fbref_tm_select = df_merge_fbref_tm[lst_fbref_tm_select]
print('No. rows in FBref Players DataFrame before join to merged TM data: {}'.format(len(df_fbref_merge)))
print('No. rows in DataFrame AFTER join: {}\n'.format(len(df_merge_fbref_tm_select)))
print('-'*10+'\n')
print('Variance in rows before and after join: {}\n'.format(len(df_merge_fbref_tm_select) - len(df_fbref_merge)))
No. rows in FBref Players DataFrame before join to merged TM data: 12753
No. rows in DataFrame AFTER join: 12753

----------

Variance in rows before and after join: 0
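The cleanup code above renames 'birth_year_x' before dropping the '_y' columns. The plain-Python sketch below shows why: an unanchored '_y' pattern matches anywhere in a column name (pandas' DataFrame.filter uses re.search), so it also catches the '_y' inside 'birth_year'. The column list and the `drop_and_strip` helper are hypothetical, and the anchored patterns are offered as one possible alternative, not as what the notebook does.

```python
import re

# Hypothetical column names resembling those produced by the merge
columns = [
    "birth_year_x", "birth_year_y",
    "season_x", "season_y",
    "tm_id_x", "tm_id_y",
    "years_since_joining", "xG",
]


def drop_and_strip(cols, drop_pattern, strip_pattern):
    """Drop columns matching drop_pattern, then remove strip_pattern from the rest."""
    kept = [c for c in cols if not re.search(drop_pattern, c)]
    return [re.sub(strip_pattern, "", c) for c in kept]


# Unanchored patterns: 'birth_year_x' is lost because it contains '_y'
loose = drop_and_strip(columns, r"_y", r"_x")

# Anchored patterns: only true merge suffixes at the end of the name are affected
anchored = drop_and_strip(columns, r"_y$", r"_x$")

print(loose)
print(anchored)
```

With anchored patterns the temporary rename to 'born' would no longer be needed, although keeping it is harmless.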
# Display DataFrame
df_merge_fbref_tm.head()
Nation | Pos | Squad | Comp | Age | birth_year | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | outfielder_goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | player_name_fbref | url_fbref | url_tm | TmPos | tm_id | fbref_id | player_name_tm | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | position | position_code | position_grouped | height | 
foot | citizenship | second_citizenship | player_agent | birth_day | birth_month | cob | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | club_name | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ie IRL | FW | Brighton | Premier League | 19 | 2000.0 | 24 | 14 | 1258.0 | 14.0 | 3 | 1 | 3 | 0 | 0 | 0 | 0 | 0.21 | 0.07 | 0.29 | 0.21 | 0.29 | 3.2 | 3.2 | 0.3 | 3.5 | 0.23 | 0.02 | 0.25 | 0.23 | 0.25 | Matches | 38.0 | 13.0 | 34.2 | 2.72 | 0.93 | 0.08 | 0.23 | 15.9 | 0.0 | 0.08 | -0.2 | -0.2 | 126.0 | 163.0 | 77.3 | 1739.0 | 242.0 | 76.0 | 92.0 | 82.6 | 31.0 | 39.0 | 79.5 | 7.0 | 11.0 | 63.6 | 0.7 | 6.0 | 6.0 | 2.0 | 0.0 | 10.0 | 148.0 | 15.0 | 1.0 | 50.0 | 0.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 90.0 | 52.0 | 21.0 | 27.0 | 107.0 | 13.0 | 1.0 | 6.0 | 0.0 | 1.0 | 4.0 | 10.0 | 25.0 | 1.79 | 7.0 | 0.0 | 3.0 | 9.0 | 3.0 | 5.0 | 0.36 | 1.0 | 0.0 | 1.0 | 1.0 | 2.0 | 0.0 | 12.0 | 8.0 | 1.0 | 5.0 | 6.0 | 3.0 | 25.0 | 9.0 | 69.0 | 29.5 | 14.0 | 94.0 | 126.0 | 0.0 | 7.0 | 17.0 | 1.0 | 0.0 | 349.0 | 2.0 | 61.0 | 37.5 | 6.0 | 1.0 | 228.0 | 12.0 | 42.0 | 34.0 | 535.0 | 235.0 | 43.9 | 99.0 | 52 | 36.8 | 72.0 | 0.0 | 10 | 26.0 | 4 | 1.13 | 18.0 | 22.0 | -4.0 | -0.29 | 0.17 | 15.6 | 19.8 | -4.2 | -0.30 | 0.08 | 0.0 | 16 | 2.0 | 0.0 | 0.0 | 54.0 | 14.0 | 48.0 | 22.6 | 2019/2020 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... 
| Centre-Forward | 434207 | 27c01749 | aaron connolly | Brighton Hove Albion | brighton & hove albion | GB1 | 21.0 | 4050000.0 | 4500000.0 | 2000-01-28 | Galway | attack - Centre-Forward | ST | Forward | 175.0 | right | Ireland | NaN | PLG | 28.0 | 1.0 | Ireland | england | 7000000.0 | 2019-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | england | 21.0 | 19.0 | 2.0 | 2.0 | 6300000.0 | Brighton & Hove Albion | Brighton U23 | - | in | Summer | 0.0 | Premier League |
1 | ie IRL | FW | Brighton | Premier League | 20 | 2000.0 | 17 | 9 | 791.0 | 8.8 | 2 | 1 | 2 | 0 | 0 | 0 | 0 | 0.23 | 0.11 | 0.34 | 0.23 | 0.34 | 3.5 | 3.5 | 0.2 | 3.7 | 0.40 | 0.02 | 0.42 | 0.40 | 0.42 | Matches | 23.0 | 8.0 | 34.8 | 2.62 | 0.91 | 0.09 | 0.25 | 13.7 | 0.0 | 0.15 | -1.5 | -1.5 | 79.0 | 101.0 | 78.2 | 1147.0 | 165.0 | 45.0 | 56.0 | 80.4 | 26.0 | 30.0 | 86.7 | 4.0 | 5.0 | 80.0 | 0.8 | 5.0 | 2.0 | 1.0 | 0.0 | 3.0 | 91.0 | 10.0 | 0.0 | 22.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 64.0 | 26.0 | 11.0 | 11.0 | 74.0 | 5.0 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 4.0 | 12.0 | 1.37 | 7.0 | 0.0 | 3.0 | 2.0 | 0.0 | 1.0 | 0.11 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 5.0 | 2.0 | 4.0 | 1.0 | 1.0 | 20.0 | 4.0 | 40.0 | 32.3 | 7.0 | 58.0 | 59.0 | 0.0 | 8.0 | 7.0 | 1.0 | 0.0 | 201.0 | 1.0 | 38.0 | 80.0 | 8.0 | 0.0 | 124.0 | 4.0 | 29.0 | 15.0 | 357.0 | 143.0 | 40.1 | 64.0 | 47 | 23.1 | 68.0 | NaN | 8 | 23.0 | 11 | 0.88 | 12.0 | 17.0 | -5.0 | -0.57 | -0.53 | 13.8 | 7.8 | 6.0 | 0.69 | 0.42 | 0.0 | 5 | 1.0 | 0.0 | 0.0 | 28.0 | 11.0 | 30.0 | 26.8 | 2020/2021 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 | aaron connolly | Brighton Hove Albion | brighton & hove albion | GB1 | 21.0 | 6300000.0 | 7000000.0 | 2000-01-28 | Galway | attack - Centre-Forward | ST | Forward | 175.0 | right | Ireland | NaN | PLG | 28.0 | 1.0 | Ireland | england | 7000000.0 | 2019-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | england | 21.0 | 19.0 | 2.0 | 2.0 | 6300000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | ie IRL | FW | Brighton | Premier League | 21 | 2000.0 | 1 | 0 | 45.0 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.4 | 0.4 | 0.0 | 0.4 | 0.85 | 0.00 | 0.85 | 0.85 | 0.85 | Matches | 1.0 | 0.0 | 0.0 | 2.00 | 0.00 | 0.00 | NaN | 9.0 | 0.0 | 0.42 | -0.4 | -0.4 | 2.0 | 3.0 | 66.7 | 14.0 | 0.0 | 2.0 | 2.0 | 100.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.00 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 1.0 | 8.3 | 1.0 | 6.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 0.0 | 2.0 | NaN | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 1.0 | 20.0 | 4.0 | 20.0 | 2.0 | 45 | 16.7 | NaN | 0.0 | 1 | 45.0 | 1 | 3.00 | 0.0 | 0.0 | 0.0 | 0.00 | -0.40 | 1.0 | 0.7 | 0.3 | 0.68 | 0.74 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 2.0 | 0.0 | 2021/2022 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 | aaron connolly | Brighton Hove Albion | brighton & hove albion | GB1 | 21.0 | 6300000.0 | 7000000.0 | 2000-01-28 | Galway | attack - Centre-Forward | ST | Forward | 175.0 | right | Ireland | NaN | PLG | 28.0 | 1.0 | Ireland | england | 7000000.0 | 2019-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | england | 21.0 | 19.0 | 2.0 | 2.0 | 6300000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | eng ENG | DF | West Ham | Premier League | 27 | 1989.0 | 36 | 35 | 3069.0 | 34.1 | 1 | 3 | 1 | 0 | 0 | 7 | 0 | 0.03 | 0.09 | 0.12 | 0.03 | 0.12 | 0.8 | 0.8 | 2.8 | 3.6 | 0.02 | 0.08 | 0.10 | 0.02 | 0.10 | Matches | 21.0 | 6.0 | 28.6 | 0.62 | 0.18 | 0.05 | 0.17 | 28.1 | 8.0 | 0.04 | 0.2 | 0.2 | 1224.0 | 1708.0 | 71.7 | 23519.0 | 10212.0 | 560.0 | 623.0 | 89.9 | 472.0 | 587.0 | 80.4 | 183.0 | 449.0 | 40.8 | 0.2 | 35.0 | 117.0 | 21.0 | 14.0 | 96.0 | 1343.0 | 365.0 | 1.0 | 222.0 | 83.0 | 93.0 | 67.0 | 35.0 | 15.0 | 9.0 | 893.0 | 293.0 | 522.0 | 1329.0 | 78.0 | 59.0 | 210.0 | 5.0 | 15.0 | 44.0 | 39.0 | 52.0 | 62.0 | 1.82 | 35.0 | 21.0 | 1.0 | 3.0 | 0.0 | 9.0 | 0.26 | 6.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.0 | 18.0 | 15.0 | 18.0 | 5.0 | 17.0 | 53.1 | 15.0 | 115.0 | 32.1 | 181.0 | 123.0 | 54.0 | 0.0 | 38.0 | 90.0 | 133.0 | 0.0 | 2050.0 | 125.0 | 17.0 | 33.3 | 7.0 | 0.0 | 1071.0 | 2.0 | 18.0 | 19.0 | 1171.0 | 1094.0 | 93.4 | 31.0 | 85 | 89.7 | NaN | 30.0 | 1 | NaN | 1 | 1.14 | 45.0 | 60.0 | -15.0 | -0.44 | 0.84 | 38.0 | 51.5 | -13.5 | -0.40 | 1.09 | 0.0 | 20 | 0.0 | 0.0 | 0.0 | 277.0 | 70.0 | 57.0 | 55.1 | 2017/2018 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... 
| Left-Back | 92571 | 4f974391 | aaron cresswell | West Ham United | west ham united | GB1 | 31.0 | 10800000.0 | 12000000.0 | 1989-12-15 | Liverpool | Defender - Left-Back | LB | Defender | 170.0 | left | England | NaN | Unique Sports Management | 15.0 | 12.0 | England | england | 5000000.0 | 2014-07-03 | 2023-06-30 | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | 31.0 | 24.0 | 7.0 | 1.0 | 4500000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | eng ENG | DF | West Ham | Premier League | 28 | 1989.0 | 20 | 18 | 1589.0 | 17.7 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0.00 | 0.06 | 0.06 | 0.00 | 0.06 | 0.5 | 0.5 | 0.9 | 1.4 | 0.03 | 0.05 | 0.08 | 0.03 | 0.08 | Matches | 11.0 | 0.0 | 0.0 | 0.62 | 0.00 | 0.00 | NaN | 23.5 | 2.0 | 0.04 | -0.5 | -0.5 | 842.0 | 1070.0 | 78.7 | 13627.0 | 5572.0 | 453.0 | 501.0 | 90.4 | 307.0 | 371.0 | 82.7 | 64.0 | 140.0 | 45.7 | 0.1 | 16.0 | 55.0 | 15.0 | 5.0 | 65.0 | 854.0 | 216.0 | 0.0 | 168.0 | 18.0 | 46.0 | 10.0 | 0.0 | 2.0 | 0.0 | 642.0 | 235.0 | 193.0 | 787.0 | 51.0 | 27.0 | 190.0 | 4.0 | 2.0 | 21.0 | 27.0 | 44.0 | 29.0 | 1.64 | 19.0 | 6.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0.11 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 19.0 | 14.0 | 12.0 | 4.0 | 12.0 | 42.9 | 16.0 | 68.0 | 31.5 | 129.0 | 59.0 | 28.0 | 0.0 | 39.0 | 49.0 | 60.0 | 1.0 | 1266.0 | 78.0 | 36.0 | 63.6 | 7.0 | 1.0 | 723.0 | 8.0 | 11.0 | 13.0 | 797.0 | 715.0 | 89.7 | 43.0 | 79 | 46.5 | 85.0 | 16.0 | 2 | 30.0 | 7 | 1.30 | 21.0 | 26.0 | -5.0 | -0.28 | -0.38 | 20.1 | 25.3 | -5.3 | -0.30 | 0.12 | 0.0 | 2 | 0.0 | 0.0 | 0.0 | 169.0 | 22.0 | 14.0 | 61.1 | 2018/2019 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... 
| Left-Back | 92571 | 4f974391 | aaron cresswell | West Ham United | west ham united | GB1 | 31.0 | 9000000.0 | 10000000.0 | 1989-12-15 | Liverpool | Defender - Left-Back | LB | Defender | 170.0 | left | England | NaN | Unique Sports Management | 15.0 | 12.0 | England | england | 5000000.0 | 2014-07-03 | 2023-06-30 | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | 31.0 | 24.0 | 7.0 | 1.0 | 4500000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
# Display DataFrame
df_merge_fbref_tm_select.head(10)
player_name_fbref | fbref_id | url_fbref | first_initial_lower | first_name_lower | last_name_lower | birth_year | country_lower | outfielder_goalkeeper | season | player_name_tm | url_tm | tm_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aaron Connolly | 27c01749 | https://fbref.com/en/players/27c01749/Aaron-Co... | a | aaron | connolly | 2000.0 | ireland | Outfielder | 2019/2020 | aaron connolly | https://www.transfermarkt.com/aaron-connolly/p... | 434207 |
1 | Aaron Connolly | 27c01749 | https://fbref.com/en/players/27c01749/Aaron-Co... | a | aaron | connolly | 2000.0 | ireland | Outfielder | 2020/2021 | aaron connolly | https://www.transfermarkt.com/aaron-connolly/p... | 434207 |
2 | Aaron Connolly | 27c01749 | https://fbref.com/en/players/27c01749/Aaron-Co... | a | aaron | connolly | 2000.0 | ireland | Outfielder | 2021/2022 | aaron connolly | https://www.transfermarkt.com/aaron-connolly/p... | 434207 |
3 | Aaron Cresswell | 4f974391 | https://fbref.com/en/players/4f974391/Aaron-Cr... | a | aaron | cresswell | 1989.0 | england | Outfielder | 2017/2018 | aaron cresswell | https://www.transfermarkt.com/aaron-cresswell/... | 92571 |
4 | Aaron Cresswell | 4f974391 | https://fbref.com/en/players/4f974391/Aaron-Cr... | a | aaron | cresswell | 1989.0 | england | Outfielder | 2018/2019 | aaron cresswell | https://www.transfermarkt.com/aaron-cresswell/... | 92571 |
5 | Aaron Cresswell | 4f974391 | https://fbref.com/en/players/4f974391/Aaron-Cr... | a | aaron | cresswell | 1989.0 | england | Outfielder | 2019/2020 | aaron cresswell | https://www.transfermarkt.com/aaron-cresswell/... | 92571 |
6 | Aaron Cresswell | 4f974391 | https://fbref.com/en/players/4f974391/Aaron-Cr... | a | aaron | cresswell | 1989.0 | england | Outfielder | 2020/2021 | aaron cresswell | https://www.transfermarkt.com/aaron-cresswell/... | 92571 |
7 | Aaron Cresswell | 4f974391 | https://fbref.com/en/players/4f974391/Aaron-Cr... | a | aaron | cresswell | 1989.0 | england | Outfielder | 2021/2022 | aaron cresswell | https://www.transfermarkt.com/aaron-cresswell/... | 92571 |
8 | Aaron Hickey | 1780bb4a | https://fbref.com/en/players/1780bb4a/Aaron-Hi... | a | aaron | hickey | 2002.0 | scotland | Outfielder | 2020/2021 | aaron hickey | https://www.transfermarkt.com/aaron-hickey/pro... | 591949 |
9 | Aaron Hickey | 1780bb4a | https://fbref.com/en/players/1780bb4a/Aaron-Hi... | a | aaron | hickey | 2002.0 | scotland | Outfielder | 2021/2022 | aaron hickey | https://www.transfermarkt.com/aaron-hickey/pro... | 591949 |
lst_fbref_tm_select
['player_name_fbref', 'fbref_id', 'url_fbref', 'first_initial_lower', 'first_name_lower', 'last_name_lower', 'birth_year', 'country_lower', 'outfielder_goalkeeper', 'season', 'player_name_tm', 'url_tm', 'tm_id']
lst_fbref_tm_non_select
['Def 3rd.1', 'PK', 'PKwon', 'joined', 'league_code', 'current_club', 'Thr', '90s', 'Carries', 'Left', 'Lost', 'Err', 'npxG+xA.1', 'age_when_joining', 'Cmp', 'citizenship', 'Stp', 'Low', 'onxGA', 'High', 'G-PK.1', 'years_until_contract_expiry', 'KP', 'Dead', 'Out', 'Nationality Code', 'G/SoT', 'AvgDist', 'Live', 'GA90', 'league_country_lower', 'Drib.1', 'Touches', 'Sw', 'Age', 'TotDist', 'Ast', 'On-Off', 'foot', 'ShSv', 'current_club_country', 'Mis', 'Starts', 'second_citizenship', 'Fld.1', 'Mn/Start', 'onxG', 'Cmp.1', 'Att.2', 'transfer_period', 'Gls', 'G-PK', '+/-', 'Att 3rd', 'OG', 'TmPos', 'In', 'G/Sh', 'SoTA', 'npxG+xA', 'Press', 'age', 'Other', 'club', 'PassDead.1', 'Launch%.1', 'Att.3', 'On-Off.1', 'CrdR', 'xG+xA', 'FK', 'PSxG', 'name_lower', 'PassLive.1', 'Targ', 'onG', 'Mid 3rd.1', 'Launch%', 'transfer_movement', 'Tkl.1', 'market_value_gbp', 'unSub', 'CPA', 'MP', 'Dis', '#OPA/90', 'Att Pen', 'Recov', 'np:G-xG', 'Att', 'contract_expires', 'contract_option', 'loan_contract_expiry', 'league_name', 'firstname_lower', 'Dist', 'player_name_lower', 'lastname_lower', 'Nation', 'Won%', '1/3', 'PKm', 'Mid 3rd', 'xG+/-90', 'CK', 'L', 'G+A-PK', 'CS', 'CS%', 'SoT%', 'Mn/Sub', '+/-90', 'market_value_eur', 'Subs', 'Position Grouped', 'PSxG+/-', 'Prog', 'Ground', 'on_loan_from_country', 'xG+/-', 'PKcon', 'Succ', 'TB', 'SCA', 'fee', 'on_loan_from', 'market_value_pounds', 'Team Name', 'market_value_euros', 'Def', 'Primary Pos', 'Clr', 'club_involved_name', 'PSxG/SoT', 'Drib', 'player_agent', 'CrdY', 'Cmp.3', 'Last Name Lower', 'pob', 'Att.1', 'Out.1', 'PassLive', 'Succ%', 'AvgLen.1', 'GCA90', 'xG', 'club_name', 'Fld', 'SCA90', 'dob', 'birth_month', 'Tkl%', 'Sh', 'Team Country', 'npxG.1', 'Pos', 'Cmp%', 'Save%.1', 'Save%', 'Comp', 'Nationality Cleaned', 'First Initial Lower', 'Cmp.2', 'Matches', 'Opp', 'Ast.1', 'xA', 'PPM', 'Min%', 'G-xG', 'Past', '%', 'Def Pen', 'Off', 'birth_day', 'Sh/90', 'firstinitial_lower', '/90', 'Head', '2CrdY', 'PKA', '#Pl', 'position_grouped', 
'SoT', 'TI', 'Blocks', 'SoT/90', 'npxG', 'First Name Lower', 'GCA', 'years_since_joining', 'cob', 'Mn/MP', 'Def.1', 'position_code', 'Prog.1', 'PKatt', 'AvgLen', 'G+A', 'PKsv', 'Pass', 'npxG/Sh', 'A-xA', 'Crs', 'Gls.1', 'PassDead', '#OPA', 'D', 'PrgDist', 'Stp%', 'Squad', 'Rec%', 'fee_cleaned', 'W', 'Player Lower', 'Megs', 'xG.1', 'Str', 'Tkl+Int', 'Cmp%.2', 'GA', 'Saves', 'Rec', 'Sh.1', 'Fls', 'Team Country Lower', 'Cmp%.3', 'Tkl', 'Int', 'Att 3rd.1', 'Def 3rd', 'Min', 'Cmp%.1', 'position', 'TklW', 'Compl', 'height', 'xA.1', 'current_age', 'onGA', 'Won', 'CrsPA', 'PPA', 'Right']
df_merge_fbref_tm[df_merge_fbref_tm['player_name_fbref'].str.contains('Gerard Piqu', na=False)]
Nation | Pos | Squad | Comp | Age | birth_year | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | outfielder_goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | player_name_fbref | url_fbref | url_tm | TmPos | tm_id | fbref_id | player_name_tm | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | position | position_code | position_grouped | height | 
foot | citizenship | second_citizenship | player_agent | birth_day | birth_month | cob | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | club_name | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4097 | es ESP | DF | Barcelona | La Liga | 30 | 1987.0 | 30 | 29 | 2631.0 | 29.2 | 2 | 0 | 2 | 0 | 0 | 8 | 0 | 0.07 | 0.00 | 0.07 | 0.07 | 0.07 | 3.0 | 3.0 | 0.9 | 3.8 | 0.10 | 0.03 | 0.13 | 0.10 | 0.13 | Matches | 18.0 | 5.0 | 27.8 | 0.62 | 0.17 | 0.11 | 0.40 | 9.2 | 0.0 | 0.16 | -1.0 | -1.0 | 1606.0 | 1810.0 | 88.7 | 34428.0 | 10291.0 | 460.0 | 499.0 | 92.2 | 841.0 | 904.0 | 93.0 | 293.0 | 380.0 | 77.1 | -0.9 | 6.0 | 123.0 | 4.0 | 0.0 | 90.0 | 1764.0 | 46.0 | 0.0 | 263.0 | 70.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1436.0 | 120.0 | 254.0 | 95.0 | 1552.0 | 106.0 | 10.0 | 8.0 | 1.0 | 28.0 | 19.0 | 11.0 | 17.0 | 0.58 | 13.0 | 0.0 | 1.0 | 2.0 | 0.0 | 1.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 30.0 | 22.0 | 22.0 | 8.0 | 0.0 | 12.0 | 34.3 | 23.0 | 78.0 | 32.5 | 144.0 | 80.0 | 16.0 | 2.0 | 26.0 | 62.0 | 103.0 | 5.0 | 2075.0 | 318.0 | 28.0 | 60.0 | 6.0 | 0.0 | 1419.0 | 0.0 | 4.0 | 7.0 | 1385.0 | 1341.0 | 96.8 | 27.0 | 88 | 76.9 | NaN | 27.0 | 1 | NaN | 6 | 2.57 | 81.0 | 23.0 | 58.0 | 1.98 | 0.62 | 65.0 | 30.0 | 35.1 | 1.20 | 0.96 | 0.0 | 23 | 1.0 | 0.0 | 0.0 | 353.0 | 54.0 | 19.0 | 74.0 | 2017/2018 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 36000000.0 | 40000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4098 | es ESP | DF | Barcelona | La Liga | 31 | 1987.0 | 35 | 35 | 3150.0 | 35.0 | 4 | 2 | 4 | 0 | 0 | 6 | 0 | 0.11 | 0.06 | 0.17 | 0.11 | 0.17 | 3.7 | 3.7 | 1.5 | 5.2 | 0.11 | 0.04 | 0.15 | 0.11 | 0.15 | Matches | 20.0 | 11.0 | 55.0 | 0.57 | 0.31 | 0.20 | 0.36 | 7.3 | 0.0 | 0.19 | 0.3 | 0.3 | 2230.0 | 2429.0 | 91.8 | 46352.0 | 14931.0 | 666.0 | 710.0 | 93.8 | 1209.0 | 1277.0 | 94.7 | 343.0 | 420.0 | 81.7 | 0.5 | 8.0 | 156.0 | 5.0 | 0.0 | 103.0 | 2347.0 | 82.0 | 3.0 | 296.0 | 58.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1937.0 | 190.0 | 302.0 | 97.0 | 2125.0 | 126.0 | 12.0 | 13.0 | 5.0 | 22.0 | 14.0 | 13.0 | 25.0 | 0.71 | 17.0 | 0.0 | 2.0 | 1.0 | 1.0 | 6.0 | 0.17 | 2.0 | 0.0 | 1.0 | 2.0 | 1.0 | 0.0 | 45.0 | 27.0 | 29.0 | 15.0 | 1.0 | 20.0 | 54.1 | 17.0 | 87.0 | 31.9 | 160.0 | 101.0 | 12.0 | 2.0 | 37.0 | 77.0 | 156.0 | 2.0 | 2760.0 | 385.0 | 32.0 | 76.2 | 16.0 | 0.0 | 1915.0 | 1.0 | 11.0 | 9.0 | 1963.0 | 1889.0 | 96.2 | 27.0 | 90 | 92.1 | 90.0 | 35.0 | 0 | NaN | 1 | 2.43 | 86.0 | 30.0 | 56.0 | 1.60 | 2.27 | 69.9 | 35.6 | 34.2 | 0.98 | 1.29 | 0.0 | 24 | 1.0 | 0.0 | 0.0 | 473.0 | 91.0 | 33.0 | 73.4 | 2018/2019 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... 
| Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 36000000.0 | 40000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4099 | es ESP | DF | Barcelona | La Liga | 32 | 1987.0 | 35 | 35 | 3092.0 | 34.4 | 1 | 0 | 1 | 0 | 0 | 15 | 0 | 0.03 | 0.00 | 0.03 | 0.03 | 0.03 | 2.3 | 2.3 | 0.6 | 2.9 | 0.07 | 0.02 | 0.08 | 0.07 | 0.08 | Matches | 15.0 | 6.0 | 40.0 | 0.44 | 0.17 | 0.07 | 0.17 | 8.4 | 1.0 | 0.15 | -1.3 | -1.3 | 2469.0 | 2659.0 | 92.9 | 53752.0 | 14795.0 | 645.0 | 682.0 | 94.6 | 1381.0 | 1437.0 | 96.1 | 427.0 | 506.0 | 84.4 | -0.6 | 5.0 | 192.0 | 3.0 | 0.0 | 116.0 | 2548.0 | 111.0 | 2.0 | 275.0 | 79.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2121.0 | 227.0 | 311.0 | 113.0 | 2364.0 | 99.0 | 9.0 | 13.0 | 1.0 | 25.0 | 14.0 | 14.0 | 12.0 | 0.35 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.03 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 37.0 | 22.0 | 21.0 | 16.0 | 0.0 | 14.0 | 48.3 | 15.0 | 83.0 | 34.0 | 128.0 | 109.0 | 7.0 | 2.0 | 40.0 | 73.0 | 182.0 | 1.0 | 2996.0 | 427.0 | 28.0 | 100.0 | 9.0 | 0.0 | 2084.0 | 0.0 | 9.0 | 7.0 | 2211.0 | 2171.0 | 98.2 | 21.0 | 88 | 90.4 | 88.0 | 31.0 | 0 | NaN | 0 | 2.09 | 71.0 | 36.0 | 35.0 | 1.02 | -2.55 | 56.2 | 33.8 | 22.4 | 0.65 | -1.49 | 0.0 | 32 | 0.0 | 2.0 | 0.0 | 391.0 | 128.0 | 40.0 | 76.2 | 2019/2020 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... 
| Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 22500000.0 | 25000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4100 | es ESP | DF | Barcelona | La Liga | 33 | 1987.0 | 18 | 18 | 1481.0 | 16.5 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.6 | 0.6 | 0.5 | 1.1 | 0.04 | 0.03 | 0.07 | 0.04 | 0.07 | Matches | 8.0 | 2.0 | 25.0 | 0.49 | 0.12 | 0.00 | 0.00 | 9.3 | 0.0 | 0.08 | -0.6 | -0.6 | 1173.0 | 1247.0 | 94.1 | 24743.0 | 6069.0 | 321.0 | 340.0 | 94.4 | 666.0 | 690.0 | 96.5 | 173.0 | 197.0 | 87.8 | -0.5 | 2.0 | 82.0 | 1.0 | 0.0 | 35.0 | 1204.0 | 43.0 | 0.0 | 91.0 | 32.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1033.0 | 93.0 | 121.0 | 72.0 | 1079.0 | 45.0 | 5.0 | 10.0 | 1.0 | 14.0 | 6.0 | 4.0 | 5.0 | 0.30 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.06 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 21.0 | 13.0 | 13.0 | 7.0 | 1.0 | 9.0 | 64.3 | 5.0 | 46.0 | 40.4 | 51.0 | 62.0 | 1.0 | 0.0 | 13.0 | 36.0 | 65.0 | 0.0 | 1368.0 | 161.0 | 14.0 | 100.0 | 1.0 | 0.0 | 961.0 | 0.0 | 2.0 | 3.0 | 1046.0 | 1015.0 | 97.0 | 7.0 | 82 | 43.3 | 82.0 | 13.0 | 0 | NaN | 1 | 1.53 | 34.0 | 22.0 | 12.0 | 0.73 | -0.90 | 32.9 | 16.1 | 16.8 | 1.02 | -0.02 | 0.0 | 13 | 0.0 | 0.0 | 0.0 | 182.0 | 66.0 | 21.0 | 75.9 | 2020/2021 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 13500000.0 | 15000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4101 | es ESP | DF | Barcelona | La Liga | 34 | 1987.0 | 2 | 2 | 120.0 | 1.3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0.75 | 0.00 | 0.75 | 0.75 | 0.75 | 0.2 | 0.2 | 0.0 | 0.2 | 0.15 | 0.00 | 0.15 | 0.15 | 0.15 | Matches | 1.0 | 1.0 | 100.0 | 0.75 | 0.75 | 1.00 | 1.00 | 8.0 | 0.0 | 0.19 | 0.8 | 0.8 | 94.0 | 97.0 | 96.9 | 2030.0 | 415.0 | 26.0 | 28.0 | 92.9 | 51.0 | 51.0 | 100.0 | 17.0 | 18.0 | 94.4 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 89.0 | 8.0 | 0.0 | 9.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 85.0 | 5.0 | 7.0 | 11.0 | 80.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.75 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | NaN | 0.0 | 2.0 | 25.0 | 4.0 | 4.0 | 0.0 | 0.0 | 2.0 | 4.0 | 10.0 | 0.0 | 112.0 | 24.0 | 1.0 | NaN | 0.0 | 0.0 | 69.0 | 0.0 | 0.0 | 0.0 | 82.0 | 79.0 | 96.3 | 0.0 | 60 | 44.4 | 60.0 | 1.0 | 0 | NaN | 0 | 2.00 | 4.0 | 2.0 | 2.0 | 1.50 | 0.90 | 3.3 | 1.5 | 1.8 | 1.35 | 1.00 | 0.0 | 1 | 0.0 | 0.0 | 0.0 | 9.0 | 5.0 | 1.0 | 83.3 | 2021/2022 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 9000000.0 | 10000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
# Record Linkage Step 1 - Create an indexer object
indexer = recordlinkage.Index()
indexer.block(left_on=['first_initial_lower', 'outfielder_goalkeeper', 'season'],
              right_on=['first_initial_lower', 'outfielder_goalkeeper', 'season']
             )
<Index>
# Record Linkage Step 2 - Build up all the potential candidates to check:
candidates = indexer.index(df_merge_fbref_tm, df_capology)
print(len(candidates))
932588
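Blocking keeps only the pairs that agree exactly on every blocking key, which is why the candidate count above is far smaller than the full cross product of the two DataFrames. As a rough sketch, the same pairs can be reproduced with a plain pandas inner merge on the blocking keys. The two toy DataFrames below are made up for illustration and are not the notebook's data; the `index_left` and `index_right` column names come from the `suffixes` argument chosen here.

```python
import pandas as pd

# Toy data, invented for illustration only (not taken from the notebook's datasets)
left = pd.DataFrame({
    "first_initial_lower": ["a", "a", "g"],
    "outfielder_goalkeeper": ["Outfielder", "Outfielder", "Outfielder"],
    "season": ["2020/2021", "2021/2022", "2020/2021"],
})
right = pd.DataFrame({
    "first_initial_lower": ["a", "g", "g", "z"],
    "outfielder_goalkeeper": ["Outfielder", "Outfielder", "Goalkeeper", "Outfielder"],
    "season": ["2020/2021", "2020/2021", "2020/2021", "2020/2021"],
})

block_keys = ["first_initial_lower", "outfielder_goalkeeper", "season"]

# An inner merge on the blocking keys yields one row per candidate pair
pairs = pd.merge(
    left.reset_index(),
    right.reset_index(),
    on=block_keys,
    suffixes=("_left", "_right"),
)
candidate_pairs = list(zip(pairs["index_left"], pairs["index_right"]))

full_product = len(left) * len(right)
print(f"{len(candidate_pairs)} candidate pairs instead of {full_product}")
```

Only rows that share the first initial, the outfielder/goalkeeper flag, and the season survive, so a misspelt initial or a mislabelled season on either side removes a true match before any string comparison is made.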
# Record Linkage Step 3 - Define how to perform the comparison logic
compare = recordlinkage.Compare()
compare.string('first_name_lower',
               'first_name_lower',
               method='levenshtein',
               threshold=0.60,
               label='first_name'
              )
compare.string('last_name_lower',
               'last_name_lower',
               method='levenshtein',
               threshold=0.60,
               label='last_name'
              )
compare.string('country_lower',
               'country_lower',
               method='levenshtein',
               threshold=0.60,
               label='country'
              )
features = compare.compute(candidates, df_merge_fbref_tm, df_capology)
# Record Linkage Step 4 - view the potential candidates
features
first_name | last_name | country | ||
---|---|---|---|---|
0 | 1878 | 0.0 | 0.0 | 0.0 |
0 | 1964 | 0.0 | 0.0 | 0.0 |
0 | 1997 | 0.0 | 0.0 | 0.0 |
0 | 1998 | 0.0 | 0.0 | 0.0 |
0 | 2032 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... |
12518 | 15803 | 0.0 | 0.0 | 0.0 |
12523 | 15803 | 0.0 | 0.0 | 0.0 |
12526 | 15803 | 0.0 | 0.0 | 0.0 |
12530 | 15803 | 0.0 | 0.0 | 0.0 |
12533 | 15803 | 0.0 | 0.0 | 0.0 |
932588 rows × 3 columns
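Each `compare.string(..., method='levenshtein', threshold=0.60)` call turns a string similarity into a 1.0/0.0 flag, which is why the feature table above contains only those two values. The pure-Python sketch below illustrates the idea. It assumes the similarity is normalised as 1 minus the edit distance divided by the length of the longer string, and that a similarity at or above the threshold counts as a match; the helper names are hypothetical and are not part of the record-linkage library.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def binary_match(a: str, b: str, threshold: float = 0.60) -> float:
    """Return 1.0 when the similarity reaches the threshold, otherwise 0.0."""
    return 1.0 if similarity(a, b) >= threshold else 0.0


print(similarity("piqua", "pique"), binary_match("piqua", "pique"))
print(binary_match("connolly", "cresswell"))
```

Under this normalisation, the 'piqua' spelling seen in the FBref-derived columns still matches 'pique', because a single edit in five characters leaves a similarity of 0.8. The flip side is that a threshold of 0.60 is permissive for short names, where one or two edits can separate genuinely different players.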
# Sum up the individual scores to see the quality of the matches.
features.sum(axis=1).value_counts().sort_index(ascending=False)
3.0      5025
2.0     21484
1.0    157533
0.0    748546
dtype: int64
# Show records that have match by index number
potential_matches = features[features.sum(axis=1) >= 2].reset_index()
potential_matches
level_0 | level_1 | first_name | last_name | country | |
---|---|---|---|---|---|
0 | 5 | 14909 | 1.0 | 0.0 | 1.0 |
1 | 5 | 15090 | 1.0 | 0.0 | 1.0 |
2 | 5 | 15294 | 1.0 | 1.0 | 1.0 |
3 | 14 | 14909 | 1.0 | 1.0 | 1.0 |
4 | 14 | 15090 | 1.0 | 0.0 | 1.0 |
... | ... | ... | ... | ... | ... |
26504 | 12528 | 6473 | 1.0 | 1.0 | 1.0 |
26505 | 12528 | 17924 | 1.0 | 1.0 | 1.0 |
26506 | 12516 | 6620 | 1.0 | 1.0 | 1.0 |
26507 | 12517 | 7787 | 1.0 | 1.0 | 1.0 |
26508 | 12529 | 7321 | 1.0 | 1.0 | 1.0 |
26509 rows × 5 columns
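The `level_0` and `level_1` columns above are simply the two unnamed levels of the candidate-pair MultiIndex after `reset_index()`: `level_0` is the row label in the FBref-TransferMarkt DataFrame and `level_1` is the row label in the Capology DataFrame. A minimal sketch with made-up feature values shows the scoring, the filter, and where the column names come from.

```python
import pandas as pd

# Made-up candidate pairs: (left row label, right row label)
pair_index = pd.MultiIndex.from_tuples(
    [(5, 14909), (5, 15294), (14, 14909), (20, 300)]
)
features = pd.DataFrame(
    {
        "first_name": [1.0, 1.0, 1.0, 0.0],
        "last_name": [0.0, 1.0, 1.0, 0.0],
        "country": [1.0, 1.0, 1.0, 1.0],
    },
    index=pair_index,
)

# Number of fields that agree for each candidate pair (0 to 3)
score = features.sum(axis=1)
score_counts = score.value_counts().sort_index(ascending=False)

# Keep pairs agreeing on at least two fields; unnamed index levels
# become the columns 'level_0' and 'level_1'
potential = features[score >= 2].reset_index()
print(score_counts)
print(potential)
```

A cut-off of 2 admits pairs that agree on first name and country but not on last name, so the later step that keeps the highest-scoring candidate per left row is doing real work.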
# Create 'Score' attribute, that sums the three columns defined in record-linkage
potential_matches['Score'] = potential_matches.loc[:, 'first_name': 'country'].sum(axis=1)
# Display DataFrame of potential matches, potential_matches
potential_matches
level_0 | level_1 | first_name | last_name | country | Score | |
---|---|---|---|---|---|---|
0 | 5 | 14909 | 1.0 | 0.0 | 1.0 | 2.0 |
1 | 5 | 15090 | 1.0 | 0.0 | 1.0 | 2.0 |
2 | 5 | 15294 | 1.0 | 1.0 | 1.0 | 3.0 |
3 | 14 | 14909 | 1.0 | 1.0 | 1.0 | 3.0 |
4 | 14 | 15090 | 1.0 | 0.0 | 1.0 | 2.0 |
... | ... | ... | ... | ... | ... | ... |
26504 | 12528 | 6473 | 1.0 | 1.0 | 1.0 | 3.0 |
26505 | 12528 | 17924 | 1.0 | 1.0 | 1.0 | 3.0 |
26506 | 12516 | 6620 | 1.0 | 1.0 | 1.0 | 3.0 |
26507 | 12517 | 7787 | 1.0 | 1.0 | 1.0 | 3.0 |
26508 | 12529 | 7321 | 1.0 | 1.0 | 1.0 | 3.0 |
26509 rows × 6 columns
df_merge_fbref_tm[df_merge_fbref_tm.index == 2929]
Nation | Pos | Squad | Comp | Age | birth_year | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | outfielder_goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | player_name_fbref | url_fbref | url_tm | TmPos | tm_id | fbref_id | player_name_tm | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | position | position_code | position_grouped | height | 
foot | citizenship | second_citizenship | player_agent | birth_day | birth_month | cob | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | club_name | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2929 | it ITA | DF | Genoa | Serie A | 33 | 1986.0 | 23 | 20 | 1726.0 | 19.2 | 1 | 0 | 0 | 1 | 1 | 6 | 0 | 0.05 | 0.0 | 0.05 | 0.0 | 0.0 | 0.9 | 0.1 | 0.4 | 0.5 | 0.05 | 0.02 | 0.07 | 0.01 | 0.03 | Matches | 5.0 | 1.0 | 20.0 | 0.26 | 0.05 | 0.0 | 0.0 | 28.0 | 0.0 | 0.03 | 0.1 | -0.1 | 857.0 | 1052.0 | 81.5 | 18219.0 | 6711.0 | 277.0 | 309.0 | 89.6 | 434.0 | 487.0 | 89.1 | 140.0 | 241.0 | 58.1 | -0.4 | 6.0 | 75.0 | 9.0 | 1.0 | 80.0 | 972.0 | 80.0 | 0.0 | 85.0 | 29.0 | 18.0 | 2.0 | 0.0 | 1.0 | 1.0 | 701.0 | 118.0 | 233.0 | 795.0 | 151.0 | 56.0 | 33.0 | 5.0 | 9.0 | 14.0 | 20.0 | 12.0 | 14.0 | 0.73 | 13.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 43.0 | 27.0 | 31.0 | 9.0 | 3.0 | 10.0 | 35.7 | 18.0 | 71.0 | 29.8 | 131.0 | 94.0 | 13.0 | 0.0 | 13.0 | 78.0 | 77.0 | 1.0 | 1248.0 | 142.0 | 7.0 | 44.4 | 5.0 | 0.0 | 750.0 | 0.0 | 6.0 | 6.0 | 800.0 | 783.0 | 97.9 | 4.0 | 75 | 50.5 | 83.0 | 17.0 | 3 | 19.0 | 2 | 1.43 | 26.0 | 22.0 | 4.0 | 0.21 | 1.01 | 19.2 | 23.2 | -4.1 | -0.21 | 0.46 | 0.0 | 23 | 0.0 | 1.0 | 0.0 | 191.0 | 22.0 | 28.0 | 44.0 | 2020/2021 | Genoa | Italy | domenico criscito | domenico | criscito | d | italy | ITA | Italy | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | domenico criscito | domenico | criscito | d | italy | Domenico Criscito | https://fbref.com/en/players/35dd6d80/Domenico... | https://www.transfermarkt.com/domenico-criscit... | Left-Back | 44136 | 35dd6d80 | domenico criscito | Genoa CFC | genoa cfc | IT1 | 34.0 | 1800000.0 | 2000000.0 | 1986-12-30 | Cercola | Defender - Centre-Back | CB | Defender | 183.0 | left | Italy | NaN | PDP s.r.l. 
Pasqualin D’Amico Partners | 30.0 | 12.0 | Italy | italy | 1500000.0 | 2018-07-01 | 2023-06-30 | NaN | NaN | NaN | NaN | domenico criscito | domenico | criscito | d | italy | 34.0 | 31.0 | 3.0 | 1.0 | 1350000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
df_capology[df_capology.index == 6473]
player | season | league | team | position | outfielder_goalkeeper | age | country | weekly_gross_base_salary_gbp | annual_gross_base_salary_gbp | adj_current_gross_base_salary_gbp | estimated_gross_total_gbp | current_contract_status | current_contract_expiration | current_contract_length | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6473 | Zinédine Machach | 2017/2018 | Ligue 1 | Toulouse | Midfielder | Outfielder | 21 | France | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | zinedine machach | zinedine | machach | z | france |
# Select only the top match per left index (FBref-TM merged data)
## Order the potential matches by left index (FBref-TM merged data) ascending and score descending
potential_matches = potential_matches.sort_values(by=['level_0', 'Score'], ascending=[True, False])
## Dedupe DataFrame, keeping only the top row
potential_matches = potential_matches.drop_duplicates(subset=['level_0'], keep='first')
# Display DataFrame
potential_matches.head()
level_0 | level_1 | first_name | last_name | country | Score | |
---|---|---|---|---|---|---|
1363 | 3 | 13982 | 1.0 | 1.0 | 1.0 | 3.0 |
2395 | 4 | 14715 | 1.0 | 1.0 | 1.0 | 3.0 |
2 | 5 | 15294 | 1.0 | 1.0 | 1.0 | 3.0 |
1019 | 6 | 15894 | 1.0 | 1.0 | 1.0 | 3.0 |
1364 | 10 | 848 | 1.0 | 1.0 | 1.0 | 3.0 |
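Sorting by score and keeping the first row per `level_0` picks one candidate even when several share the top score, as happens for left index 12528 in the earlier output, which has two candidates scoring 3.0. The sketch below, on made-up rows, adds `level_1` as an explicit tie-breaker so the choice is reproducible, and lists the left indices whose best score is shared. The tie-breaker is an assumption for illustration, not the notebook's method; it makes the result deterministic but no more likely to be correct.

```python
import pandas as pd

# Made-up potential matches; left index 14 has two candidates tied on 3.0
potential = pd.DataFrame({
    "level_0": [5, 5, 5, 14, 14, 14],
    "level_1": [14909, 15090, 15294, 16000, 14909, 15090],
    "Score": [2.0, 2.0, 3.0, 3.0, 3.0, 2.0],
})

# Best score first within each left index; lowest right index wins ties
best = (
    potential
    .sort_values(by=["level_0", "Score", "level_1"], ascending=[True, False, True])
    .drop_duplicates(subset=["level_0"], keep="first")
)

# Left indices whose top score is shared by more than one candidate
top_score = potential.groupby("level_0")["Score"].transform("max")
tied = potential[potential["Score"] == top_score].groupby("level_0").size()
ambiguous = tied[tied > 1].index.tolist()

print(best)
print("Ambiguous left indices:", ambiguous)
```

Rows flagged as ambiguous are good candidates for a manual check or for an extra comparison field such as age or club.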
# Shape of potential matches DataFrame
potential_matches.shape
(5218, 6)
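The deduplicated matches act as a link table between the two sources: `level_0` points at a row of the FBref-TransferMarkt DataFrame and `level_1` at a row of the Capology DataFrame. The two-step left join that follows can be sketched on toy data as below. All values, including the `toy_salary` column, are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the two sources (values are invented)
left = pd.DataFrame({
    "player": ["aaron cresswell", "gerard pique", "aaron hickey"],
    "season": ["2020/2021", "2020/2021", "2020/2021"],
})
right = pd.DataFrame(
    {
        "player_cap": ["Gerard Piqué", "Aaron Cresswell"],
        "season": ["2020/2021", "2020/2021"],
        "toy_salary": [200.0, 100.0],
    },
    index=[10, 11],
)

# Link table: left row label -> right row label (no link for 'aaron hickey')
links = pd.DataFrame({"level_0": [0, 1], "level_1": [11, 10]})

# Step 1: attach the link table to every left row (left join keeps unmatched rows)
step1 = pd.merge(left, links, left_index=True, right_on="level_0", how="left")

# Step 2: pull in the right-hand columns through the matched right row label
step2 = pd.merge(step1, right, left_on="level_1", right_index=True, how="left")

print(step2[["player", "season_x", "season_y", "player_cap", "toy_salary"]])
```

Columns present in both sources, such as `season` here, come back with `_x` and `_y` suffixes, and left rows without a link keep NaN in every right-hand column. That is the reason for the suffix cleanup in the cell that follows.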
## Join Datasets
### Join the potential matches DataFrame to the FBref-TransferMarkt merged DataFrame (left join, so players without a match are kept)
#df_merge_fbref_tm_capology = pd.merge(potential_matches, df_merge_fbref_tm, left_on='level_0', right_index=True, how='left')
df_merge_fbref_tm_capology = pd.merge(df_merge_fbref_tm, potential_matches, left_index=True, right_on='level_0', how='left')
### Join the Capology DataFrame to the result, using the matched Capology index ('level_1')
df_merge_fbref_tm_capology = pd.merge(df_merge_fbref_tm_capology, df_capology, left_on='level_1', right_index=True, how='left')
## Data cleanup
### Rename columns - required because the '_y' regex filter used below would otherwise also drop 'birth_year'
df_merge_fbref_tm_capology = df_merge_fbref_tm_capology.rename(columns={'birth_year': 'born'})
### Sort rows
df_merge_fbref_tm_capology = df_merge_fbref_tm_capology.sort_values(by=['season_x', 'player_name_fbref', 'tm_id', 'Score'], ascending=[True, True, True, False])
### Remove duplicates
#### Remove duplicate columns after join (contain '_y') and remove '_x' suffix from kept columns
df_merge_fbref_tm_capology = df_merge_fbref_tm_capology[df_merge_fbref_tm_capology.columns.drop(list(df_merge_fbref_tm_capology.filter(regex='_y')))]
df_merge_fbref_tm_capology.columns = df_merge_fbref_tm_capology.columns.str.replace('_x', '')
#### Remove duplicate rows
df_merge_fbref_tm_capology = df_merge_fbref_tm_capology.drop_duplicates(subset=['player_name_fbref', 'tm_id', 'fbref_id', 'season', 'Team Name', 'Comp'], keep='first')
### Rename columns
df_merge_fbref_tm_capology = df_merge_fbref_tm_capology.rename(columns={'born': 'birth_year',
'player': 'player_name_capology'
}
)
### Sort rows
df_merge_fbref_tm_capology = df_merge_fbref_tm_capology.sort_values(by=['player_name_fbref', 'season'], ascending=[True, True])
### Reset index
df_merge_fbref_tm_capology = df_merge_fbref_tm_capology.reset_index(drop=True)
## Determine columns to keep and remove
### Drop unnecessary columns
df_merge_fbref_tm_capology = df_merge_fbref_tm_capology.drop(['Score', 'level_0', 'level_1', 'first_name' , 'last_name', 'country'], axis=1)
### Capology
lst_cols_capology = ['player_name_capology',
'first_initial_lower',
'first_name_lower',
'last_name_lower',
'country_lower',
'outfielder_goalkeeper',
'season'
]
### Combine all columns of interest into a single list
lst_fbref_tm_capology_select = list(lst_cols_capology)
lst_fbref_tm_capology_select.extend(x for x in lst_fbref_tm_select if x not in lst_fbref_tm_capology_select)
### Determine columns not of interest as separate list
lst_fbref_tm_capology_non_select = list(set(list(df_merge_fbref_tm_capology.columns)) - set(lst_fbref_tm_capology_select))
### Define all columns
lst_fbref_tm_capology_all = list(df_merge_fbref_tm_capology.columns)
## Select columns of interest
df_merge_fbref_tm_capology_select = df_merge_fbref_tm_capology[lst_fbref_tm_capology_select]
print('No. rows in FBref-TM DataFrame before join to Capology data: {}'.format(len(df_merge_fbref_tm_select)))
print('No. rows in DataFrame AFTER join: {}\n'.format(len(df_merge_fbref_tm_capology_select)))
print('-'*10+'\n')
print('Variance in rows before and after join: {}\n'.format(len(df_merge_fbref_tm_capology_select) - len(df_merge_fbref_tm_select)))
No. rows in FBref-TM DataFrame before join to Capology data: 12753
No. rows in DataFrame AFTER join: 12753

----------

Variance in rows before and after join: 0
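The before/after row counts printed above confirm that the left join did not duplicate rows. As a complementary check, pandas can enforce this at merge time. The sketch below uses toy data and hypothetical names (`left`, `right`, `match_id`); it assumes the right-hand keys are meant to be unique, which is what `validate='many_to_one'` checks, and uses `indicator=True` to count left rows that found no match.

```python
import pandas as pd

# Toy data: three left rows, only two of which have a match on the right.
left = pd.DataFrame({'player': ['a', 'b', 'c'], 'match_id': [10, 11, 12]})
right = pd.DataFrame({'match_id': [10, 11], 'salary': [100, 200]})

# validate='many_to_one' raises pandas.errors.MergeError if the right keys are not unique,
# so a left join that passes cannot add rows. indicator=True adds a '_merge' column.
merged = pd.merge(left, right, on='match_id', how='left',
                  validate='many_to_one', indicator=True)

# Number of left rows with no counterpart on the right
unmatched = int((merged['_merge'] == 'left_only').sum())

print('Rows before: {}, rows after: {}, unmatched: {}'.format(len(left), len(merged), unmatched))
```

An unchanged row count only shows that nothing was duplicated; the unmatched count additionally shows how many records came through the join without salary data.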
# Display DataFrame
df_merge_fbref_tm_capology_select.head()
(index) | player_name_capology | first_initial_lower | first_name_lower | last_name_lower | country_lower | outfielder_goalkeeper | season | player_name_fbref | fbref_id | url_fbref | birth_year | player_name_tm | url_tm | tm_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NaN | NaN | a | aaron | connolly | ireland | Outfielder | 2019/2020 | Aaron Connolly | 27c01749 | https://fbref.com/en/players/27c01749/Aaron-Co... | 2000.0 | aaron connolly | https://www.transfermarkt.com/aaron-connolly/p... | 434207 |
NaN | NaN | a | aaron | connolly | ireland | Outfielder | 2020/2021 | Aaron Connolly | 27c01749 | https://fbref.com/en/players/27c01749/Aaron-Co... | 2000.0 | aaron connolly | https://www.transfermarkt.com/aaron-connolly/p... | 434207 |
NaN | NaN | a | aaron | connolly | ireland | Outfielder | 2021/2022 | Aaron Connolly | 27c01749 | https://fbref.com/en/players/27c01749/Aaron-Co... | 2000.0 | aaron connolly | https://www.transfermarkt.com/aaron-connolly/p... | 434207 |
1363.0 | Aaron Cresswell | a | aaron | cresswell | england | Outfielder | 2017/2018 | Aaron Cresswell | 4f974391 | https://fbref.com/en/players/4f974391/Aaron-Cr... | 1989.0 | aaron cresswell | https://www.transfermarkt.com/aaron-cresswell/... | 92571 |
2395.0 | Aaron Cresswell | a | aaron | cresswell | england | Outfielder | 2018/2019 | Aaron Cresswell | 4f974391 | https://fbref.com/en/players/4f974391/Aaron-Cr... | 1989.0 | aaron cresswell | https://www.transfermarkt.com/aaron-cresswell/... | 92571 |
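The cleanup code earlier renames `birth_year` to `born` before dropping the `_y` columns, because an unanchored `_y` pattern matches anywhere in a column name. The following is a small sketch of that behaviour on a hypothetical empty DataFrame, together with one possible alternative: anchoring the patterns with `$` so that only true merge suffixes are affected. It is offered as an illustration, not as the approach the notebook uses.

```python
import pandas as pd

# Hypothetical column set resembling the result of a merge with default suffixes
df = pd.DataFrame(columns=['season_x', 'season_y', 'birth_year', 'tm_id'])

# Unanchored pattern: '_y' is found inside 'birth_year' as well
unanchored = list(df.filter(regex='_y').columns)

# Anchored pattern: only names that END in '_y' are selected
anchored = list(df.filter(regex='_y$').columns)

# Drop the right-hand duplicates and strip only a trailing '_x'
cleaned = df.drop(columns=anchored)
cleaned.columns = cleaned.columns.str.replace(r'_x$', '', regex=True)

print(unanchored)
print(anchored)
print(list(cleaned.columns))
```

With anchored patterns the temporary rename would not be needed, although any column that legitimately ends in `_x` or `_y` would still be caught.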
# Display DataFrame
df_merge_fbref_tm_capology.head()
Nation | Pos | Squad | Comp | Age | birth_year | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | outfielder_goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | player_name_fbref | url_fbref | url_tm | TmPos | tm_id | fbref_id | player_name_tm | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | position | position_code | position_grouped | height | 
foot | citizenship | second_citizenship | player_agent | birth_day | birth_month | cob | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | club_name | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | player_name_capology | league | team | weekly_gross_base_salary_gbp | annual_gross_base_salary_gbp | adj_current_gross_base_salary_gbp | estimated_gross_total_gbp | current_contract_status | current_contract_expiration | current_contract_length | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NaN | ie IRL | FW | Brighton | Premier League | 19 | 2000.0 | 24 | 14 | 1258.0 | 14.0 | 3 | 1 | 3 | 0 | 0 | 0 | 0 | 0.21 | 0.07 | 0.29 | 0.21 | 0.29 | 3.2 | 3.2 | 0.3 | 3.5 | 0.23 | 0.02 | 0.25 | 0.23 | 0.25 | Matches | 38.0 | 13.0 | 34.2 | 2.72 | 0.93 | 0.08 | 0.23 | 15.9 | 0.0 | 0.08 | -0.2 | -0.2 | 126.0 | 163.0 | 77.3 | 1739.0 | 242.0 | 76.0 | 92.0 | 82.6 | 31.0 | 39.0 | 79.5 | 7.0 | 11.0 | 63.6 | 0.7 | 6.0 | 6.0 | 2.0 | 0.0 | 10.0 | 148.0 | 15.0 | 1.0 | 50.0 | 0.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 90.0 | 52.0 | 21.0 | 27.0 | 107.0 | 13.0 | 1.0 | 6.0 | 0.0 | 1.0 | 4.0 | 10.0 | 25.0 | 1.79 | 7.0 | 0.0 | 3.0 | 9.0 | 3.0 | 5.0 | 0.36 | 1.0 | 0.0 | 1.0 | 1.0 | 2.0 | 0.0 | 12.0 | 8.0 | 1.0 | 5.0 | 6.0 | 3.0 | 25.0 | 9.0 | 69.0 | 29.5 | 14.0 | 94.0 | 126.0 | 0.0 | 7.0 | 17.0 | 1.0 | 0.0 | 349.0 | 2.0 | 61.0 | 37.5 | 6.0 | 1.0 | 228.0 | 12.0 | 42.0 | 34.0 | 535.0 | 235.0 | 43.9 | 99.0 | 52 | 36.8 | 72.0 | 0.0 | 10 | 26.0 | 4 | 1.13 | 18.0 | 22.0 | -4.0 | -0.29 | 0.17 | 15.6 | 19.8 | -4.2 | -0.30 | 0.08 | 0.0 | 16 | 2.0 | 0.0 | 0.0 | 54.0 | 14.0 | 48.0 | 22.6 | 2019/2020 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... 
| Centre-Forward | 434207 | 27c01749 | aaron connolly | Brighton Hove Albion | brighton & hove albion | GB1 | 21.0 | 4050000.0 | 4500000.0 | 2000-01-28 | Galway | attack - Centre-Forward | ST | Forward | 175.0 | right | Ireland | NaN | PLG | 28.0 | 1.0 | Ireland | england | 7000000.0 | 2019-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | england | 21.0 | 19.0 | 2.0 | 2.0 | 6300000.0 | Brighton & Hove Albion | Brighton U23 | - | in | Summer | 0.0 | Premier League | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
NaN | ie IRL | FW | Brighton | Premier League | 20 | 2000.0 | 17 | 9 | 791.0 | 8.8 | 2 | 1 | 2 | 0 | 0 | 0 | 0 | 0.23 | 0.11 | 0.34 | 0.23 | 0.34 | 3.5 | 3.5 | 0.2 | 3.7 | 0.40 | 0.02 | 0.42 | 0.40 | 0.42 | Matches | 23.0 | 8.0 | 34.8 | 2.62 | 0.91 | 0.09 | 0.25 | 13.7 | 0.0 | 0.15 | -1.5 | -1.5 | 79.0 | 101.0 | 78.2 | 1147.0 | 165.0 | 45.0 | 56.0 | 80.4 | 26.0 | 30.0 | 86.7 | 4.0 | 5.0 | 80.0 | 0.8 | 5.0 | 2.0 | 1.0 | 0.0 | 3.0 | 91.0 | 10.0 | 0.0 | 22.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 64.0 | 26.0 | 11.0 | 11.0 | 74.0 | 5.0 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 4.0 | 12.0 | 1.37 | 7.0 | 0.0 | 3.0 | 2.0 | 0.0 | 1.0 | 0.11 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 5.0 | 2.0 | 4.0 | 1.0 | 1.0 | 20.0 | 4.0 | 40.0 | 32.3 | 7.0 | 58.0 | 59.0 | 0.0 | 8.0 | 7.0 | 1.0 | 0.0 | 201.0 | 1.0 | 38.0 | 80.0 | 8.0 | 0.0 | 124.0 | 4.0 | 29.0 | 15.0 | 357.0 | 143.0 | 40.1 | 64.0 | 47 | 23.1 | 68.0 | NaN | 8 | 23.0 | 11 | 0.88 | 12.0 | 17.0 | -5.0 | -0.57 | -0.53 | 13.8 | 7.8 | 6.0 | 0.69 | 0.42 | 0.0 | 5 | 1.0 | 0.0 | 0.0 | 28.0 | 11.0 | 30.0 | 26.8 | 2020/2021 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... 
| Centre-Forward | 434207 | 27c01749 | aaron connolly | Brighton Hove Albion | brighton & hove albion | GB1 | 21.0 | 6300000.0 | 7000000.0 | 2000-01-28 | Galway | attack - Centre-Forward | ST | Forward | 175.0 | right | Ireland | NaN | PLG | 28.0 | 1.0 | Ireland | england | 7000000.0 | 2019-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | england | 21.0 | 19.0 | 2.0 | 2.0 | 6300000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
NaN | ie IRL | FW | Brighton | Premier League | 21 | 2000.0 | 1 | 0 | 45.0 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.4 | 0.4 | 0.0 | 0.4 | 0.85 | 0.00 | 0.85 | 0.85 | 0.85 | Matches | 1.0 | 0.0 | 0.0 | 2.00 | 0.00 | 0.00 | NaN | 9.0 | 0.0 | 0.42 | -0.4 | -0.4 | 2.0 | 3.0 | 66.7 | 14.0 | 0.0 | 2.0 | 2.0 | 100.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.00 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 1.0 | 8.3 | 1.0 | 6.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 0.0 | 2.0 | NaN | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 1.0 | 20.0 | 4.0 | 20.0 | 2.0 | 45 | 16.7 | NaN | 0.0 | 1 | 45.0 | 1 | 3.00 | 0.0 | 0.0 | 0.0 | 0.00 | -0.40 | 1.0 | 0.7 | 0.3 | 0.68 | 0.74 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 2.0 | 0.0 | 2021/2022 | Brighton | England | aaron connolly | aaron | connolly | a | england | IRL | Ireland | FW | Forward | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | ireland | Aaron Connolly | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | Centre-Forward | 434207 | 27c01749 | aaron connolly | Brighton Hove Albion | brighton & hove albion | GB1 | 21.0 | 6300000.0 | 7000000.0 | 2000-01-28 | Galway | attack - Centre-Forward | ST | Forward | 175.0 | right | Ireland | NaN | PLG | 28.0 | 1.0 | Ireland | england | 7000000.0 | 2019-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | aaron connolly | aaron | connolly | a | england | 21.0 | 19.0 | 2.0 | 2.0 | 6300000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1363.0 | eng ENG | DF | West Ham | Premier League | 27 | 1989.0 | 36 | 35 | 3069.0 | 34.1 | 1 | 3 | 1 | 0 | 0 | 7 | 0 | 0.03 | 0.09 | 0.12 | 0.03 | 0.12 | 0.8 | 0.8 | 2.8 | 3.6 | 0.02 | 0.08 | 0.10 | 0.02 | 0.10 | Matches | 21.0 | 6.0 | 28.6 | 0.62 | 0.18 | 0.05 | 0.17 | 28.1 | 8.0 | 0.04 | 0.2 | 0.2 | 1224.0 | 1708.0 | 71.7 | 23519.0 | 10212.0 | 560.0 | 623.0 | 89.9 | 472.0 | 587.0 | 80.4 | 183.0 | 449.0 | 40.8 | 0.2 | 35.0 | 117.0 | 21.0 | 14.0 | 96.0 | 1343.0 | 365.0 | 1.0 | 222.0 | 83.0 | 93.0 | 67.0 | 35.0 | 15.0 | 9.0 | 893.0 | 293.0 | 522.0 | 1329.0 | 78.0 | 59.0 | 210.0 | 5.0 | 15.0 | 44.0 | 39.0 | 52.0 | 62.0 | 1.82 | 35.0 | 21.0 | 1.0 | 3.0 | 0.0 | 9.0 | 0.26 | 6.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.0 | 18.0 | 15.0 | 18.0 | 5.0 | 17.0 | 53.1 | 15.0 | 115.0 | 32.1 | 181.0 | 123.0 | 54.0 | 0.0 | 38.0 | 90.0 | 133.0 | 0.0 | 2050.0 | 125.0 | 17.0 | 33.3 | 7.0 | 0.0 | 1071.0 | 2.0 | 18.0 | 19.0 | 1171.0 | 1094.0 | 93.4 | 31.0 | 85 | 89.7 | NaN | 30.0 | 1 | NaN | 1 | 1.14 | 45.0 | 60.0 | -15.0 | -0.44 | 0.84 | 38.0 | 51.5 | -13.5 | -0.40 | 1.09 | 0.0 | 20 | 0.0 | 0.0 | 0.0 | 277.0 | 70.0 | 57.0 | 55.1 | 2017/2018 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... 
| Left-Back | 92571 | 4f974391 | aaron cresswell | West Ham United | west ham united | GB1 | 31.0 | 10800000.0 | 12000000.0 | 1989-12-15 | Liverpool | Defender - Left-Back | LB | Defender | 170.0 | left | England | NaN | Unique Sports Management | 15.0 | 12.0 | England | england | 5000000.0 | 2014-07-03 | 2023-06-30 | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | 31.0 | 24.0 | 7.0 | 1.0 | 4500000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Aaron Cresswell | Premier League | West Ham | 50000.0 | 2600000.0 | 2671365.0 | NaN | NaN | NaN | NaN |
2395.0 | eng ENG | DF | West Ham | Premier League | 28 | 1989.0 | 20 | 18 | 1589.0 | 17.7 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0.00 | 0.06 | 0.06 | 0.00 | 0.06 | 0.5 | 0.5 | 0.9 | 1.4 | 0.03 | 0.05 | 0.08 | 0.03 | 0.08 | Matches | 11.0 | 0.0 | 0.0 | 0.62 | 0.00 | 0.00 | NaN | 23.5 | 2.0 | 0.04 | -0.5 | -0.5 | 842.0 | 1070.0 | 78.7 | 13627.0 | 5572.0 | 453.0 | 501.0 | 90.4 | 307.0 | 371.0 | 82.7 | 64.0 | 140.0 | 45.7 | 0.1 | 16.0 | 55.0 | 15.0 | 5.0 | 65.0 | 854.0 | 216.0 | 0.0 | 168.0 | 18.0 | 46.0 | 10.0 | 0.0 | 2.0 | 0.0 | 642.0 | 235.0 | 193.0 | 787.0 | 51.0 | 27.0 | 190.0 | 4.0 | 2.0 | 21.0 | 27.0 | 44.0 | 29.0 | 1.64 | 19.0 | 6.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0.11 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 19.0 | 14.0 | 12.0 | 4.0 | 12.0 | 42.9 | 16.0 | 68.0 | 31.5 | 129.0 | 59.0 | 28.0 | 0.0 | 39.0 | 49.0 | 60.0 | 1.0 | 1266.0 | 78.0 | 36.0 | 63.6 | 7.0 | 1.0 | 723.0 | 8.0 | 11.0 | 13.0 | 797.0 | 715.0 | 89.7 | 43.0 | 79 | 46.5 | 85.0 | 16.0 | 2 | 30.0 | 7 | 1.30 | 21.0 | 26.0 | -5.0 | -0.28 | -0.38 | 20.1 | 25.3 | -5.3 | -0.30 | 0.12 | 0.0 | 2 | 0.0 | 0.0 | 0.0 | 169.0 | 22.0 | 14.0 | 61.1 | 2018/2019 | West Ham | England | aaron cresswell | aaron | cresswell | a | england | ENG | England | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | Aaron Cresswell | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... 
| Left-Back | 92571 | 4f974391 | aaron cresswell | West Ham United | west ham united | GB1 | 31.0 | 9000000.0 | 10000000.0 | 1989-12-15 | Liverpool | Defender - Left-Back | LB | Defender | 170.0 | left | England | NaN | Unique Sports Management | 15.0 | 12.0 | England | england | 5000000.0 | 2014-07-03 | 2023-06-30 | NaN | NaN | NaN | NaN | aaron cresswell | aaron | cresswell | a | england | 31.0 | 24.0 | 7.0 | 1.0 | 4500000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Aaron Cresswell | Premier League | West Ham | 50000.0 | 2600000.0 | 2625727.0 | NaN | NaN | NaN | NaN |
df_merge_fbref_tm_capology[df_merge_fbref_tm_capology['player_name_fbref'].str.contains('Gerard Piqu', na=False)]
Nation | Pos | Squad | Comp | Age | birth_year | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | Cmp.1 | Att.1 | Cmp%.1 | Cmp.2 | Att.2 | Cmp%.2 | Cmp.3 | Att.3 | Cmp%.3 | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Out.1 | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassLive.1 | PassDead.1 | Drib.1 | Sh.1 | Fld.1 | Def.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl.1 | Tkl% | Past | Succ | % | Def 3rd.1 | Mid 3rd.1 | Att 3rd.1 | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Prog.1 | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | On-Off.1 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | season | Team Name | Team Country | Player Lower | First Name Lower | Last Name Lower | First Initial Lower | Team Country Lower | Nationality Code | Nationality Cleaned | Primary Pos | Position Grouped | outfielder_goalkeeper | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | Save%.1 | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Launch%.1 | AvgLen.1 | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | player_name_lower | first_name_lower | last_name_lower | first_initial_lower | country_lower | player_name_fbref | url_fbref | url_tm | TmPos | tm_id | fbref_id | player_name_tm | club | current_club | league_code | current_age | market_value_gbp | market_value_eur | dob | pob | position | position_code | position_grouped | height | 
foot | citizenship | second_citizenship | player_agent | birth_day | birth_month | cob | current_club_country | market_value_euros | joined | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | name_lower | firstname_lower | lastname_lower | firstinitial_lower | league_country_lower | age | age_when_joining | years_since_joining | years_until_contract_expiry | market_value_pounds | club_name | club_involved_name | fee | transfer_movement | transfer_period | fee_cleaned | league_name | player_name_capology | league | team | weekly_gross_base_salary_gbp | annual_gross_base_salary_gbp | adj_current_gross_base_salary_gbp | estimated_gross_total_gbp | current_contract_status | current_contract_expiration | current_contract_length | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9270.0 | es ESP | DF | Barcelona | La Liga | 30 | 1987.0 | 30 | 29 | 2631.0 | 29.2 | 2 | 0 | 2 | 0 | 0 | 8 | 0 | 0.07 | 0.00 | 0.07 | 0.07 | 0.07 | 3.0 | 3.0 | 0.9 | 3.8 | 0.10 | 0.03 | 0.13 | 0.10 | 0.13 | Matches | 18.0 | 5.0 | 27.8 | 0.62 | 0.17 | 0.11 | 0.40 | 9.2 | 0.0 | 0.16 | -1.0 | -1.0 | 1606.0 | 1810.0 | 88.7 | 34428.0 | 10291.0 | 460.0 | 499.0 | 92.2 | 841.0 | 904.0 | 93.0 | 293.0 | 380.0 | 77.1 | -0.9 | 6.0 | 123.0 | 4.0 | 0.0 | 90.0 | 1764.0 | 46.0 | 0.0 | 263.0 | 70.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1436.0 | 120.0 | 254.0 | 95.0 | 1552.0 | 106.0 | 10.0 | 8.0 | 1.0 | 28.0 | 19.0 | 11.0 | 17.0 | 0.58 | 13.0 | 0.0 | 1.0 | 2.0 | 0.0 | 1.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 30.0 | 22.0 | 22.0 | 8.0 | 0.0 | 12.0 | 34.3 | 23.0 | 78.0 | 32.5 | 144.0 | 80.0 | 16.0 | 2.0 | 26.0 | 62.0 | 103.0 | 5.0 | 2075.0 | 318.0 | 28.0 | 60.0 | 6.0 | 0.0 | 1419.0 | 0.0 | 4.0 | 7.0 | 1385.0 | 1341.0 | 96.8 | 27.0 | 88 | 76.9 | NaN | 27.0 | 1 | NaN | 6 | 2.57 | 81.0 | 23.0 | 58.0 | 1.98 | 0.62 | 65.0 | 30.0 | 35.1 | 1.20 | 0.96 | 0.0 | 23 | 1.0 | 0.0 | 0.0 | 353.0 | 54.0 | 19.0 | 74.0 | 2017/2018 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... 
| Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 36000000.0 | 40000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Gerard Piqué | La Liga | Barcelona | 218534.0 | 11363788.0 | 11406357.0 | NaN | NaN | NaN | NaN |
9412.0 | es ESP | DF | Barcelona | La Liga | 31 | 1987.0 | 35 | 35 | 3150.0 | 35.0 | 4 | 2 | 4 | 0 | 0 | 6 | 0 | 0.11 | 0.06 | 0.17 | 0.11 | 0.17 | 3.7 | 3.7 | 1.5 | 5.2 | 0.11 | 0.04 | 0.15 | 0.11 | 0.15 | Matches | 20.0 | 11.0 | 55.0 | 0.57 | 0.31 | 0.20 | 0.36 | 7.3 | 0.0 | 0.19 | 0.3 | 0.3 | 2230.0 | 2429.0 | 91.8 | 46352.0 | 14931.0 | 666.0 | 710.0 | 93.8 | 1209.0 | 1277.0 | 94.7 | 343.0 | 420.0 | 81.7 | 0.5 | 8.0 | 156.0 | 5.0 | 0.0 | 103.0 | 2347.0 | 82.0 | 3.0 | 296.0 | 58.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1937.0 | 190.0 | 302.0 | 97.0 | 2125.0 | 126.0 | 12.0 | 13.0 | 5.0 | 22.0 | 14.0 | 13.0 | 25.0 | 0.71 | 17.0 | 0.0 | 2.0 | 1.0 | 1.0 | 6.0 | 0.17 | 2.0 | 0.0 | 1.0 | 2.0 | 1.0 | 0.0 | 45.0 | 27.0 | 29.0 | 15.0 | 1.0 | 20.0 | 54.1 | 17.0 | 87.0 | 31.9 | 160.0 | 101.0 | 12.0 | 2.0 | 37.0 | 77.0 | 156.0 | 2.0 | 2760.0 | 385.0 | 32.0 | 76.2 | 16.0 | 0.0 | 1915.0 | 1.0 | 11.0 | 9.0 | 1963.0 | 1889.0 | 96.2 | 27.0 | 90 | 92.1 | 90.0 | 35.0 | 0 | NaN | 1 | 2.43 | 86.0 | 30.0 | 56.0 | 1.60 | 2.27 | 69.9 | 35.6 | 34.2 | 0.98 | 1.29 | 0.0 | 24 | 1.0 | 0.0 | 0.0 | 473.0 | 91.0 | 33.0 | 73.4 | 2018/2019 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... 
| Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 36000000.0 | 40000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Gerard Piqué | La Liga | Barcelona | 220629.0 | 11472752.0 | 11435733.0 | NaN | NaN | NaN | NaN |
9539.0 | es ESP | DF | Barcelona | La Liga | 32 | 1987.0 | 35 | 35 | 3092.0 | 34.4 | 1 | 0 | 1 | 0 | 0 | 15 | 0 | 0.03 | 0.00 | 0.03 | 0.03 | 0.03 | 2.3 | 2.3 | 0.6 | 2.9 | 0.07 | 0.02 | 0.08 | 0.07 | 0.08 | Matches | 15.0 | 6.0 | 40.0 | 0.44 | 0.17 | 0.07 | 0.17 | 8.4 | 1.0 | 0.15 | -1.3 | -1.3 | 2469.0 | 2659.0 | 92.9 | 53752.0 | 14795.0 | 645.0 | 682.0 | 94.6 | 1381.0 | 1437.0 | 96.1 | 427.0 | 506.0 | 84.4 | -0.6 | 5.0 | 192.0 | 3.0 | 0.0 | 116.0 | 2548.0 | 111.0 | 2.0 | 275.0 | 79.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2121.0 | 227.0 | 311.0 | 113.0 | 2364.0 | 99.0 | 9.0 | 13.0 | 1.0 | 25.0 | 14.0 | 14.0 | 12.0 | 0.35 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.03 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 37.0 | 22.0 | 21.0 | 16.0 | 0.0 | 14.0 | 48.3 | 15.0 | 83.0 | 34.0 | 128.0 | 109.0 | 7.0 | 2.0 | 40.0 | 73.0 | 182.0 | 1.0 | 2996.0 | 427.0 | 28.0 | 100.0 | 9.0 | 0.0 | 2084.0 | 0.0 | 9.0 | 7.0 | 2211.0 | 2171.0 | 98.2 | 21.0 | 88 | 90.4 | 88.0 | 31.0 | 0 | NaN | 0 | 2.09 | 71.0 | 36.0 | 35.0 | 1.02 | -2.55 | 56.2 | 33.8 | 22.4 | 0.65 | -1.49 | 0.0 | 32 | 0.0 | 2.0 | 0.0 | 391.0 | 128.0 | 40.0 | 76.2 | 2019/2020 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... 
| Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 22500000.0 | 25000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Gerard Piqué | La Liga | Barcelona | 432946.0 | 22513250.0 | 22513250.0 | NaN | NaN | NaN | NaN |
9623.0 | es ESP | DF | Barcelona | La Liga | 33 | 1987.0 | 18 | 18 | 1481.0 | 16.5 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.6 | 0.6 | 0.5 | 1.1 | 0.04 | 0.03 | 0.07 | 0.04 | 0.07 | Matches | 8.0 | 2.0 | 25.0 | 0.49 | 0.12 | 0.00 | 0.00 | 9.3 | 0.0 | 0.08 | -0.6 | -0.6 | 1173.0 | 1247.0 | 94.1 | 24743.0 | 6069.0 | 321.0 | 340.0 | 94.4 | 666.0 | 690.0 | 96.5 | 173.0 | 197.0 | 87.8 | -0.5 | 2.0 | 82.0 | 1.0 | 0.0 | 35.0 | 1204.0 | 43.0 | 0.0 | 91.0 | 32.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1033.0 | 93.0 | 121.0 | 72.0 | 1079.0 | 45.0 | 5.0 | 10.0 | 1.0 | 14.0 | 6.0 | 4.0 | 5.0 | 0.30 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.06 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 21.0 | 13.0 | 13.0 | 7.0 | 1.0 | 9.0 | 64.3 | 5.0 | 46.0 | 40.4 | 51.0 | 62.0 | 1.0 | 0.0 | 13.0 | 36.0 | 65.0 | 0.0 | 1368.0 | 161.0 | 14.0 | 100.0 | 1.0 | 0.0 | 961.0 | 0.0 | 2.0 | 3.0 | 1046.0 | 1015.0 | 97.0 | 7.0 | 82 | 43.3 | 82.0 | 13.0 | 0 | NaN | 1 | 1.53 | 34.0 | 22.0 | 12.0 | 0.73 | -0.90 | 32.9 | 16.1 | 16.8 | 1.02 | -0.02 | 0.0 | 13 | 0.0 | 0.0 | 0.0 | 182.0 | 66.0 | 21.0 | 75.9 | 2020/2021 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... 
| Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 13500000.0 | 15000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Gerard Deulofeu | Serie A | Udinese | 22166.0 | 1152678.0 | 1152678.0 | NaN | NaN | NaN | NaN |
NaN | es ESP | DF | Barcelona | La Liga | 34 | 1987.0 | 2 | 2 | 120.0 | 1.3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0.75 | 0.00 | 0.75 | 0.75 | 0.75 | 0.2 | 0.2 | 0.0 | 0.2 | 0.15 | 0.00 | 0.15 | 0.15 | 0.15 | Matches | 1.0 | 1.0 | 100.0 | 0.75 | 0.75 | 1.00 | 1.00 | 8.0 | 0.0 | 0.19 | 0.8 | 0.8 | 94.0 | 97.0 | 96.9 | 2030.0 | 415.0 | 26.0 | 28.0 | 92.9 | 51.0 | 51.0 | 100.0 | 17.0 | 18.0 | 94.4 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 89.0 | 8.0 | 0.0 | 9.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 85.0 | 5.0 | 7.0 | 11.0 | 80.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.75 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | NaN | 0.0 | 2.0 | 25.0 | 4.0 | 4.0 | 0.0 | 0.0 | 2.0 | 4.0 | 10.0 | 0.0 | 112.0 | 24.0 | 1.0 | NaN | 0.0 | 0.0 | 69.0 | 0.0 | 0.0 | 0.0 | 82.0 | 79.0 | 96.3 | 0.0 | 60 | 44.4 | 60.0 | 1.0 | 0 | NaN | 0 | 2.00 | 4.0 | 2.0 | 2.0 | 1.50 | 0.90 | 3.3 | 1.5 | 1.8 | 1.35 | 1.00 | 0.0 | 1 | 0.0 | 0.0 | 0.0 | 9.0 | 5.0 | 1.0 | 83.3 | 2021/2022 | Barcelona | Spain | gerard piqua | gerard | piqua | g | spain | ESP | Spain | DF | Defender | Outfielder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | gerard piqua | gerard | piqua | g | spain | Gerard Piqué | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | Centre-Back | 18944 | adfc9123 | gerard piqué | FC Barcelona | fc barcelona | ES1 | 34.0 | 9000000.0 | 10000000.0 | 1987-02-02 | Barcelona | Defender - Centre-Back | CB | Defender | 194.0 | right | Spain | NaN | AC Talent | 2.0 | 2.0 | Spain | spain | 10000000.0 | 2008-07-01 | 2024-06-30 | NaN | NaN | NaN | NaN | gerard pique | gerard | pique | g | spain | 34.0 | 21.0 | 13.0 | 2.0 | 9000000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
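In the output above, the 2020/2021 row for Gerard Piqué carries the Capology name Gerard Deulofeu (Serie A, Udinese), so keeping the top-scoring candidate does not guarantee a correct link. One rough way to surface such rows is to compare the last name token from each source. The sketch below uses a toy DataFrame that mimics the column names shown above; the `last_token` helper and the `suspect_match` column are hypothetical, and a real check would probably also need accent normalisation.

```python
import pandas as pd

# Toy rows mimicking the joined output; not the full dataset
df = pd.DataFrame({
    'player_name_fbref': ['Gerard Piqué', 'Gerard Piqué', 'Aaron Cresswell'],
    'season': ['2019/2020', '2020/2021', '2017/2018'],
    'player_name_capology': ['Gerard Piqué', 'Gerard Deulofeu', 'Aaron Cresswell'],
})

def last_token(names: pd.Series) -> pd.Series:
    """Return the lower-cased final whitespace-separated token of each name."""
    return names.str.lower().str.split().str[-1]

# Flag rows that have a Capology match whose last name differs from the FBref last name
df['suspect_match'] = (
    df['player_name_capology'].notna()
    & (last_token(df['player_name_fbref']) != last_token(df['player_name_capology']))
)

print(df[df['suspect_match']])
```

Flagged rows could then be reviewed by hand or have their Capology columns set to missing, depending on how conservative the downstream analysis needs to be.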
print(df_merge_fbref_tm_capology.columns.tolist())
['Nation', 'Pos', 'Squad', 'Comp', 'Age', 'birth_year', 'MP', 'Starts', 'Min', '90s', 'Gls', 'Ast', 'G-PK', 'PK', 'PKatt', 'CrdY', 'CrdR', 'Gls.1', 'Ast.1', 'G+A', 'G-PK.1', 'G+A-PK', 'xG', 'npxG', 'xA', 'npxG+xA', 'xG.1', 'xA.1', 'xG+xA', 'npxG.1', 'npxG+xA.1', 'Matches', 'Sh', 'SoT', 'SoT%', 'Sh/90', 'SoT/90', 'G/Sh', 'G/SoT', 'Dist', 'FK', 'npxG/Sh', 'G-xG', 'np:G-xG', 'Cmp', 'Att', 'Cmp%', 'TotDist', 'PrgDist', 'Cmp.1', 'Att.1', 'Cmp%.1', 'Cmp.2', 'Att.2', 'Cmp%.2', 'Cmp.3', 'Att.3', 'Cmp%.3', 'A-xA', 'KP', '1/3', 'PPA', 'CrsPA', 'Prog', 'Live', 'Dead', 'TB', 'Press', 'Sw', 'Crs', 'CK', 'In', 'Out', 'Str', 'Ground', 'Low', 'High', 'Left', 'Right', 'Head', 'TI', 'Other', 'Off', 'Out.1', 'Int', 'Blocks', 'SCA', 'SCA90', 'PassLive', 'PassDead', 'Drib', 'Fld', 'Def', 'GCA', 'GCA90', 'PassLive.1', 'PassDead.1', 'Drib.1', 'Sh.1', 'Fld.1', 'Def.1', 'Tkl', 'TklW', 'Def 3rd', 'Mid 3rd', 'Att 3rd', 'Tkl.1', 'Tkl%', 'Past', 'Succ', '%', 'Def 3rd.1', 'Mid 3rd.1', 'Att 3rd.1', 'ShSv', 'Pass', 'Tkl+Int', 'Clr', 'Err', 'Touches', 'Def Pen', 'Att Pen', 'Succ%', '#Pl', 'Megs', 'Carries', 'CPA', 'Mis', 'Dis', 'Targ', 'Rec', 'Rec%', 'Prog.1', 'Mn/MP', 'Min%', 'Mn/Start', 'Compl', 'Subs', 'Mn/Sub', 'unSub', 'PPM', 'onG', 'onGA', '+/-', '+/-90', 'On-Off', 'onxG', 'onxGA', 'xG+/-', 'xG+/-90', 'On-Off.1', '2CrdY', 'Fls', 'PKwon', 'PKcon', 'OG', 'Recov', 'Won', 'Lost', 'Won%', 'season', 'Team Name', 'Team Country', 'Player Lower', 'First Name Lower', 'Last Name Lower', 'First Initial Lower', 'Team Country Lower', 'Nationality Code', 'Nationality Cleaned', 'Primary Pos', 'Position Grouped', 'outfielder_goalkeeper', 'GA', 'GA90', 'SoTA', 'Saves', 'Save%', 'W', 'D', 'L', 'CS', 'CS%', 'PKA', 'PKsv', 'PKm', 'Save%.1', 'PSxG', 'PSxG/SoT', 'PSxG+/-', '/90', 'Thr', 'Launch%', 'AvgLen', 'Launch%.1', 'AvgLen.1', 'Opp', 'Stp', 'Stp%', '#OPA', '#OPA/90', 'AvgDist', 'player_name_lower', 'first_name_lower', 'last_name_lower', 'first_initial_lower', 'country_lower', 'player_name_fbref', 'url_fbref', 
'url_tm', 'TmPos', 'tm_id', 'fbref_id', 'player_name_tm', 'club', 'current_club', 'league_code', 'current_age', 'market_value_gbp', 'market_value_eur', 'dob', 'pob', 'position', 'position_code', 'position_grouped', 'height', 'foot', 'citizenship', 'second_citizenship', 'player_agent', 'birth_day', 'birth_month', 'cob', 'current_club_country', 'market_value_euros', 'joined', 'contract_expires', 'contract_option', 'on_loan_from', 'on_loan_from_country', 'loan_contract_expiry', 'name_lower', 'firstname_lower', 'lastname_lower', 'firstinitial_lower', 'league_country_lower', 'age', 'age_when_joining', 'years_since_joining', 'years_until_contract_expiry', 'market_value_pounds', 'club_name', 'club_involved_name', 'fee', 'transfer_movement', 'transfer_period', 'fee_cleaned', 'league_name', 'player_name_capology', 'league', 'team', 'weekly_gross_base_salary_gbp', 'annual_gross_base_salary_gbp', 'adj_current_gross_base_salary_gbp', 'estimated_gross_total_gbp', 'current_contract_status', 'current_contract_expiration', 'current_contract_length']
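Many names in the list above end in `.1`, `.2` or `.3` (for example `Gls.1`, `Cmp%.2`). pandas typically adds such suffixes when a source table repeats a header name. The sketch below shows one way such columns could be listed automatically instead of being commented out by hand. The helper name `find_suffixed_columns` and the short example list are illustrative assumptions, not part of the original notebook. A suffixed column may hold different data from its base column, so each one still needs checking before it is dropped.

```python
import re

def find_suffixed_columns(columns):
    """Return names ending in '.<digits>' whose base name is also in the list.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    present = set(columns)
    pattern = re.compile(r'^(?P<base>.+)\.(?P<n>\d+)$')
    suffixed = []
    for col in columns:
        match = pattern.match(col)
        # Only flag the column if the un-suffixed name exists as well
        if match and match.group('base') in present:
            suffixed.append(col)
    return suffixed

# Small illustrative subset of the column names printed above
example_columns = ['Gls', 'Ast', 'Gls.1', 'Ast.1', 'Cmp%', 'Cmp%.1', 'Cmp%.2', '90s', '+/-90']
suffixed_columns = find_suffixed_columns(example_columns)
print(suffixed_columns)
```

Names such as `90s` or `+/-90` are not flagged because they do not end in a dot followed by digits.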
# Define the columns to export
## Commented-out names are deliberately excluded from the export
cols_export = [
## PLAYER NAME
'player_name_fbref',
#'Player Lower',
#'First Name Lower',
#'Last Name Lower',
#'First Initial Lower',
#'player_name_tm',
#'player_name_capology',
#'name_lower',
#'firstname_lower',
#'lastname_lower',
#'firstinitial_lower',
## SEASON
'season',
## IDS
'url_fbref',
'url_tm',
'tm_id',
'fbref_id',
## TEAM
'Squad',
#'Team Name',
#'team',
'Team Country',
#'Team Country Lower',
#'current_club_country',
#'league_country_lower',
#'club',
#'current_club',
## LEAGUE
'Comp',
#'league',
#'league_name',
#'league_code',
## POSITION
'Pos',
'Primary Pos',
'TmPos',
'Position Grouped',
#'position',
#'position_code',
#'position_grouped',
'outfielder_goalkeeper',
## AGE
'Age',
'age_when_joining',
#'birth_day',
#'birth_month',
#'birth_year',
#'age',
#'current_age',
'dob',
## PHYSICAL ATTRIBUTES
'height',
'foot',
## NATIONALITY
'pob',
'cob',
'Nationality Cleaned',
#'Nationality Code',
'citizenship',
'second_citizenship',
## TRANSFERMARKT VALUATION
'market_value_gbp',
#'market_value_pounds',
'market_value_eur',
#'market_value_euros',
'joined',
'years_since_joining',
'years_until_contract_expiry',
'contract_expires',
'contract_option',
'on_loan_from',
'on_loan_from_country',
'loan_contract_expiry',
#'player_agent',
## CAPOLOGY SALARY INFORMATION
'weekly_gross_base_salary_gbp',
'annual_gross_base_salary_gbp',
'adj_current_gross_base_salary_gbp',
'estimated_gross_total_gbp',
'current_contract_status',
'current_contract_expiration',
'current_contract_length',
## TRANSFER HISTORY INFORMATION - NOT INCLUDED RIGHT NOW
#'club_name',
#'club_involved_name',
#'fee',
#'transfer_movement',
#'transfer_period',
#'fee_cleaned',
## PLAYER STATS
'MP',
'Starts',
'Min',
'90s',
'Gls',
'Ast',
'G-PK',
'PK',
'PKatt',
'CrdY',
'CrdR',
#'Gls.1',
#'Ast.1',
'G+A',
#'G-PK.1',
'G+A-PK',
'xG',
'npxG',
'xA',
'npxG+xA',
#'xG.1',
#'xA.1',
'xG+xA',
#'npxG.1',
#'npxG+xA.1',
#'Matches',
'Sh',
'SoT',
'SoT%',
'Sh/90',
'SoT/90',
'G/Sh',
'G/SoT',
'Dist',
'FK',
'npxG/Sh',
'G-xG',
'np:G-xG',
'Cmp',
'Att',
'Cmp%',
'TotDist',
'PrgDist',
#'Cmp.1',
#'Att.1',
#'Cmp%.1',
#'Cmp.2',
#'Att.2',
#'Cmp%.2',
#'Cmp.3',
#'Att.3',
#'Cmp%.3',
'A-xA',
'KP',
'1/3',
'PPA',
'CrsPA',
'Prog',
'Live',
'Dead',
'TB',
'Press',
'Sw',
'Crs',
'CK',
'In',
'Out',
'Str',
'Ground',
'Low',
'High',
'Left',
'Right',
'Head',
'TI',
'Other',
'Off',
#'Out.1',
'Int',
'Blocks',
'SCA',
'SCA90',
'PassLive',
'PassDead',
'Drib',
'Fld',
'Def',
'GCA',
'GCA90',
#'PassLive.1',
'PassDead.1',
#'Drib.1',
#'Sh.1',
#'Fld.1',
#'Def.1',
'Tkl',
'TklW',
'Def 3rd',
'Mid 3rd',
'Att 3rd',
#'Tkl.1',
'Tkl%',
'Past',
'Succ',
'%',
#'Def 3rd.1',
#'Mid 3rd.1',
#'Att 3rd.1',
'ShSv',
'Pass',
'Tkl+Int',
'Clr',
'Err',
'Touches',
'Def Pen',
'Att Pen',
'Succ%',
'#Pl',
'Megs',
'Carries',
'CPA',
'Mis',
'Dis',
'Targ',
'Rec',
'Rec%',
#'Prog.1',
'Mn/MP',
'Min%',
'Mn/Start',
'Compl',
'Subs',
'Mn/Sub',
'unSub',
'PPM',
'onG',
'onGA',
'+/-',
'+/-90',
'On-Off',
'onxG',
'onxGA',
'xG+/-',
'xG+/-90',
#'On-Off.1',
'2CrdY',
'Fls',
'PKwon',
'PKcon',
'OG',
'Recov',
'Won',
'Lost',
'Won%',
'GA',
'GA90',
'SoTA',
'Saves',
'Save%',
'W',
'D',
'L',
'CS',
'CS%',
'PKA',
'PKsv',
'PKm',
#'Save%.1',
'PSxG',
'PSxG/SoT',
'PSxG+/-',
'/90',
'Thr',
'Launch%',
'AvgLen',
#'Launch%.1',
#'AvgLen.1',
'Opp',
'Stp',
'Stp%',
'#OPA',
'#OPA/90',
'AvgDist'
]
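With a hand-maintained list this long, a misspelled or repeated name is easy to miss, and selecting a name that is not in the DataFrame raises a `KeyError`. The following is a minimal sketch of a pre-selection check, assuming a hypothetical helper `check_export_columns` and a toy DataFrame; neither appears in the original notebook.

```python
import pandas as pd

def check_export_columns(df, cols):
    """Return (missing, repeated) for a list of column names.

    missing:  names in cols that are not columns of df
    repeated: names that appear more than once in cols
    Hypothetical helper for illustration only.
    """
    missing = [c for c in cols if c not in df.columns]
    seen, repeated = set(), []
    for c in cols:
        if c in seen and c not in repeated:
            repeated.append(c)
        seen.add(c)
    return missing, repeated

# Toy data standing in for the merged DataFrame
df_example = pd.DataFrame({'player_name_fbref': ['A'], 'season': ['2020/2021'], 'Gls': [3]})
cols_example = ['player_name_fbref', 'season', 'Gls', 'Gls', 'xG']

missing, repeated = check_export_columns(df_example, cols_example)
print('Missing from DataFrame:', missing)
print('Listed more than once:', repeated)
```

In the notebook itself, the same call could be made with `df_merge_fbref_tm_capology` and `cols_export` before the selection step.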
# Create the export DataFrame
## Select columns of interest
df_merge_fbref_tm_capology_select = df_merge_fbref_tm_capology[cols_export]
## Drop columns with a duplicated name, keeping the first occurrence
## (temporary fix for the duplicate 'team' column; this should be handled earlier in the notebook)
df_merge_fbref_tm_capology_select = df_merge_fbref_tm_capology_select.loc[:, ~df_merge_fbref_tm_capology_select.columns.duplicated()]
## Display DataFrame
df_merge_fbref_tm_capology_select.head()
player_name_fbref | season | url_fbref | url_tm | tm_id | fbref_id | Squad | Team Country | Comp | Pos | Primary Pos | TmPos | Position Grouped | outfielder_goalkeeper | Age | age_when_joining | dob | height | foot | pob | cob | Nationality Cleaned | citizenship | second_citizenship | market_value_gbp | market_value_eur | joined | years_since_joining | years_until_contract_expiry | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | weekly_gross_base_salary_gbp | annual_gross_base_salary_gbp | adj_current_gross_base_salary_gbp | estimated_gross_total_gbp | current_contract_status | current_contract_expiration | current_contract_length | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | G+A | G+A-PK | xG | npxG | xA | npxG+xA | xG+xA | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassDead.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl% | Past | Succ | % | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NaN | Aaron Connolly | 2019/2020 | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | 434207 | 27c01749 | Brighton | England | Premier League | FW | FW | Centre-Forward | Forward | Outfielder | 19 | 19.0 | 2000-01-28 | 175.0 | right | Galway | Ireland | Ireland | Ireland | NaN | 4050000.0 | 4500000.0 | 2019-07-01 | 2.0 | 2.0 | 2024-06-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 24 | 14 | 1258.0 | 14.0 | 3 | 1 | 3 | 0 | 0 | 0 | 0 | 0.29 | 0.29 | 3.2 | 3.2 | 0.3 | 3.5 | 0.25 | 38.0 | 13.0 | 34.2 | 2.72 | 0.93 | 0.08 | 0.23 | 15.9 | 0.0 | 0.08 | -0.2 | -0.2 | 126.0 | 163.0 | 77.3 | 1739.0 | 242.0 | 0.7 | 6.0 | 6.0 | 2.0 | 0.0 | 10.0 | 148.0 | 15.0 | 1.0 | 50.0 | 0.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 90.0 | 52.0 | 21.0 | 27.0 | 107.0 | 13.0 | 1.0 | 6.0 | 0.0 | 4.0 | 10.0 | 25.0 | 1.79 | 7.0 | 0.0 | 3.0 | 9.0 | 3.0 | 5.0 | 0.36 | 0.0 | 12.0 | 8.0 | 1.0 | 5.0 | 6.0 | 25.0 | 9.0 | 69.0 | 29.5 | 0.0 | 7.0 | 17.0 | 1.0 | 0.0 | 349.0 | 2.0 | 61.0 | 37.5 | 6.0 | 1.0 | 228.0 | 12.0 | 42.0 | 34.0 | 535.0 | 235.0 | 43.9 | 52 | 36.8 | 72.0 | 0.0 | 10 | 26.0 | 4 | 1.13 | 18.0 | 22.0 | -4.0 | -0.29 | 0.17 | 15.6 | 19.8 | -4.2 | -0.30 | 0.0 | 16 | 2.0 | 0.0 | 0.0 | 54.0 | 14.0 | 48.0 | 22.6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
NaN | Aaron Connolly | 2020/2021 | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | 434207 | 27c01749 | Brighton | England | Premier League | FW | FW | Centre-Forward | Forward | Outfielder | 20 | 19.0 | 2000-01-28 | 175.0 | right | Galway | Ireland | Ireland | Ireland | NaN | 6300000.0 | 7000000.0 | 2019-07-01 | 2.0 | 2.0 | 2024-06-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 17 | 9 | 791.0 | 8.8 | 2 | 1 | 2 | 0 | 0 | 0 | 0 | 0.34 | 0.34 | 3.5 | 3.5 | 0.2 | 3.7 | 0.42 | 23.0 | 8.0 | 34.8 | 2.62 | 0.91 | 0.09 | 0.25 | 13.7 | 0.0 | 0.15 | -1.5 | -1.5 | 79.0 | 101.0 | 78.2 | 1147.0 | 165.0 | 0.8 | 5.0 | 2.0 | 1.0 | 0.0 | 3.0 | 91.0 | 10.0 | 0.0 | 22.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 64.0 | 26.0 | 11.0 | 11.0 | 74.0 | 5.0 | 0.0 | 3.0 | 0.0 | 2.0 | 4.0 | 12.0 | 1.37 | 7.0 | 0.0 | 3.0 | 2.0 | 0.0 | 1.0 | 0.11 | 0.0 | 7.0 | 5.0 | 2.0 | 4.0 | 1.0 | 20.0 | 4.0 | 40.0 | 32.3 | 0.0 | 8.0 | 7.0 | 1.0 | 0.0 | 201.0 | 1.0 | 38.0 | 80.0 | 8.0 | 0.0 | 124.0 | 4.0 | 29.0 | 15.0 | 357.0 | 143.0 | 40.1 | 47 | 23.1 | 68.0 | NaN | 8 | 23.0 | 11 | 0.88 | 12.0 | 17.0 | -5.0 | -0.57 | -0.53 | 13.8 | 7.8 | 6.0 | 0.69 | 0.0 | 5 | 1.0 | 0.0 | 0.0 | 28.0 | 11.0 | 30.0 | 26.8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
NaN | Aaron Connolly | 2021/2022 | https://fbref.com/en/players/27c01749/Aaron-Co... | https://www.transfermarkt.com/aaron-connolly/p... | 434207 | 27c01749 | Brighton | England | Premier League | FW | FW | Centre-Forward | Forward | Outfielder | 21 | 19.0 | 2000-01-28 | 175.0 | right | Galway | Ireland | Ireland | Ireland | NaN | 6300000.0 | 7000000.0 | 2019-07-01 | 2.0 | 2.0 | 2024-06-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 0 | 45.0 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.4 | 0.4 | 0.0 | 0.4 | 0.85 | 1.0 | 0.0 | 0.0 | 2.00 | 0.00 | 0.00 | NaN | 9.0 | 0.0 | 0.42 | -0.4 | -0.4 | 2.0 | 3.0 | 66.7 | 14.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.00 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 1.0 | 8.3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 0.0 | 2.0 | NaN | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 1.0 | 20.0 | 4.0 | 20.0 | 45 | 16.7 | NaN | 0.0 | 1 | 45.0 | 1 | 3.00 | 0.0 | 0.0 | 0.0 | 0.00 | -0.40 | 1.0 | 0.7 | 0.3 | 0.68 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 2.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1363.0 | Aaron Cresswell | 2017/2018 | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... | 92571 | 4f974391 | West Ham | England | Premier League | DF | DF | Left-Back | Defender | Outfielder | 27 | 24.0 | 1989-12-15 | 170.0 | left | Liverpool | England | England | England | NaN | 10800000.0 | 12000000.0 | 2014-07-03 | 7.0 | 1.0 | 2023-06-30 | NaN | NaN | NaN | NaN | 50000.0 | 2600000.0 | 2671365.0 | NaN | NaN | NaN | NaN | 36 | 35 | 3069.0 | 34.1 | 1 | 3 | 1 | 0 | 0 | 7 | 0 | 0.12 | 0.12 | 0.8 | 0.8 | 2.8 | 3.6 | 0.10 | 21.0 | 6.0 | 28.6 | 0.62 | 0.18 | 0.05 | 0.17 | 28.1 | 8.0 | 0.04 | 0.2 | 0.2 | 1224.0 | 1708.0 | 71.7 | 23519.0 | 10212.0 | 0.2 | 35.0 | 117.0 | 21.0 | 14.0 | 96.0 | 1343.0 | 365.0 | 1.0 | 222.0 | 83.0 | 93.0 | 67.0 | 35.0 | 15.0 | 9.0 | 893.0 | 293.0 | 522.0 | 1329.0 | 78.0 | 59.0 | 210.0 | 5.0 | 15.0 | 39.0 | 52.0 | 62.0 | 1.82 | 35.0 | 21.0 | 1.0 | 3.0 | 0.0 | 9.0 | 0.26 | 3.0 | 38.0 | 18.0 | 15.0 | 18.0 | 5.0 | 53.1 | 15.0 | 115.0 | 32.1 | 0.0 | 38.0 | 90.0 | 133.0 | 0.0 | 2050.0 | 125.0 | 17.0 | 33.3 | 7.0 | 0.0 | 1071.0 | 2.0 | 18.0 | 19.0 | 1171.0 | 1094.0 | 93.4 | 85 | 89.7 | NaN | 30.0 | 1 | NaN | 1 | 1.14 | 45.0 | 60.0 | -15.0 | -0.44 | 0.84 | 38.0 | 51.5 | -13.5 | -0.40 | 0.0 | 20 | 0.0 | 0.0 | 0.0 | 277.0 | 70.0 | 57.0 | 55.1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2395.0 | Aaron Cresswell | 2018/2019 | https://fbref.com/en/players/4f974391/Aaron-Cr... | https://www.transfermarkt.com/aaron-cresswell/... | 92571 | 4f974391 | West Ham | England | Premier League | DF | DF | Left-Back | Defender | Outfielder | 28 | 24.0 | 1989-12-15 | 170.0 | left | Liverpool | England | England | England | NaN | 9000000.0 | 10000000.0 | 2014-07-03 | 7.0 | 1.0 | 2023-06-30 | NaN | NaN | NaN | NaN | 50000.0 | 2600000.0 | 2625727.0 | NaN | NaN | NaN | NaN | 20 | 18 | 1589.0 | 17.7 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0.06 | 0.06 | 0.5 | 0.5 | 0.9 | 1.4 | 0.08 | 11.0 | 0.0 | 0.0 | 0.62 | 0.00 | 0.00 | NaN | 23.5 | 2.0 | 0.04 | -0.5 | -0.5 | 842.0 | 1070.0 | 78.7 | 13627.0 | 5572.0 | 0.1 | 16.0 | 55.0 | 15.0 | 5.0 | 65.0 | 854.0 | 216.0 | 0.0 | 168.0 | 18.0 | 46.0 | 10.0 | 0.0 | 2.0 | 0.0 | 642.0 | 235.0 | 193.0 | 787.0 | 51.0 | 27.0 | 190.0 | 4.0 | 2.0 | 27.0 | 44.0 | 29.0 | 1.64 | 19.0 | 6.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0.11 | 2.0 | 30.0 | 19.0 | 14.0 | 12.0 | 4.0 | 42.9 | 16.0 | 68.0 | 31.5 | 0.0 | 39.0 | 49.0 | 60.0 | 1.0 | 1266.0 | 78.0 | 36.0 | 63.6 | 7.0 | 1.0 | 723.0 | 8.0 | 11.0 | 13.0 | 797.0 | 715.0 | 89.7 | 79 | 46.5 | 85.0 | 16.0 | 2 | 30.0 | 7 | 1.30 | 21.0 | 26.0 | -5.0 | -0.28 | -0.38 | 20.1 | 25.3 | -5.3 | -0.30 | 0.0 | 2 | 0.0 | 0.0 | 0.0 | 169.0 | 22.0 | 14.0 | 61.1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
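The duplicate-column drop used before this output relies on `columns.duplicated()`, which marks every repeat of a column name after its first occurrence. A small sketch on toy data (the values and the repeated `team` column are made up for illustration) shows the mask and the result:

```python
import pandas as pd

# Toy DataFrame with a repeated column name, 'team'
df_example = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['team', 'Gls', 'team'])

# True for every column whose name has already appeared further left
mask = df_example.columns.duplicated()
print(list(mask))

# Keep only the first occurrence of each column name
df_dedup = df_example.loc[:, ~mask]
print(df_dedup.columns.tolist())
```

Only the first `team` column survives. If the repeated columns hold different values, the later ones are discarded silently, so it is worth comparing them before relying on this step.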
# Keep only rows that have an FBref player name
df_merge_fbref_tm_capology_select_notnull = df_merge_fbref_tm_capology_select[df_merge_fbref_tm_capology_select['player_name_fbref'].notna()]
print('No. rows in DataFrame BEFORE dropping NULL values: {}'.format(len(df_merge_fbref_tm_capology_select)))
print('No. rows AFTER dropping NULL values: {}'.format(len(df_merge_fbref_tm_capology_select_notnull)))
print('-'*10+'\n')
print('Difference in rows before and after dropping NULLs: {}\n'.format(len(df_merge_fbref_tm_capology_select_notnull) - len(df_merge_fbref_tm_capology_select)))
No. rows in DataFrame BEFORE dropping NULL values: 12753
No. rows AFTER dropping NULL values: 12735
----------

Difference in rows before and after dropping NULLs: -18
# Compare the row count of the original FBref DataFrame with the final joined DataFrame
print('No. rows in FBref DataFrame BEFORE join to any datasets: {}'.format(len(df_fbref_players)))
print('No. rows in FBref-TM-Capology DataFrame AFTER join: {}'.format(len(df_merge_fbref_tm_capology_select_notnull)))
print('-'*10+'\n')
print('Difference in rows before and after join: {}\n'.format(len(df_merge_fbref_tm_capology_select_notnull) - len(df_fbref_players)))
No. rows in FBref DataFrame BEFORE join to any datasets: 13680
No. rows in FBref-TM-Capology DataFrame AFTER join: 12735
----------

Difference in rows before and after join: -945
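The final DataFrame has 945 fewer rows than the original FBref DataFrame (12,735 versus 13,680, about 6.9% fewer), so some player-season records are lost during the joins. The dataset is still suitable for the next stage; recovering the missing rows is left for a later revision.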
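One way to find out which rows went missing is an anti-join: merge the original table against the keys of the joined table and keep the rows that have no match. The sketch below uses toy data; the helper `rows_lost_in_join` and the key columns `player` and `season` are assumptions for illustration, and the real key columns in `df_fbref_players` would need to be substituted.

```python
import pandas as pd

def rows_lost_in_join(df_before, df_after, keys):
    """Return rows of df_before whose key combination is absent from df_after.

    Hypothetical helper for illustration only.
    """
    after_keys = df_after[keys].drop_duplicates()
    # indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
    merged = df_before.merge(after_keys, on=keys, how='left', indicator=True)
    lost = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
    return lost.reset_index(drop=True)

# Toy stand-ins for the FBref table (before) and the joined table (after)
df_before = pd.DataFrame({
    'player': ['A', 'B', 'C'],
    'season': ['2020/2021', '2020/2021', '2020/2021'],
    'Min': [900, 450, 90],
})
df_after = pd.DataFrame({
    'player': ['A', 'C'],
    'season': ['2020/2021', '2020/2021'],
})

df_lost = rows_lost_in_join(df_before, df_after, keys=['player', 'season'])
print(df_lost)
```

Inspecting the lost rows by league, season or minutes played may show whether the losses are concentrated in a particular group of players.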
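# Spot check: display all seasons for one player (the partial string avoids typing the accented 'é')
df_merge_fbref_tm_capology_select_notnull[df_merge_fbref_tm_capology_select_notnull['player_name_fbref'].str.contains('Gerard Piqu', na=False)]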
player_name_fbref | season | url_fbref | url_tm | tm_id | fbref_id | Squad | Team Country | Comp | Pos | Primary Pos | TmPos | Position Grouped | outfielder_goalkeeper | Age | age_when_joining | dob | height | foot | pob | cob | Nationality Cleaned | citizenship | second_citizenship | market_value_gbp | market_value_eur | joined | years_since_joining | years_until_contract_expiry | contract_expires | contract_option | on_loan_from | on_loan_from_country | loan_contract_expiry | weekly_gross_base_salary_gbp | annual_gross_base_salary_gbp | adj_current_gross_base_salary_gbp | estimated_gross_total_gbp | current_contract_status | current_contract_expiration | current_contract_length | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | G+A | G+A-PK | xG | npxG | xA | npxG+xA | xG+xA | Sh | SoT | SoT% | Sh/90 | SoT/90 | G/Sh | G/SoT | Dist | FK | npxG/Sh | G-xG | np:G-xG | Cmp | Att | Cmp% | TotDist | PrgDist | A-xA | KP | 1/3 | PPA | CrsPA | Prog | Live | Dead | TB | Press | Sw | Crs | CK | In | Out | Str | Ground | Low | High | Left | Right | Head | TI | Other | Off | Int | Blocks | SCA | SCA90 | PassLive | PassDead | Drib | Fld | Def | GCA | GCA90 | PassDead.1 | Tkl | TklW | Def 3rd | Mid 3rd | Att 3rd | Tkl% | Past | Succ | % | ShSv | Pass | Tkl+Int | Clr | Err | Touches | Def Pen | Att Pen | Succ% | #Pl | Megs | Carries | CPA | Mis | Dis | Targ | Rec | Rec% | Mn/MP | Min% | Mn/Start | Compl | Subs | Mn/Sub | unSub | PPM | onG | onGA | +/- | +/-90 | On-Off | onxG | onxGA | xG+/- | xG+/-90 | 2CrdY | Fls | PKwon | PKcon | OG | Recov | Won | Lost | Won% | GA | GA90 | SoTA | Saves | Save% | W | D | L | CS | CS% | PKA | PKsv | PKm | PSxG | PSxG/SoT | PSxG+/- | /90 | Thr | Launch% | AvgLen | Opp | Stp | Stp% | #OPA | #OPA/90 | AvgDist | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9270.0 | Gerard Piqué | 2017/2018 | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | 18944 | adfc9123 | Barcelona | Spain | La Liga | DF | DF | Centre-Back | Defender | Outfielder | 30 | 21.0 | 1987-02-02 | 194.0 | right | Barcelona | Spain | Spain | Spain | NaN | 36000000.0 | 40000000.0 | 2008-07-01 | 13.0 | 2.0 | 2024-06-30 | NaN | NaN | NaN | NaN | 218534.0 | 11363788.0 | 11406357.0 | NaN | NaN | NaN | NaN | 30 | 29 | 2631.0 | 29.2 | 2 | 0 | 2 | 0 | 0 | 8 | 0 | 0.07 | 0.07 | 3.0 | 3.0 | 0.9 | 3.8 | 0.13 | 18.0 | 5.0 | 27.8 | 0.62 | 0.17 | 0.11 | 0.40 | 9.2 | 0.0 | 0.16 | -1.0 | -1.0 | 1606.0 | 1810.0 | 88.7 | 34428.0 | 10291.0 | -0.9 | 6.0 | 123.0 | 4.0 | 0.0 | 90.0 | 1764.0 | 46.0 | 0.0 | 263.0 | 70.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1436.0 | 120.0 | 254.0 | 95.0 | 1552.0 | 106.0 | 10.0 | 8.0 | 1.0 | 19.0 | 11.0 | 17.0 | 0.58 | 13.0 | 0.0 | 1.0 | 2.0 | 0.0 | 1.0 | 0.03 | 0.0 | 30.0 | 22.0 | 22.0 | 8.0 | 0.0 | 34.3 | 23.0 | 78.0 | 32.5 | 2.0 | 26.0 | 62.0 | 103.0 | 5.0 | 2075.0 | 318.0 | 28.0 | 60.0 | 6.0 | 0.0 | 1419.0 | 0.0 | 4.0 | 7.0 | 1385.0 | 1341.0 | 96.8 | 88 | 76.9 | NaN | 27.0 | 1 | NaN | 6 | 2.57 | 81.0 | 23.0 | 58.0 | 1.98 | 0.62 | 65.0 | 30.0 | 35.1 | 1.20 | 0.0 | 23 | 1.0 | 0.0 | 0.0 | 353.0 | 54.0 | 19.0 | 74.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
9412.0 | Gerard Piqué | 2018/2019 | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | 18944 | adfc9123 | Barcelona | Spain | La Liga | DF | DF | Centre-Back | Defender | Outfielder | 31 | 21.0 | 1987-02-02 | 194.0 | right | Barcelona | Spain | Spain | Spain | NaN | 36000000.0 | 40000000.0 | 2008-07-01 | 13.0 | 2.0 | 2024-06-30 | NaN | NaN | NaN | NaN | 220629.0 | 11472752.0 | 11435733.0 | NaN | NaN | NaN | NaN | 35 | 35 | 3150.0 | 35.0 | 4 | 2 | 4 | 0 | 0 | 6 | 0 | 0.17 | 0.17 | 3.7 | 3.7 | 1.5 | 5.2 | 0.15 | 20.0 | 11.0 | 55.0 | 0.57 | 0.31 | 0.20 | 0.36 | 7.3 | 0.0 | 0.19 | 0.3 | 0.3 | 2230.0 | 2429.0 | 91.8 | 46352.0 | 14931.0 | 0.5 | 8.0 | 156.0 | 5.0 | 0.0 | 103.0 | 2347.0 | 82.0 | 3.0 | 296.0 | 58.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1937.0 | 190.0 | 302.0 | 97.0 | 2125.0 | 126.0 | 12.0 | 13.0 | 5.0 | 14.0 | 13.0 | 25.0 | 0.71 | 17.0 | 0.0 | 2.0 | 1.0 | 1.0 | 6.0 | 0.17 | 0.0 | 45.0 | 27.0 | 29.0 | 15.0 | 1.0 | 54.1 | 17.0 | 87.0 | 31.9 | 2.0 | 37.0 | 77.0 | 156.0 | 2.0 | 2760.0 | 385.0 | 32.0 | 76.2 | 16.0 | 0.0 | 1915.0 | 1.0 | 11.0 | 9.0 | 1963.0 | 1889.0 | 96.2 | 90 | 92.1 | 90.0 | 35.0 | 0 | NaN | 1 | 2.43 | 86.0 | 30.0 | 56.0 | 1.60 | 2.27 | 69.9 | 35.6 | 34.2 | 0.98 | 0.0 | 24 | 1.0 | 0.0 | 0.0 | 473.0 | 91.0 | 33.0 | 73.4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
9539.0 | Gerard Piqué | 2019/2020 | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | 18944 | adfc9123 | Barcelona | Spain | La Liga | DF | DF | Centre-Back | Defender | Outfielder | 32 | 21.0 | 1987-02-02 | 194.0 | right | Barcelona | Spain | Spain | Spain | NaN | 22500000.0 | 25000000.0 | 2008-07-01 | 13.0 | 2.0 | 2024-06-30 | NaN | NaN | NaN | NaN | 432946.0 | 22513250.0 | 22513250.0 | NaN | NaN | NaN | NaN | 35 | 35 | 3092.0 | 34.4 | 1 | 0 | 1 | 0 | 0 | 15 | 0 | 0.03 | 0.03 | 2.3 | 2.3 | 0.6 | 2.9 | 0.08 | 15.0 | 6.0 | 40.0 | 0.44 | 0.17 | 0.07 | 0.17 | 8.4 | 1.0 | 0.15 | -1.3 | -1.3 | 2469.0 | 2659.0 | 92.9 | 53752.0 | 14795.0 | -0.6 | 5.0 | 192.0 | 3.0 | 0.0 | 116.0 | 2548.0 | 111.0 | 2.0 | 275.0 | 79.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2121.0 | 227.0 | 311.0 | 113.0 | 2364.0 | 99.0 | 9.0 | 13.0 | 1.0 | 14.0 | 14.0 | 12.0 | 0.35 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.03 | 0.0 | 37.0 | 22.0 | 21.0 | 16.0 | 0.0 | 48.3 | 15.0 | 83.0 | 34.0 | 2.0 | 40.0 | 73.0 | 182.0 | 1.0 | 2996.0 | 427.0 | 28.0 | 100.0 | 9.0 | 0.0 | 2084.0 | 0.0 | 9.0 | 7.0 | 2211.0 | 2171.0 | 98.2 | 88 | 90.4 | 88.0 | 31.0 | 0 | NaN | 0 | 2.09 | 71.0 | 36.0 | 35.0 | 1.02 | -2.55 | 56.2 | 33.8 | 22.4 | 0.65 | 0.0 | 32 | 0.0 | 2.0 | 0.0 | 391.0 | 128.0 | 40.0 | 76.2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
9623.0 | Gerard Piqué | 2020/2021 | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | 18944 | adfc9123 | Barcelona | Spain | La Liga | DF | DF | Centre-Back | Defender | Outfielder | 33 | 21.0 | 1987-02-02 | 194.0 | right | Barcelona | Spain | Spain | Spain | NaN | 13500000.0 | 15000000.0 | 2008-07-01 | 13.0 | 2.0 | 2024-06-30 | NaN | NaN | NaN | NaN | 22166.0 | 1152678.0 | 1152678.0 | NaN | NaN | NaN | NaN | 18 | 18 | 1481.0 | 16.5 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0.00 | 0.00 | 0.6 | 0.6 | 0.5 | 1.1 | 0.07 | 8.0 | 2.0 | 25.0 | 0.49 | 0.12 | 0.00 | 0.00 | 9.3 | 0.0 | 0.08 | -0.6 | -0.6 | 1173.0 | 1247.0 | 94.1 | 24743.0 | 6069.0 | -0.5 | 2.0 | 82.0 | 1.0 | 0.0 | 35.0 | 1204.0 | 43.0 | 0.0 | 91.0 | 32.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1033.0 | 93.0 | 121.0 | 72.0 | 1079.0 | 45.0 | 5.0 | 10.0 | 1.0 | 6.0 | 4.0 | 5.0 | 0.30 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.06 | 0.0 | 21.0 | 13.0 | 13.0 | 7.0 | 1.0 | 64.3 | 5.0 | 46.0 | 40.4 | 0.0 | 13.0 | 36.0 | 65.0 | 0.0 | 1368.0 | 161.0 | 14.0 | 100.0 | 1.0 | 0.0 | 961.0 | 0.0 | 2.0 | 3.0 | 1046.0 | 1015.0 | 97.0 | 82 | 43.3 | 82.0 | 13.0 | 0 | NaN | 1 | 1.53 | 34.0 | 22.0 | 12.0 | 0.73 | -0.90 | 32.9 | 16.1 | 16.8 | 1.02 | 0.0 | 13 | 0.0 | 0.0 | 0.0 | 182.0 | 66.0 | 21.0 | 75.9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
NaN | Gerard Piqué | 2021/2022 | https://fbref.com/en/players/adfc9123/Gerard-P... | https://www.transfermarkt.com/gerard-pique/pro... | 18944 | adfc9123 | Barcelona | Spain | La Liga | DF | DF | Centre-Back | Defender | Outfielder | 34 | 21.0 | 1987-02-02 | 194.0 | right | Barcelona | Spain | Spain | Spain | NaN | 9000000.0 | 10000000.0 | 2008-07-01 | 13.0 | 2.0 | 2024-06-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 2 | 120.0 | 1.3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0.75 | 0.75 | 0.2 | 0.2 | 0.0 | 0.2 | 0.15 | 1.0 | 1.0 | 100.0 | 0.75 | 0.75 | 1.00 | 1.00 | 8.0 | 0.0 | 0.19 | 0.8 | 0.8 | 94.0 | 97.0 | 96.9 | 2030.0 | 415.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 89.0 | 8.0 | 0.0 | 9.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 85.0 | 5.0 | 7.0 | 11.0 | 80.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.75 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | NaN | 0.0 | 2.0 | 25.0 | 0.0 | 2.0 | 4.0 | 10.0 | 0.0 | 112.0 | 24.0 | 1.0 | NaN | 0.0 | 0.0 | 69.0 | 0.0 | 0.0 | 0.0 | 82.0 | 79.0 | 96.3 | 60 | 44.4 | 60.0 | 1.0 | 0 | NaN | 0 | 2.00 | 4.0 | 2.0 | 2.0 | 1.50 | 0.90 | 3.3 | 1.5 | 1.8 | 1.35 | 0.0 | 1 | 0.0 | 0.0 | 0.0 | 9.0 | 5.0 | 1.0 | 83.3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
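The lookup above matches on the partial string `'Gerard Piqu'` to sidestep the accented character, and the merged output earlier in the notebook shows the FBref-side lowercase name as `gerard piqua` while the TransferMarkt side has `gerard pique`. A possible alternative, sketched here with the standard-library `unicodedata` module, is to strip accents before comparing names. The helper `strip_accents` and the sample names are illustrative assumptions; the notebook's actual name-cleaning code is not shown in this section.

```python
import unicodedata
import pandas as pd

def strip_accents(text):
    """Remove combining accent marks, e.g. 'Piqué' -> 'Pique'.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

names = pd.Series(['Gerard Piqué', 'Gerard Deulofeu', 'Aaron Connolly'])

# Accent-free, lowercase version of each name for matching
normalised = names.map(strip_accents).str.lower()

# Literal (non-regex) substring match on the normalised names
matches = names[normalised.str.contains('gerard pique', regex=False)]
print(matches.tolist())
```

Applying the same normalisation to the name columns of every dataset before linking would make names such as Piqué compare equal across sources, which may help the fuzzy join.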
# Export DataFrame as a CSV file (without the index column)
df_merge_fbref_tm_capology_select_notnull.to_csv(os.path.join(data_dir, 'export', 'unified_fbref_tm_capology.csv'), index=False, header=True)
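After an export it can be useful to read the file back and confirm that the shape and the accented player names survived. The sketch below does this on a toy DataFrame written to a temporary directory; the file name `unified_example.csv` and the sample rows are assumptions for illustration.

```python
import os
import tempfile
import pandas as pd

# Toy data with an accented name, standing in for the unified dataset
df_example = pd.DataFrame({
    'player_name_fbref': ['Gerard Piqué', 'Aaron Connolly'],
    'season': ['2021/2022', '2021/2022'],
    'Gls': [1, 0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    export_path = os.path.join(tmp_dir, 'unified_example.csv')
    # Write without the index and with an explicit encoding
    df_example.to_csv(export_path, index=False, encoding='utf-8')
    # Read the file back while the temporary directory still exists
    df_roundtrip = pd.read_csv(export_path, encoding='utf-8')

same_shape = df_roundtrip.shape == df_example.shape
same_names = df_roundtrip['player_name_fbref'].tolist() == df_example['player_name_fbref'].tolist()
print('Shape preserved:', same_shape)
print('Names preserved:', same_names)
```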
This notebook joins the scraped and engineered player datasets: aggregated player performance data from FBref (provided by StatsBomb), TransferMarkt estimated player values and recorded transfers, and player salaries from Capology. The datasets are linked with the record-linkage library to create one unified source of information that can be used for further analysis of player performance statistics and financial valuations.
The final dataset is now ready for further analysis, including modelling and data visualisation.
*Visit my website eddwebster.com or my GitHub Repository for more projects. If you'd like to get in contact, my Twitter handle is @eddwebster and my email is: edd.j.webster@gmail.com.*