I've been playing Rocket League (on console) for more than 5 years. Am I a Pro? NOPE! I'm an Average Joe in terms of rankings, but I love the competitiveness. Most of my friends still play the game so the social aspect is another reason I continue to play.

The game offers stats, but only as an aggregate of my lifetime totals. It doesn't offer any sort of trends. I decided to record at least 500 games to track my performance over time. After each match concluded, I took a photo of the scoreboard with my phone and manually entered the stats into an Excel spreadsheet. #automation

I wanted to answer questions as simple as:

- After tracking Game 1 to Game 505, did I improve in terms of overall stats (Total Points)?
- What is my win/loss percentage trend?
- How has my offense improved (Goals and Assists per game)?
- How has my defense improved (Saves per game)?

In [1]:

```
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#show plots inline
%matplotlib inline
```

In [2]:

```
#read the file
rl_data = pd.read_csv('Rocket_League_Stats.csv')
```

We'll inspect the data using the following methods:

- `df.info()`
- `df.describe()`
- `df.shape`
- `df['strings'].value_counts()`
- `df.head()`
- `df.tail()`

Each method is explained in the code comments below!

In [3]:

```
#summary information about our index, columns, and memory usage
rl_data.info()
```

In [4]:

```
#determine the number of rows and columns in our df
rl_data.shape
```

Out[4]:

In [5]:

```
#statistical information per column (by default this method only includes numeric columns; include='all' adds the non-numeric ones)
rl_data.describe(include='all')
```

Out[5]:

In [6]:

```
#returns the first 5 rows of our data
rl_data.head()
```

Out[6]:

In [7]:

```
#returns the last 5 rows of our data
rl_data.tail()
```

Out[7]:

In [8]:

```
#returns a series with the frequency per value
rl_data['Match Result'].value_counts()
```

Out[8]:

This is my personal dataset and I'm fairly confident it's "clean". However, I'm human and prone to errors, so it's best to confirm.
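A quick way to back up that confidence is to count missing values and look for repeated game labels. The snippet below is only a sketch on a made-up toy frame (the values and the deliberately broken rows are assumptions, not my real data); the same two calls would apply to `rl_data`.

```python
import pandas as pd

# Hypothetical toy data with one missing value and one repeated game label.
toy = pd.DataFrame({
    'Games': ['Game 1', 'Game 2', 'Game 2'],
    'Points': [350, None, 410],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Every row whose 'Games' label appears more than once.
duplicate_games = toy[toy['Games'].duplicated(keep=False)]

print(missing_per_column)
print(duplicate_games)
```

If both checks come back empty on the real data, manual entry errors of this kind are unlikely, although typos inside valid-looking numbers would still slip through.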

Currently, the values in the `Games` column read `Game 1`, `Game 2`, and so on. Luckily there is a clear pattern: every value is the word `Game` followed by digits. I will remove the non-digit prefix, `Game`, and keep the digits. Then I will convert the column to a numeric dtype.

I'll use the `Series.str.replace()` method to remove the `Game` prefix. In addition, I'll use the `Series.str.strip()` method to strip the leftover whitespace from each string.

In [9]:

```
rl_data['Games'] = rl_data['Games'].str.replace('Game','').str.strip()
```
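To see what the cleanup does, here is a small sketch on a toy Series standing in for the `Games` column (the three sample values are assumptions). It also shows one alternative, `Series.str.extract()` with a digit pattern, which pulls the number out directly instead of deleting the prefix.

```python
import pandas as pd

# Toy stand-in for the 'Games' column.
games = pd.Series(['Game 1', 'Game 2', 'Game 505'])

# Same approach as above: drop the literal prefix, then trim the leftover space.
cleaned = games.str.replace('Game', '', regex=False).str.strip()

# Alternative: capture the run of digits with a regular expression.
extracted = games.str.extract(r'(\d+)', expand=False)

print(cleaned.tolist())
print(extracted.tolist())
```

Both versions still return strings, so the conversion to a numeric dtype is a separate step.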

Now we can convert the column to a numeric dtype. I'll use the `Series.astype()` method. We don't need to store decimals, so I'll select the `int` dtype.

Using `df.head()` we'll inspect the first 5 rows, along with `df.info()` to confirm the column was successfully converted to the `int` dtype.

In [10]:

```
rl_data['Games'] = rl_data['Games'].astype(int)
```
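`astype(int)` raises an error as soon as it meets a value it cannot parse. If that happens, one way to find the offending rows is `pd.to_numeric()` with `errors='coerce'`, which turns unparseable entries into `NaN`. This is a sketch on toy values (the bad entry `'three'` is made up for illustration).

```python
import pandas as pd

# Toy values; 'three' is a deliberately bad entry.
games = pd.Series(['1', '2', 'three', '4'])

# Unparseable strings become NaN instead of raising an error.
as_numbers = pd.to_numeric(games, errors='coerce')

# Show the original text of every row that failed to convert.
bad_rows = games[as_numbers.isna()]

print(bad_rows)
```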

In [11]:

```
rl_data.head()
```

Out[11]:

In [12]:

```
rl_data.info()
```

Success!

...for now

The `Match Result` column currently holds two values: `1` for a win and `0` for a loss. I want to create a new column, based on the `Match Result` values, with the string values `W` and `L`. I'll keep the original for reference.

I'll use the `Series.map()` method, but first I'll create a function: if the value is greater than 0, return `W`, else return `L`. I'll label the new column `Result_W/L`.

I'll inspect using `df.head()` to confirm the returned results.

In [13]:

```
def label(element):
    #any value above 0 counts as a win
    if element > 0:
        return 'W'
    else:
        return 'L'

rl_data['Result_W/L'] = rl_data['Match Result'].map(label)
```
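Since `Match Result` only holds `1` and `0`, `Series.map()` also accepts a dictionary in place of a function. The sketch below uses a toy Series (the five values are assumptions). One difference worth knowing: any value missing from the dictionary becomes `NaN`, whereas the function above would label it `L`.

```python
import pandas as pd

# Toy stand-in for the 'Match Result' column.
match_result = pd.Series([1, 0, 1, 1, 0])

# Map each code straight to its label; unmapped values would become NaN.
result_wl = match_result.map({1: 'W', 0: 'L'})

print(result_wl.tolist())
```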

In [14]:

```
rl_data.head()
```

Out[14]:

Thinking ahead about how I want to explore the data using visualizations (more on that ahead), I want to create a new column that assigns each game to a bucket of 100 games. For example, games 1-100 will be labeled `0-100`, and so on.

I'll use the `Series.map()` method again, but this time my function needs multiple if statements.

In [15]:

```
def fn(x):
    if x < 101:
        return '0-100'
    if 100 < x < 201:
        return '101-200'
    if 200 < x < 301:
        return '201-300'
    if 300 < x < 401:
        return '301-400'
    if 400 < x < 506:
        return '401-505'
    #fallback for any game number outside 1-505
    return 0

rl_data['Interval_Games'] = rl_data['Games'].map(fn)
```
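For reference, pandas has a built-in for this kind of bucketing, `pd.cut()`. The sketch below is an alternative to the function above, not what I ran on my data; the bin edges and labels are chosen to reproduce the same buckets, and the sample game numbers are picked to sit on the bucket boundaries.

```python
import pandas as pd

# Sample game numbers on the bucket boundaries.
games = pd.Series([1, 100, 101, 200, 201, 400, 401, 505])

# With the default right=True, each bin includes its right edge: (0, 100], (100, 200], ...
bins = [0, 100, 200, 300, 400, 505]
labels = ['0-100', '101-200', '201-300', '301-400', '401-505']

intervals = pd.cut(games, bins=bins, labels=labels)

print(intervals.astype(str).tolist())
```

`pd.cut()` returns an ordered categorical, so plots and `groupby` output keep the buckets in game order instead of sorting them as plain strings.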

I'll inspect using `df[Series].value_counts()` because I need to confirm there are 100 games in each bucket and 105 in `401-505`.

In [16]:

```
rl_data['Interval_Games'].value_counts()
```

Out[16]:

In [17]:

```
rl_data.head()
```

Out[17]:

In [18]:

```
rl_data.tail()
```

Out[18]:

Great, this is working as intended!
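The same check can be written as an assertion so it fails loudly if a bucket is ever off. This sketch repeats the `fn` function from above and runs it on game numbers 1 through 505, assuming (as in my dataset) that every game number appears exactly once.

```python
import pandas as pd

def fn(x):
    # Same bucketing logic as the function defined earlier.
    if x < 101:
        return '0-100'
    if 100 < x < 201:
        return '101-200'
    if 200 < x < 301:
        return '201-300'
    if 300 < x < 401:
        return '301-400'
    if 400 < x < 506:
        return '401-505'
    return 0

# Game numbers 1..505, one row per game.
games = pd.Series(range(1, 506))
counts = games.map(fn).value_counts()

expected = {'0-100': 100, '101-200': 100, '201-300': 100,
            '301-400': 100, '401-505': 105}
assert counts.to_dict() == expected

print(counts.sort_index())
```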

In [19]:

```
plt.hist(rl_data['Points'])
plt.title('Distribution of Points')
plt.xlabel('Points')
plt.ylabel('Frequency')
plt.show()
```

Based on the frequencies, it looks like most of my games fall between 350 and 500 points.

Earlier I created an interval for every 100 games. I wanted to see how my first 100 games compared to my last 105 (remember, I logged 505 games in total), and everything in between. I'll continue using a histogram, but this time one per interval, each showing total points.

In [20]:

```
g = sns.FacetGrid(rl_data , col = "Interval_Games", height=3.5, aspect=.75)
```

In [21]:

```
g.map(sns.histplot, "Points")
g.set_axis_labels("Points", "Frequency")
```

Out[21]:

Very interesting data! The chart closest to a symmetrical distribution is interval 101-200. However, interval 401-505 stands out for a couple of reasons. First, games under 250 points were more frequent. Second, higher-scoring games were much more frequent as well.
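Reading histograms by eye only goes so far, so a per-interval summary is one way to put numbers next to the charts. The sketch below uses a made-up toy frame (all four rows are assumptions, not my real results) with the same column names as `rl_data`. Because `Match Result` is 1 for a win and 0 for a loss, its mean is the win rate.

```python
import pandas as pd

# Hypothetical toy rows; the real frame has 505 games.
toy = pd.DataFrame({
    'Interval_Games': ['0-100', '0-100', '401-505', '401-505'],
    'Points': [300, 400, 200, 700],
    'Match Result': [0, 1, 1, 1],
})

# Average points and win rate for each bucket of games.
summary = toy.groupby('Interval_Games').agg(
    mean_points=('Points', 'mean'),
    win_rate=('Match Result', 'mean'),
)

print(summary)
```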

In [22]:

```
sns.scatterplot(data=rl_data, x="Points", y="Goals", hue="Match Result", size="Match Result", sizes=(200,125))
```

Out[22]:
