In this tutorial we mainly address advanced data analytics and plotting.
We are going to use pandas [1] to do data manipulation and data analytics of the DataFrame
.
Finally, we are going to do advanced plotting using matplotlib [2] and seaborn [3].
%matplotlib inline
import pandas as pd
#import json
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
To load the csv file and use column 0 as the row labels of the DataFrame
, we proceed as follows,
#shot_df = pd.read_csv('test.csv')
#shot_df = pd.read_csv(filepath_or_buffer='test.csv')
shot_df = pd.read_csv(filepath_or_buffer='test.csv',index_col=0)
To display the information in shot_df
(we only display the first 10 rows), and force pandas to display all the columns, you can proceed as follows,
pd.set_option('display.max_columns', None)
shot_df.head(10)
GRID_TYPE | GAME_ID | GAME_EVENT_ID | PLAYER_ID | PLAYER_NAME | TEAM_ID | TEAM_NAME | PERIOD | MINUTES_REMAINING | SECONDS_REMAINING | EVENT_TYPE | ACTION_TYPE | SHOT_TYPE | SHOT_ZONE_BASIC | SHOT_ZONE_AREA | SHOT_ZONE_RANGE | SHOT_DISTANCE | LOC_X | LOC_Y | SHOT_ATTEMPTED_FLAG | SHOT_MADE_FLAG | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Shot Chart Detail | 21400018 | 4 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 1 | 11 | 20 | Missed Shot | Jump Shot | 2PT Field Goal | Mid-Range | Right Side Center(RC) | 16-24 ft. | 18 | 114 | 148 | 1 | 0 |
1 | Shot Chart Detail | 21400018 | 33 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 1 | 6 | 30 | Made Shot | Layup Shot | 2PT Field Goal | Restricted Area | Center(C) | Less Than 8 ft. | 0 | -7 | 0 | 1 | 1 |
2 | Shot Chart Detail | 21400018 | 53 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 1 | 4 | 45 | Missed Shot | Fadeaway Jump Shot | 2PT Field Goal | Mid-Range | Left Side(L) | 8-16 ft. | 12 | -105 | 63 | 1 | 0 |
3 | Shot Chart Detail | 21400018 | 77 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 1 | 2 | 31 | Missed Shot | Jump Shot | 3PT Field Goal | Right Corner 3 | Right Side(R) | 24+ ft. | 22 | 227 | -16 | 1 | 0 |
4 | Shot Chart Detail | 21400018 | 82 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 1 | 1 | 51 | Missed Shot | Jump Shot | 3PT Field Goal | Above the Break 3 | Right Side Center(RC) | 24+ ft. | 26 | 91 | 246 | 1 | 0 |
5 | Shot Chart Detail | 21400018 | 136 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 2 | 9 | 18 | Missed Shot | Jump Bank Shot | 2PT Field Goal | In The Paint (Non-RA) | Right Side(R) | 8-16 ft. | 9 | 70 | 58 | 1 | 0 |
6 | Shot Chart Detail | 21400018 | 202 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 2 | 3 | 16 | Missed Shot | Jump Shot | 3PT Field Goal | Above the Break 3 | Right Side Center(RC) | 24+ ft. | 26 | 122 | 235 | 1 | 0 |
7 | Shot Chart Detail | 21400018 | 217 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 2 | 1 | 55 | Missed Shot | Reverse Layup Shot | 2PT Field Goal | Restricted Area | Center(C) | Less Than 8 ft. | 0 | -8 | 0 | 1 | 0 |
8 | Shot Chart Detail | 21400018 | 227 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 2 | 1 | 16 | Missed Shot | Turnaround Jump Shot | 2PT Field Goal | Mid-Range | Right Side(R) | 8-16 ft. | 13 | 135 | 22 | 1 | 0 |
9 | Shot Chart Detail | 21400018 | 299 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 3 | 6 | 54 | Made Shot | Jump Shot | 3PT Field Goal | Above the Break 3 | Center(C) | 24+ ft. | 25 | 26 | 249 | 1 | 1 |
We can create a new DataFrame
using the DataFrame shot_df
with some given columns, as follows,
shot_df1 = pd.DataFrame(shot_df, columns = ['PERIOD','SHOT_TYPE', 'SHOT_ZONE_BASIC', 'SHOT_MADE_FLAG', 'LOC_X', 'LOC_Y'])
To display the information in shot_df1
(we only display the first 5 rows),
shot_df1.head()
PERIOD | SHOT_TYPE | SHOT_ZONE_BASIC | SHOT_MADE_FLAG | LOC_X | LOC_Y | |
---|---|---|---|---|---|---|
0 | 1 | 2PT Field Goal | Mid-Range | 0 | 114 | 148 |
1 | 1 | 2PT Field Goal | Restricted Area | 1 | -7 | 0 |
2 | 1 | 2PT Field Goal | Mid-Range | 0 | -105 | 63 |
3 | 1 | 3PT Field Goal | Right Corner 3 | 0 | 227 | -16 |
4 | 1 | 3PT Field Goal | Above the Break 3 | 0 | 91 | 246 |
Let's create a few DataFrame
using shot_df
.
Notice that we are using strings and logical operators to create the new DataFrame
.
c1 = shot_df[(shot_df.SHOT_TYPE == '2PT Field Goal')]
c1c = shot_df[(shot_df.SHOT_TYPE == '2PT Field Goal') & (shot_df.SHOT_MADE_FLAG == 1)]
c1m = shot_df[(shot_df.SHOT_TYPE == '2PT Field Goal') & (shot_df.SHOT_MADE_FLAG == 0)]
c2 = shot_df[(shot_df.SHOT_TYPE == '3PT Field Goal')]
c2c = shot_df[(shot_df.SHOT_TYPE == '3PT Field Goal') & (shot_df.SHOT_MADE_FLAG == 1)]
c2m = shot_df[(shot_df.SHOT_TYPE == '3PT Field Goal') & (shot_df.SHOT_MADE_FLAG == 0)]
c3 = shot_df[(shot_df.SHOT_MADE_FLAG)]
c3c = shot_df[(shot_df.SHOT_MADE_FLAG == 1)]
c3m = shot_df[(shot_df.SHOT_MADE_FLAG == 0)]
To compute the dimension of the DataFrame c3
or the total field goals attempted,
len(c3.index)
1279
To compute the sum of the missed shots (c3m
) and converted shots (c3c
),
len(c3c.index) + len(c3m.index)
1279
At this point, we can compute the statistics of the DataFrame
we just created,
#shot_df1.LOC_X[(shot_df.SHOT_MADE_FLAG == 1)].sum()
shot_df1.sum()
PERIOD 3131 SHOT_TYPE 2PT Field Goal2PT Field Goal2PT Field Goal3PT ... SHOT_ZONE_BASIC Mid-RangeRestricted AreaMid-RangeRight Corner ... SHOT_MADE_FLAG 624 LOC_X -17903 LOC_Y 107355 dtype: object
or we can compute the sum of a single column,
#shot_df1.LOC_X[(shot_df.SHOT_MADE_FLAG == 1)].sum()
shot_df1.LOC_X.sum()
-17903
To compute the mean of a DataFrame
,
shot_df1.LOC_X[(shot_df1.SHOT_MADE_FLAG == 1)].mean()
-8.434294871794872
To compute the cummulative sum of a DataFrame
,
#shot_df1.LOC_Y[(shot_df.SHOT_MADE_FLAG == 1)].cumsum()
shot_df1.LOC_Y[(shot_df1.SHOT_MADE_FLAG == 1)].cumsum().tail()
1269 38243 1270 38468 1271 38471 1273 38674 1275 38887 Name: LOC_Y, dtype: int64
To count the number of non-NA values,
shot_df1.count()
PERIOD 1279 SHOT_TYPE 1279 SHOT_ZONE_BASIC 1279 SHOT_MADE_FLAG 1279 LOC_X 1279 LOC_Y 1279 dtype: int64
To compute the minimum value of a DataFrame
,
shot_df1.min()
PERIOD 1 SHOT_TYPE 2PT Field Goal SHOT_ZONE_BASIC Above the Break 3 SHOT_MADE_FLAG 0 LOC_X -245 LOC_Y -30 dtype: object
To compute the maximum value of a DataFrame
,
shot_df1.max()
PERIOD 5 SHOT_TYPE 3PT Field Goal SHOT_ZONE_BASIC Right Corner 3 SHOT_MADE_FLAG 1 LOC_X 241 LOC_Y 418 dtype: object
To compute the median of a DataFrame
,
shot_df1.LOC_Y.median()
42.0
To compute the standard deviation of a DataFrame
,
shot_df1.LOC_Y.std()
91.99685249349442
To compute the variance of a DataFrame
,
shot_df1.LOC_Y.var()
8463.420868709769
To compute the skewness of a DataFrame
,
shot_df1.LOC_Y.skew()
0.81651481685383376
To compute the kurtosis of a DataFrame
,
shot_df1.LOC_X[(shot_df1.LOC_Y > 0)].kurt()
-0.38142873041066466
To compute the correlation matrix of a DataFrame
,
shot_df1.corr()
PERIOD | SHOT_MADE_FLAG | LOC_X | LOC_Y | |
---|---|---|---|---|
PERIOD | 1.000000 | 0.008868 | -0.028171 | 0.101323 |
SHOT_MADE_FLAG | 0.008868 | 1.000000 | 0.052099 | -0.229445 |
LOC_X | -0.028171 | 0.052099 | 1.000000 | 0.007676 |
LOC_Y | 0.101323 | -0.229445 | 0.007676 | 1.000000 |
To compute a summary of the statistics of a DataFrame
,
shot_df1.describe()
PERIOD | SHOT_MADE_FLAG | LOC_X | LOC_Y | |
---|---|---|---|---|
count | 1279.000000 | 1279.000000 | 1279.000000 | 1279.000000 |
mean | 2.448006 | 0.487881 | -13.997654 | 83.936669 |
std | 1.137046 | 0.500049 | 104.266921 | 91.996852 |
min | 1.000000 | 0.000000 | -245.000000 | -30.000000 |
25% | 1.000000 | 0.000000 | -94.000000 | 4.000000 |
50% | 2.000000 | 0.000000 | -2.000000 | 42.000000 |
75% | 3.000000 | 1.000000 | 22.000000 | 164.000000 |
max | 5.000000 | 1.000000 | 241.000000 | 418.000000 |
We can also group data in a DataFrame
using groupby
. In this example, we want to group the information in the DataFrame shot_by
by SHOT_ZONE_AREA
#gb=shot_df.groupby('SHOT_ZONE_AREA','SHOT_ATTEMPTED_FLAG')
gb=shot_df.groupby('SHOT_ZONE_AREA')
which has the following type,
type(gb)
pandas.core.groupby.DataFrameGroupBy
To know the size of the DataFrameGroupBy
(and the names of the groups),
gb.size()
SHOT_ZONE_AREA Back Court(BC) 3 Center(C) 670 Left Side Center(LC) 177 Left Side(L) 181 Right Side Center(RC) 149 Right Side(R) 99 dtype: int64
We can use list()
to view what that grouping looks like,
#The line is commented as it prints a lot information,
#list(gb)
Now we can apply an operation to the group we just created. In this case we apply the operation to only one column,
#gb.describe()
#gb.SHOT_ATTEMPTED_FLAG.describe()
#gb['SHOT_ATTEMPTED_FLAG'].describe()
gb['SHOT_ATTEMPTED_FLAG'].describe().unstack()
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
SHOT_ZONE_AREA | ||||||||
Back Court(BC) | 3 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
Center(C) | 670 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
Left Side Center(LC) | 177 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
Left Side(L) | 181 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
Right Side Center(RC) | 149 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
Right Side(R) | 99 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
The unstack()
option does kind of a pretty print.
You can also print the DataFrame
without the unstack()
option.
Also, have in mind that when you apply an operation to a DataFrameGroupBy
, it will return a DataFrame
type(gb['SHOT_ATTEMPTED_FLAG'].describe())
pandas.core.series.Series
We can group data using more than one variable,
gb1 = shot_df.groupby(['SHOT_ZONE_AREA','PERIOD'])
To print the names of the groups in DataFrameGroupBy
and their respectives size,
gb1.size()
SHOT_ZONE_AREA PERIOD Back Court(BC) 2 3 Center(C) 1 203 2 182 3 143 4 136 5 6 Left Side Center(LC) 1 29 2 44 3 54 4 48 5 2 Left Side(L) 1 54 2 37 3 51 4 36 5 3 Right Side Center(RC) 1 33 2 32 3 43 4 40 5 1 Right Side(R) 1 27 2 26 3 20 4 26 dtype: int64
To compute the summary of statistics of the group gb1
,
pd.set_option('display.max_rows', None)
#gb1.describe()
gb1.describe().unstack()
GAME_EVENT_ID | GAME_ID | LOC_X | LOC_Y | MINUTES_REMAINING | PLAYER_ID | SECONDS_REMAINING | SHOT_ATTEMPTED_FLAG | SHOT_DISTANCE | SHOT_MADE_FLAG | TEAM_ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | ||
SHOT_ZONE_AREA | PERIOD | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Back Court(BC) | 2 | 3 | 262.333333 | 13.576941 | 254 | 254.50 | 255.0 | 266.50 | 278 | 3 | 21400622.666667 | 361.950733 | 21400211 | 21400488.50 | 21400766.0 | 21400828.50 | 21400891 | 3 | -36.000000 | 180.385698 | -234 | -113.50 | 7.0 | 63.00 | 119 | 3 | 408.333333 | 8.504901 | 402 | 403.50 | 405.0 | 411.50 | 418 | 3 | 0.000000 | 0.000000 | 0 | 0.00 | 0.0 | 0.00 | 0 | 3 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 3 | 0.000000 | 0.000000 | 0 | 0.00 | 0.0 | 0.00 | 0 | 3 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 3 | 43.000000 | 3.000000 | 40 | 41.50 | 43.0 | 44.50 | 46 | 3 | 0.000000 | 0.000000 | 0 | 0.00 | 0.0 | 0.00 | 0 | 3 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 |
Center(C) | 1 | 203 | 56.103448 | 39.394478 | 2 | 21.50 | 47.0 | 91.00 | 154 | 203 | 21400595.044335 | 339.873321 | 21400018 | 21400292.00 | 21400658.0 | 21400854.00 | 21401203 | 203 | -1.295567 | 24.519972 | -84 | -11.00 | 0.0 | 6.00 | 78 | 203 | 45.463054 | 80.599870 | -13 | 1.00 | 7.0 | 44.50 | 291 | 203 | 5.630542 | 3.594062 | 0 | 2.00 | 6.0 | 9.00 | 11 | 203 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 203 | 26.275862 | 17.907152 | 0 | 9.00 | 27.0 | 40.00 | 59 | 203 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 203 | 4.896552 | 7.894988 | 0 | 0.00 | 1.0 | 5.00 | 29 | 203 | 0.596059 | 0.491899 | 0 | 0.00 | 1.0 | 1.00 | 1 | 203 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 |
2 | 182 | 195.423077 | 45.324881 | 107 | 156.50 | 195.0 | 236.25 | 292 | 182 | 21400600.357143 | 337.699397 | 21400018 | 21400308.00 | 21400643.0 | 21400883.00 | 21401203 | 182 | -0.362637 | 20.689238 | -87 | -7.00 | 0.0 | 9.00 | 67 | 182 | 45.587912 | 80.545999 | -17 | 1.00 | 9.0 | 38.50 | 373 | 182 | 4.961538 | 3.629781 | 0 | 1.25 | 5.0 | 8.00 | 11 | 182 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 182 | 27.082418 | 17.154766 | 0 | 11.00 | 29.0 | 41.00 | 58 | 182 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 182 | 4.747253 | 7.921010 | 0 | 0.00 | 1.0 | 5.00 | 37 | 182 | 0.604396 | 0.490329 | 0 | 0.00 | 1.0 | 1.00 | 1 | 182 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
3 | 143 | 324.566434 | 45.495327 | 227 | 288.00 | 322.0 | 357.50 | 448 | 143 | 21400552.496503 | 351.709585 | 21400018 | 21400238.00 | 21400453.0 | 21400854.00 | 21401203 | 143 | -2.153846 | 24.325639 | -65 | -15.00 | 0.0 | 6.00 | 77 | 143 | 51.076923 | 84.421401 | -19 | 1.00 | 11.0 | 57.00 | 301 | 143 | 5.195804 | 3.458524 | 0 | 2.00 | 5.0 | 8.00 | 11 | 143 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 143 | 30.202797 | 16.671456 | 0 | 17.50 | 30.0 | 44.00 | 59 | 143 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 143 | 5.426573 | 8.281567 | 0 | 0.00 | 1.0 | 6.00 | 30 | 143 | 0.636364 | 0.482737 | 0 | 0.00 | 1.0 | 1.00 | 1 | 143 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
4 | 136 | 450.992647 | 47.864701 | 339 | 420.75 | 451.0 | 486.25 | 581 | 136 | 21400645.411765 | 363.907934 | 21400018 | 21400308.00 | 21400700.0 | 21400995.00 | 21401203 | 136 | -5.419118 | 26.283513 | -76 | -11.50 | 0.0 | 2.00 | 69 | 136 | 52.029412 | 84.447392 | -11 | 1.00 | 11.5 | 44.25 | 306 | 136 | 5.455882 | 3.435906 | 0 | 3.00 | 5.5 | 8.00 | 11 | 136 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 136 | 30.345588 | 16.882767 | 1 | 15.75 | 30.0 | 44.25 | 59 | 136 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 136 | 5.602941 | 8.324243 | 0 | 0.00 | 1.0 | 6.00 | 30 | 136 | 0.588235 | 0.493972 | 0 | 0.00 | 1.0 | 1.00 | 1 | 136 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
5 | 6 | 577.666667 | 46.659047 | 522 | 540.75 | 577.5 | 618.75 | 628 | 6 | 21400452.500000 | 471.589122 | 21400022 | 21400022.00 | 21400452.5 | 21400883.00 | 21400883 | 6 | 10.000000 | 14.791890 | -19 | 10.50 | 14.5 | 19.25 | 20 | 6 | 50.333333 | 95.088730 | 6 | 7.25 | 11.0 | 20.75 | 244 | 6 | 2.000000 | 1.414214 | 0 | 1.25 | 2.0 | 2.75 | 4 | 6 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 6 | 40.833333 | 10.515069 | 20 | 43.00 | 43.0 | 46.00 | 49 | 6 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 6 | 5.500000 | 9.071935 | 1 | 2.00 | 2.0 | 2.00 | 24 | 6 | 0.500000 | 0.547723 | 0 | 0.00 | 0.5 | 1.00 | 1 | 6 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
Left Side Center(LC) | 1 | 29 | 51.620690 | 40.438148 | 4 | 21.00 | 35.0 | 76.00 | 144 | 29 | 21400619.517241 | 330.169288 | 21400125 | 21400336.00 | 21400620.0 | 21400839.00 | 21401178 | 29 | -149.827586 | 40.551615 | -226 | -166.00 | -144.0 | -119.00 | -82 | 29 | 182.896552 | 41.950945 | 105 | 148.00 | 184.0 | 217.00 | 255 | 29 | 6.275862 | 3.514222 | 0 | 4.00 | 7.0 | 9.00 | 11 | 29 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 29 | 28.034483 | 16.774732 | 0 | 16.00 | 29.0 | 43.00 | 52 | 29 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 29 | 23.689655 | 2.867346 | 16 | 24.00 | 25.0 | 25.00 | 28 | 29 | 0.448276 | 0.506120 | 0 | 0.00 | 0.0 | 1.00 | 1 | 29 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 |
2 | 44 | 202.000000 | 44.015853 | 117 | 163.75 | 207.0 | 238.00 | 274 | 44 | 21400591.386364 | 328.073603 | 21400055 | 21400315.25 | 21400623.0 | 21400829.25 | 21401092 | 44 | -152.454545 | 42.163279 | -239 | -188.50 | -147.0 | -126.00 | -64 | 44 | 183.181818 | 39.273799 | 97 | 153.25 | 189.5 | 213.75 | 257 | 44 | 4.318182 | 3.838439 | 0 | 1.00 | 4.0 | 8.25 | 11 | 44 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 44 | 26.681818 | 17.614440 | 0 | 9.50 | 28.0 | 40.50 | 58 | 44 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 44 | 23.795455 | 2.681370 | 17 | 24.00 | 25.0 | 25.00 | 28 | 44 | 0.295455 | 0.461522 | 0 | 0.00 | 0.0 | 1.00 | 1 | 44 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
3 | 54 | 322.277778 | 45.913543 | 248 | 290.75 | 314.0 | 342.00 | 456 | 54 | 21400668.277778 | 340.890256 | 21400066 | 21400374.00 | 21400709.0 | 21400907.00 | 21401203 | 54 | -135.537037 | 31.051213 | -215 | -152.25 | -131.0 | -112.50 | -70 | 54 | 190.833333 | 42.681687 | 101 | 151.50 | 203.0 | 219.00 | 304 | 54 | 5.888889 | 3.467731 | 0 | 4.00 | 7.0 | 9.00 | 11 | 54 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 54 | 28.222222 | 17.641789 | 0 | 12.25 | 30.0 | 42.00 | 59 | 54 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 54 | 23.203704 | 3.530886 | 16 | 20.00 | 24.0 | 25.00 | 32 | 54 | 0.425926 | 0.499126 | 0 | 0.00 | 0.0 | 1.00 | 1 | 54 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
4 | 48 | 456.229167 | 49.689350 | 355 | 419.50 | 452.5 | 490.25 | 551 | 48 | 21400611.729167 | 336.998626 | 21400018 | 21400418.25 | 21400667.0 | 21400887.25 | 21401148 | 48 | -142.583333 | 43.279809 | -245 | -177.00 | -131.0 | -110.50 | -75 | 48 | 195.666667 | 43.360602 | 88 | 157.00 | 210.0 | 230.25 | 265 | 48 | 5.395833 | 3.227143 | 0 | 3.00 | 5.0 | 8.00 | 11 | 48 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 48 | 30.687500 | 16.572415 | 1 | 17.25 | 32.0 | 43.25 | 56 | 48 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 48 | 24.312500 | 2.918439 | 16 | 24.00 | 25.0 | 26.00 | 28 | 48 | 0.354167 | 0.483321 | 0 | 0.00 | 0.0 | 1.00 | 1 | 48 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
5 | 2 | 579.500000 | 45.961941 | 547 | 563.25 | 579.5 | 595.75 | 612 | 2 | 21400923.500000 | 57.275649 | 21400883 | 21400903.25 | 21400923.5 | 21400943.75 | 21400964 | 2 | -117.000000 | 33.941125 | -141 | -129.00 | -117.0 | -105.00 | -93 | 2 | 216.500000 | 26.162951 | 198 | 207.25 | 216.5 | 225.75 | 235 | 2 | 1.500000 | 0.707107 | 1 | 1.25 | 1.5 | 1.75 | 2 | 2 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 2 | 43.000000 | 15.556349 | 32 | 37.50 | 43.0 | 48.50 | 54 | 2 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 24.500000 | 0.707107 | 24 | 24.25 | 24.5 | 24.75 | 25 | 2 | 0.500000 | 0.707107 | 0 | 0.25 | 0.5 | 0.75 | 1 | 2 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
Left Side(L) | 1 | 54 | 55.314815 | 42.385840 | 2 | 19.75 | 41.0 | 93.50 | 144 | 54 | 21400598.314815 | 375.380452 | 21400018 | 21400241.75 | 21400642.0 | 21400897.75 | 21401203 | 54 | -141.296296 | 45.644496 | -235 | -160.00 | -137.0 | -105.75 | -48 | 54 | 38.888889 | 32.251176 | -22 | 15.50 | 36.5 | 57.50 | 127 | 54 | 5.666667 | 3.865669 | 0 | 2.00 | 7.0 | 9.00 | 11 | 54 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 54 | 29.425926 | 17.608533 | 0 | 15.00 | 34.0 | 42.75 | 58 | 54 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 54 | 14.629630 | 4.057587 | 8 | 12.00 | 14.0 | 17.50 | 23 | 54 | 0.296296 | 0.460911 | 0 | 0.00 | 0.0 | 1.00 | 1 | 54 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 |
2 | 37 | 178.567568 | 48.461061 | 104 | 134.00 | 173.0 | 211.00 | 278 | 37 | 21400590.891892 | 376.121344 | 21400022 | 21400199.00 | 21400658.0 | 21400915.00 | 21401178 | 37 | -141.108108 | 47.363713 | -235 | -168.00 | -152.0 | -100.00 | -45 | 37 | 41.162162 | 32.009125 | -6 | 19.00 | 37.0 | 63.00 | 115 | 37 | 6.459459 | 3.420482 | 0 | 4.00 | 7.0 | 9.00 | 11 | 37 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 37 | 27.891892 | 15.327506 | 1 | 16.00 | 27.0 | 39.00 | 57 | 37 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 37 | 14.567568 | 4.675828 | 8 | 10.00 | 15.0 | 19.00 | 23 | 37 | 0.351351 | 0.483978 | 0 | 0.00 | 0.0 | 1.00 | 1 | 37 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
3 | 51 | 326.823529 | 50.073428 | 237 | 285.00 | 324.0 | 362.50 | 448 | 51 | 21400555.392157 | 342.813540 | 21400018 | 21400224.50 | 21400453.0 | 21400815.00 | 21401203 | 51 | -152.901961 | 43.258412 | -235 | -183.00 | -147.0 | -118.00 | -78 | 51 | 41.882353 | 35.894650 | -10 | 7.00 | 41.0 | 71.50 | 110 | 51 | 5.529412 | 3.951470 | 0 | 2.00 | 5.0 | 9.00 | 11 | 51 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 51 | 26.764706 | 17.214631 | 0 | 12.00 | 27.0 | 40.50 | 59 | 51 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 51 | 15.764706 | 3.982905 | 8 | 13.00 | 15.0 | 18.50 | 24 | 51 | 0.352941 | 0.482640 | 0 | 0.00 | 0.0 | 1.00 | 1 | 51 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
4 | 36 | 445.972222 | 40.500784 | 359 | 418.50 | 455.5 | 474.00 | 524 | 36 | 21400649.722222 | 376.531434 | 21400055 | 21400343.25 | 21400642.0 | 21400995.00 | 21401203 | 36 | -146.027778 | 45.930684 | -242 | -171.00 | -147.5 | -114.00 | -70 | 36 | 43.277778 | 35.611885 | -13 | 16.25 | 40.5 | 58.50 | 124 | 36 | 5.527778 | 3.517191 | 0 | 2.00 | 5.0 | 9.00 | 11 | 36 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 36 | 29.583333 | 18.885179 | 0 | 12.25 | 30.0 | 46.25 | 58 | 36 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 36 | 15.138889 | 4.599085 | 8 | 11.75 | 15.0 | 18.00 | 24 | 36 | 0.388889 | 0.494413 | 0 | 0.00 | 0.0 | 1.00 | 1 | 36 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
5 | 3 | 571.000000 | 21.166010 | 547 | 563.00 | 579.0 | 583.00 | 587 | 3 | 21400022.000000 | 0.000000 | 21400022 | 21400022.00 | 21400022.0 | 21400022.00 | 21400022 | 3 | -132.000000 | 38.000000 | -164 | -153.00 | -142.0 | -116.00 | -90 | 3 | 67.333333 | 57.134345 | 4 | 43.50 | 83.0 | 99.00 | 115 | 3 | 1.000000 | 1.732051 | 0 | 0.00 | 0.0 | 1.50 | 3 | 3 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 3 | 27.000000 | 22.649503 | 6 | 15.00 | 24.0 | 37.50 | 51 | 3 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 3 | 15.333333 | 1.154701 | 14 | 15.00 | 16.0 | 16.00 | 16 | 3 | 0.000000 | 0.000000 | 0 | 0.00 | 0.0 | 0.00 | 0 | 3 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
Right Side Center(RC) | 1 | 33 | 58.606061 | 36.952621 | 2 | 34.00 | 55.0 | 82.00 | 128 | 33 | 21400381.636364 | 323.949535 | 21400018 | 21400133.00 | 21400273.0 | 21400570.00 | 21401167 | 33 | 138.696970 | 36.666133 | 78 | 113.00 | 141.0 | 157.00 | 220 | 33 | 192.939394 | 44.687764 | 124 | 154.00 | 200.0 | 219.00 | 313 | 33 | 5.000000 | 3.526684 | 0 | 1.00 | 5.0 | 7.00 | 11 | 33 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 33 | 26.757576 | 16.414609 | 0 | 16.00 | 26.0 | 37.00 | 59 | 33 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 33 | 23.696970 | 3.566203 | 17 | 21.00 | 25.0 | 26.00 | 33 | 33 | 0.242424 | 0.435194 | 0 | 0.00 | 0.0 | 0.00 | 1 | 33 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 |
2 | 32 | 194.187500 | 45.258710 | 112 | 160.75 | 198.5 | 226.50 | 282 | 32 | 21400646.218750 | 313.708179 | 21400018 | 21400443.75 | 21400697.0 | 21400885.00 | 21401078 | 32 | 137.218750 | 39.067959 | 67 | 109.50 | 128.0 | 159.25 | 223 | 32 | 192.406250 | 58.923055 | 108 | 155.50 | 191.0 | 220.25 | 377 | 32 | 5.031250 | 3.864515 | 0 | 1.00 | 6.0 | 8.00 | 11 | 32 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 32 | 19.968750 | 16.395287 | 0 | 5.75 | 16.5 | 31.50 | 58 | 32 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 32 | 23.593750 | 5.375328 | 16 | 20.00 | 24.0 | 25.25 | 41 | 32 | 0.343750 | 0.482559 | 0 | 0.00 | 0.0 | 1.00 | 1 | 32 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
3 | 43 | 336.348837 | 43.759562 | 249 | 310.00 | 338.0 | 364.50 | 416 | 43 | 21400671.279070 | 310.432519 | 21400066 | 21400408.50 | 21400766.0 | 21400883.00 | 21401203 | 43 | 148.651163 | 38.434783 | 77 | 123.50 | 147.0 | 175.00 | 223 | 43 | 187.883721 | 43.155979 | 108 | 159.50 | 192.0 | 215.00 | 277 | 43 | 4.046512 | 3.272930 | 0 | 1.00 | 4.0 | 6.50 | 11 | 43 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 43 | 28.953488 | 20.499656 | 0 | 9.50 | 34.0 | 44.00 | 58 | 43 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 43 | 23.883721 | 3.325295 | 16 | 24.00 | 25.0 | 26.00 | 30 | 43 | 0.441860 | 0.502486 | 0 | 0.00 | 0.0 | 1.00 | 1 | 43 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
4 | 40 | 459.050000 | 46.408968 | 370 | 421.75 | 464.0 | 487.00 | 557 | 40 | 21400638.225000 | 281.721675 | 21400098 | 21400425.75 | 21400639.0 | 21400861.25 | 21401167 | 40 | 132.150000 | 39.003320 | 67 | 103.75 | 128.5 | 148.00 | 218 | 40 | 190.975000 | 53.965082 | 108 | 144.00 | 200.5 | 230.75 | 361 | 40 | 5.225000 | 3.555332 | 0 | 2.75 | 5.5 | 8.00 | 11 | 40 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 40 | 28.300000 | 16.469397 | 2 | 14.00 | 28.0 | 42.00 | 59 | 40 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 40 | 23.200000 | 4.569688 | 16 | 19.75 | 25.0 | 26.00 | 39 | 40 | 0.425000 | 0.500641 | 0 | 0.00 | 0.0 | 1.00 | 1 | 40 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
5 | 1 | 558.000000 | NaN | 558 | 558.00 | 558.0 | 558.00 | 558 | 1 | 21400964.000000 | NaN | 21400964 | 21400964.00 | 21400964.0 | 21400964.00 | 21400964 | 1 | 176.000000 | NaN | 176 | 176.00 | 176.0 | 176.00 | 176 | 1 | 186.000000 | NaN | 186 | 186.00 | 186.0 | 186.00 | 186 | 1 | 0.000000 | NaN | 0 | 0.00 | 0.0 | 0.00 | 0 | 1 | 2544 | NaN | 2544 | 2544 | 2544 | 2544 | 2544 | 1 | 32.000000 | NaN | 32 | 32.00 | 32.0 | 32.00 | 32 | 1 | 1 | NaN | 1 | 1 | 1 | 1 | 1 | 1 | 25.000000 | NaN | 25 | 25.00 | 25.0 | 25.00 | 25 | 1 | 1.000000 | NaN | 1 | 1.00 | 1.0 | 1.00 | 1 | 1 | 1610612739 | NaN | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
Right Side(R) | 1 | 27 | 51.814815 | 32.643809 | 6 | 24.00 | 46.0 | 76.50 | 116 | 27 | 21400498.296296 | 323.769720 | 21400018 | 21400226.00 | 21400433.0 | 21400776.00 | 21401124 | 27 | 166.444444 | 53.197985 | 70 | 127.00 | 162.0 | 227.50 | 241 | 27 | 38.370370 | 36.061004 | -16 | 8.00 | 31.0 | 68.00 | 116 | 27 | 5.703704 | 3.291187 | 0 | 3.50 | 6.0 | 8.00 | 11 | 27 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 27 | 30.629630 | 16.699028 | 1 | 22.50 | 31.0 | 38.00 | 59 | 27 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 27 | 17.222222 | 4.551697 | 9 | 14.00 | 16.0 | 22.00 | 24 | 27 | 0.333333 | 0.480384 | 0 | 0.00 | 0.0 | 1.00 | 1 | 27 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 |
2 | 26 | 196.961538 | 47.785756 | 129 | 153.75 | 197.5 | 226.25 | 301 | 26 | 21400670.923077 | 389.014542 | 21400018 | 21400286.50 | 21400714.5 | 21401023.00 | 21401178 | 26 | 136.000000 | 60.099251 | 62 | 76.25 | 135.0 | 172.50 | 239 | 26 | 50.000000 | 28.223395 | -6 | 28.50 | 55.0 | 66.75 | 101 | 26 | 4.884615 | 3.502307 | 0 | 2.00 | 5.0 | 8.00 | 11 | 26 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 26 | 28.500000 | 19.146279 | 0 | 12.50 | 30.5 | 39.75 | 59 | 26 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 26 | 14.461538 | 5.558638 | 8 | 10.00 | 13.5 | 17.75 | 24 | 26 | 0.230769 | 0.429669 | 0 | 0.00 | 0.0 | 0.00 | 1 | 26 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
3 | 20 | 321.450000 | 40.158207 | 261 | 296.50 | 316.5 | 347.25 | 426 | 20 | 21400700.600000 | 329.374751 | 21400098 | 21400582.75 | 21400733.0 | 21400934.00 | 21401203 | 20 | 148.750000 | 57.570895 | 67 | 100.25 | 150.0 | 190.50 | 239 | 20 | 56.800000 | 35.402349 | -5 | 40.00 | 54.5 | 77.75 | 118 | 20 | 5.350000 | 3.313052 | 0 | 2.75 | 5.0 | 8.25 | 11 | 20 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 20 | 29.500000 | 17.024750 | 3 | 17.75 | 25.5 | 39.50 | 59 | 20 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 20 | 15.850000 | 5.431632 | 8 | 10.50 | 16.0 | 21.25 | 23 | 20 | 0.450000 | 0.510418 | 0 | 0.00 | 0.0 | 1.00 | 1 | 20 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | |
4 | 26 | 446.576923 | 37.038545 | 367 | 419.00 | 452.5 | 467.25 | 527 | 26 | 21400685.884615 | 325.674602 | 21400211 | 21400319.75 | 21400667.0 | 21401023.00 | 21401167 | 26 | 132.461538 | 59.717489 | 45 | 90.25 | 108.0 | 168.25 | 241 | 26 | 38.807692 | 31.304976 | -30 | 20.00 | 40.5 | 56.75 | 112 | 26 | 5.846154 | 3.294751 | 0 | 3.25 | 5.0 | 9.00 | 11 | 26 | 2544 | 0 | 2544 | 2544 | 2544 | 2544 | 2544 | 26 | 30.192308 | 13.608877 | 7 | 19.00 | 28.5 | 41.75 | 51 | 26 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 26 | 13.961538 | 5.325266 | 8 | 10.00 | 12.0 | 16.75 | 24 | 26 | 0.423077 | 0.503831 | 0 | 0.00 | 0.0 | 1.00 | 1 | 26 | 1610612739 | 0 | 1610612739 | 1610612739 | 1610612739 | 1610612739 | 1610612739 |
Let's create another group,
gb2 = shot_df.groupby('SHOT_ZONE_AREA')
and extract a specific sub-group of the DataFrameGroupBy
as follows,
gb2.get_group('Center(C)').head(2)
GRID_TYPE | GAME_ID | GAME_EVENT_ID | PLAYER_ID | PLAYER_NAME | TEAM_ID | TEAM_NAME | PERIOD | MINUTES_REMAINING | SECONDS_REMAINING | EVENT_TYPE | ACTION_TYPE | SHOT_TYPE | SHOT_ZONE_BASIC | SHOT_ZONE_AREA | SHOT_ZONE_RANGE | SHOT_DISTANCE | LOC_X | LOC_Y | SHOT_ATTEMPTED_FLAG | SHOT_MADE_FLAG | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Shot Chart Detail | 21400018 | 33 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 1 | 6 | 30 | Made Shot | Layup Shot | 2PT Field Goal | Restricted Area | Center(C) | Less Than 8 ft. | 0 | -7 | 0 | 1 | 1 |
7 | Shot Chart Detail | 21400018 | 217 | 2544 | LeBron James | 1610612739 | Cleveland Cavaliers | 2 | 1 | 55 | Missed Shot | Reverse Layup Shot | 2PT Field Goal | Restricted Area | Center(C) | Less Than 8 ft. | 0 | -8 | 0 | 1 | 0 |
To compute the number of entries in the group gb1
,
#gb1.PERIOD.count()
gb1.PERIOD.count().unstack()
PERIOD | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
SHOT_ZONE_AREA | |||||
Back Court(BC) | NaN | 3 | NaN | NaN | NaN |
Center(C) | 203 | 182 | 143 | 136 | 6 |
Left Side Center(LC) | 29 | 44 | 54 | 48 | 2 |
Left Side(L) | 54 | 37 | 51 | 36 | 3 |
Right Side Center(RC) | 33 | 32 | 43 | 40 | 1 |
Right Side(R) | 27 | 26 | 20 | 26 | NaN |
To compute the number of entries in the group gb1
you can also proceed in the following way,
#gb1.size()
gb1.size().unstack()
PERIOD | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
SHOT_ZONE_AREA | |||||
Back Court(BC) | NaN | 3 | NaN | NaN | NaN |
Center(C) | 203 | 182 | 143 | 136 | 6 |
Left Side Center(LC) | 29 | 44 | 54 | 48 | 2 |
Left Side(L) | 54 | 37 | 51 | 36 | 3 |
Right Side Center(RC) | 33 | 32 | 43 | 40 | 1 |
Right Side(R) | 27 | 26 | 20 | 26 | NaN |
To compute the summary of statistics of the group gb1
for the column PERIOD
,
#gb1.PERIOD.describe()
gb1.PERIOD.describe().unstack()
count | mean | std | min | 25% | 50% | 75% | max | ||
---|---|---|---|---|---|---|---|---|---|
SHOT_ZONE_AREA | PERIOD | ||||||||
Back Court(BC) | 2 | 3 | 2 | 0 | 2 | 2 | 2 | 2 | 2 |
Center(C) | 1 | 203 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
2 | 182 | 2 | 0 | 2 | 2 | 2 | 2 | 2 | |
3 | 143 | 3 | 0 | 3 | 3 | 3 | 3 | 3 | |
4 | 136 | 4 | 0 | 4 | 4 | 4 | 4 | 4 | |
5 | 6 | 5 | 0 | 5 | 5 | 5 | 5 | 5 | |
Left Side Center(LC) | 1 | 29 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
2 | 44 | 2 | 0 | 2 | 2 | 2 | 2 | 2 | |
3 | 54 | 3 | 0 | 3 | 3 | 3 | 3 | 3 | |
4 | 48 | 4 | 0 | 4 | 4 | 4 | 4 | 4 | |
5 | 2 | 5 | 0 | 5 | 5 | 5 | 5 | 5 | |
Left Side(L) | 1 | 54 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
2 | 37 | 2 | 0 | 2 | 2 | 2 | 2 | 2 | |
3 | 51 | 3 | 0 | 3 | 3 | 3 | 3 | 3 | |
4 | 36 | 4 | 0 | 4 | 4 | 4 | 4 | 4 | |
5 | 3 | 5 | 0 | 5 | 5 | 5 | 5 | 5 | |
Right Side Center(RC) | 1 | 33 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
2 | 32 | 2 | 0 | 2 | 2 | 2 | 2 | 2 | |
3 | 43 | 3 | 0 | 3 | 3 | 3 | 3 | 3 | |
4 | 40 | 4 | 0 | 4 | 4 | 4 | 4 | 4 | |
5 | 1 | 5 | NaN | 5 | 5 | 5 | 5 | 5 | |
Right Side(R) | 1 | 27 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
2 | 26 | 2 | 0 | 2 | 2 | 2 | 2 | 2 | |
3 | 20 | 3 | 0 | 3 | 3 | 3 | 3 | 3 | |
4 | 26 | 4 | 0 | 4 | 4 | 4 | 4 | 4 |
There are many ways to plot the data in a pandas DataFrame
, let's use first matplotlib.
Let's plot all the field attempts,
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("white")
sns.set_color_codes()
plt.figure(figsize=(10,10))
plt.scatter( shot_df.LOC_X, shot_df.LOC_Y,s=20,marker='o',alpha=1)
plt.xlim(-300,300)
plt.ylim(-100,500)
plt.title('Lebron James shot chart 2014-2015 \n All field attempts', y = 1.02, fontsize=20)
plt.grid()
plt.show()
Let's plot the converted and missed shots,
sns.set_style("white")
sns.set_color_codes()
plt.figure(figsize=(10,10))
plt.scatter(shot_df[shot_df.SHOT_MADE_FLAG == 1].LOC_X, shot_df[shot_df.SHOT_MADE_FLAG == 1].LOC_Y, color='green',label='shots converted',s=20,marker='o',alpha=1.0/2.0)
plt.scatter(shot_df[shot_df.SHOT_MADE_FLAG == 0].LOC_X, shot_df[shot_df.SHOT_MADE_FLAG == 0].LOC_Y, color='red',label='shots missed',s=20,marker='o',alpha=0.5)
plt.legend()
plt.xlim(-300,300)
plt.ylim(-100,500)
plt.title('Lebron James shot chart 2014-2015 \n Converted-missed shots', y = 1.02, fontsize=20)
plt.grid()
plt.show()
Let's see in what position Lebron took more shots by using histograms.
First we need to create a DataFrame
with the LOC_X and LOC_Y information,
h1=pd.DataFrame(shot_df, columns=['LOC_X', 'LOC_Y'])
Now we use this DataFrame
to plot the histograms for LOC_X and LOC_Y,
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(color_codes=True)
plt.figure(figsize=(10,10))
plt.subplot(2, 1, 1)
sns.distplot(h1.LOC_X,kde=False);
plt.subplot(2, 1, 2)
sns.distplot(h1.LOC_Y,kde=False);
As we can see from these histogram, Lebron took more shots around the rim.
We are doing exploratory data analysis or EDA. At this point, take your time and try to do different plots, there are many types and combinations.
Let's plot all shots and the histograms in one single plot, for this we are going to use seaborn,
# create our jointplot
sns.set(color_codes=True)
joint_shot_chart = sns.jointplot(shot_df1.LOC_X, shot_df1.LOC_Y, stat_func=None,
kind='scatter',space=0.2, alpha=1,
size=8, edgecolor='w', color='b').set_axis_labels("x location", "y location")
# A joint plot has 3 Axes, the first one called ax_joint
# is the one we want to adjust
joint_shot_chart.fig.set_size_inches(10,10)
ax = joint_shot_chart.ax_joint
# Adjust the axis limits and orientation of the plot in order
ax.set_xlim(-250,250)
#ax.set_ylim(422.5, -47.5)
ax.set_ylim(-47.5, 422.5)
# Get rid of axis labels and tick marks
ax.set_xlabel('')
ax.set_ylabel('')
ax.tick_params(labelbottom='off', labelleft='off')
# Add a title
ax.set_title('Lebron James shot chart 2014-2015',
y=1.25, fontsize=14)
plt.show()
As we can see here, Lebron is very active close to the rim area.
What about his effectivity?, this is let to you as an exercise.
Instead of using a scatter plot, we can use hexbins. In seaborn, you can proceed as follows,
# create our jointplot
sns.set_style("white")
cmap=plt.cm.gist_heat_r
joint_shot_chart = sns.jointplot(shot_df.LOC_X, shot_df.LOC_Y, stat_func=None,
kind='hex',gridsize=40,space=0, color=cmap(0.2), cmap=cmap,vmin=0, vmax=50)
#joint_shot_chart = sns.jointplot(shot_df.LOC_X, shot_df.LOC_Y, stat_func=None,
# kind='hex',space=0, color=cmap(0.2), cmap=cmap)
joint_shot_chart.fig.set_size_inches(12,11)
# A joint plot has 3 Axes, the first one called ax_joint
# is the one we want to adjust
ax = joint_shot_chart.ax_joint
# Adjust the axis limits and orientation of the plot in order
ax.set_xlim(-250,250)
#ax.set_ylim(422.5, -47.5)
ax.set_ylim(-47.5, 422.5)
# Get rid of axis labels and tick marks
ax.set_xlabel('')
ax.set_ylabel('')
ax.tick_params(labelbottom='off', labelleft='off')
# Add a title
ax.set_title('Lebron James shot chart 2014-2015',
y=1.25, fontsize=14)
# Add James Harden's image to the top right
#img = OffsetImage(image, zoom=0.6)
#img.set_offset((625,621))
#ax.add_artist(img)
plt.show()
Let's use boxplot
to represent the field attempts, this kind of plots are very useful to identify outliers or anomalies in your observations.
sns.boxplot(shot_df.LOC_Y, orient="v")
plt.title('Lebron James - Boxplot of all field attempts for LOC_Y', y = 1.1, fontsize=20)
<matplotlib.text.Text at 0x10a36ee10>
In this plot we can see that there are a few outliers or irregular observations.
We can also plot the boxplot with the scatter point superimpose, as follows,
#sns.stripplot(data=shot_df.LOC_Y, jitter=True, color="white", edgecolor="gray")
sns.stripplot(data=shot_df.LOC_Y, jitter=0.3, color="white", edgecolor="gray")
sns.boxplot(shot_df.LOC_Y, orient="v")
plt.title('Lebron James - Boxplot of all field attempts for LOC_Y', y = 1.1, fontsize=20)
<matplotlib.text.Text at 0x10a56f5d0>
Here we see better the outliers, which corresponds to the shots taken from far away, and presumably close to the end of a period. This is left as an exercise to the reader.
We can do the same for LOC_X,
sns.stripplot(data=shot_df.LOC_X, jitter=0.3, color="white", edgecolor="gray")
sns.boxplot(shot_df.LOC_X, orient="v")
plt.title('Lebron James - Boxplot of all field attempts for LOC_X', y = 1.1, fontsize=20)
<matplotlib.text.Text at 0x10a25ced0>
At the point let's do some pretty plotting. We are going to add a background to the scatter plot and we are going to plot only the 3 point shots attempted.
To plot the backgorund image we are going to use scipy [4].
from scipy.misc import imread
plt.figure(figsize=(10,10))
#draw_court(outer_lines=True)
#plt.xlim(-300,300)
#plt.ylim(-100,500)
datafile = 'chart1.png'
#datafile = 'chart2.jpg'
#datafile = 'bg_court.jpg'
img = imread(datafile)
#plt.imshow(img, zorder=0, extent=[-260, 260, -60, 400])
plt.imshow(img, zorder=0, extent=[-260, 260, -400, 60])
plt.scatter(c2.LOC_X,-1*c2.LOC_Y,zorder=1,color='blue',label='3 point shots attempted',s=20,marker='o',alpha=0.6)
plt.title('Lebron James shot chart 2014-2015 \n 3 point shots attempted', y = 1.02, fontsize=20)
#plt.ylim(400, -60)
plt.show()
Much better no, here we see the importance of visualizing the data and using the right visual encoders.
Now we get a better idea of those outliers we identified with the boxplots.
Let's do the same plot of the 2 point shots attempted,
from scipy.misc import imread
plt.figure(figsize=(10,10))
#draw_court(outer_lines=True)
#plt.xlim(-300,300)
#plt.ylim(-100,500)
datafile = 'chart1.png'
#datafile = 'chart2.jpg'
#datafile = 'bg_court.jpg'
img = imread(datafile)
#plt.imshow(img, zorder=0, extent=[-260, 260, -60, 400])
plt.imshow(img, zorder=0, extent=[-260, 260, -400, 60])
plt.scatter(c1.LOC_X,-1*c1.LOC_Y,zorder=1,color='green',label='2 point shots attempted',s=20,marker='o',alpha=0.6)
plt.title('Lebron James shot chart 2014-2015 \n 2 point shots attempted', y = 1.02, fontsize=20)
#plt.ylim(400, -60)
plt.show()
Let's do a pretty plot of the 2 point shots attempted, grouped by missed and converted shots.
Pay attention of how we are manipulating the DataFrame
at plot time.
from scipy.misc import imread
plt.figure(figsize=(10,10))
#draw_court(outer_lines=True)
#plt.xlim(-300,300)
#plt.ylim(-100,500)
datafile = 'chart1.png'
#datafile = 'chart2.jpg'
#datafile = 'bg_court.jpg'
img = imread(datafile)
#plt.imshow(img, zorder=0, extent=[-260, 260, -60, 400])
plt.imshow(img, zorder=0, extent=[-260, 260, -400, 60])
plt.scatter(shot_df1.LOC_X[(shot_df1.SHOT_MADE_FLAG == 1) & (shot_df.SHOT_TYPE == '2PT Field Goal')] ,-1*shot_df1.LOC_Y[(shot_df1.SHOT_MADE_FLAG == 1) & (shot_df.SHOT_TYPE == '2PT Field Goal')],
zorder=1,color='green',label='2 point shots converted',s=20,marker='o',alpha=0.6)
plt.scatter(shot_df1.LOC_X[(shot_df1.SHOT_MADE_FLAG == 0) & (shot_df.SHOT_TYPE == '2PT Field Goal')] ,-1*shot_df1.LOC_Y[(shot_df1.SHOT_MADE_FLAG == 0) & (shot_df.SHOT_TYPE == '2PT Field Goal')],
zorder=1,color='red',label='2 point shots converted',s=10,marker='o',alpha=0.6)
plt.title('Lebron James shot chart 2014-2015 \n 2 point shots attempted', y = 1.02, fontsize=20)
plt.legend(loc='lower left')
#plt.ylim(400, -60)
plt.show()
In the previous plots, we used matplotlib and seaborn.
We can also plot using pandas as follows,
shot_df.plot(kind='scatter',x='LOC_X', y='LOC_Y',c='SHOT_MADE_FLAG',colorbar=False);
#plt.xlim(-300,300)
#plt.ylim(-100,500)
_ = plt.xlim(-300,300)
_ = plt.ylim(-100,500)
#The _ is used to hold the result of the last executed statement in an interactive interpreter session.
#If you do not use, it will print out the value of the last statement
To plot hexbins using pandas we proceed as follows,
shot_df.plot(kind='hexbin', x='LOC_X', y='LOC_Y', gridsize=50, colorbar=False, vmin=0, vmax=10)
plt.xlim(-300,300)
_ = plt.ylim(-100,500)
#The _ is used to hold the result of the last executed statement in an interactive interpreter session.
#If you do not use, it will print out the value of the last statement
We can also plot using the DataFrameGroupBy
.
Let's create a group as follows,
#gb3 = shot_df.groupby('SHOT_ZONE_AREA')
gb3 = shot_df.groupby('SHOT_TYPE')
#gb3x = shot_df.groupby('SHOT_TYPE').LOC_X
#gb3y = shot_df.groupby('SHOT_TYPE').LOC_Y
which has a size of,
gb3.size()
SHOT_TYPE 2PT Field Goal 940 3PT Field Goal 339 dtype: int64
and a type,
type(gb3)
pandas.core.groupby.DataFrameGroupBy
Now we can collect the coordinates LOC_X and LOC_Y of the group we just created,
x=gb3.LOC_X.get_group('2PT Field Goal')
y=gb3.LOC_Y.get_group('2PT Field Goal')
xx=gb3.LOC_X.get_group('3PT Field Goal')
yy=gb3.LOC_Y.get_group('3PT Field Goal')
#x=gb3.LOC_X.head(2)
#y=gb3.LOC_Y.head(2)
Now we can do the scatter plot as follows,
plt.scatter( x, y,s=20,marker='o',alpha=1,color='green')
plt.scatter( xx, yy,s=20,marker='o',alpha=1,color='red')
<matplotlib.collections.PathCollection at 0x10b4a55d0>
We can also directly plot the DataFrameGroupBy
as follows,
gb3.plot(kind='scatter', x='LOC_X', y='LOC_Y',xlim=[-300,300], ylim=[-50,500]);
Notice that it automatically creates two plots, one corresponding to the two point shots and the other one to the three point shots.
With a little bit manipulation, you can put both plots in a single figure and use colors to represent the shot type, this is left to the reader as an exercise.
We can also plot histograms of the DataFrameGroupBy
,
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("white")
sns.set_color_codes()
gb3.PERIOD.plot(kind='hist')
plt.legend()
<matplotlib.legend.Legend at 0x10877f410>
Let's create a new DataFrameGroupBy
as follows,
gb4 = shot_df.groupby('SHOT_ZONE_AREA')
gb4.size()
SHOT_ZONE_AREA Back Court(BC) 3 Center(C) 670 Left Side Center(LC) 177 Left Side(L) 181 Right Side Center(RC) 149 Right Side(R) 99 dtype: int64
And to plot the histograms,
#gb4.LOC_X.hist()
#It does not print out data
_ = gb4.LOC_X.hist()
#gb4.LOC_Y.hist()
#It does not print out data
_ = gb4.LOC_Y.hist()
Let's do the scatter plots for the DataFrameGroupBy
,
gb4.plot(kind='scatter', x='LOC_X', y='LOC_Y',xlim=[-300,300], ylim=[-50,500]);
We can choose to plot only the Center shots of the DataFrameGroupBy
as follows,
from scipy.misc import imread
plt.figure(figsize=(10,10))
datafile = 'chart1.png'
img = imread(datafile)
#gb4.get_group('Center(C)').head(2)
#gb4.get_group('Center(C)').LOC_X
plt.imshow(img, zorder=0, extent=[-260, 260, -400, 60])
plt.scatter(gb4.get_group('Center(C)').LOC_X, -1*gb4.get_group('Center(C)').LOC_Y,s=20,marker='o',alpha=1,color='green')
#plt.xlim(-300,300)
#plt.ylim(-100,500)
<matplotlib.collections.PathCollection at 0x10d92a6d0>
#import sys
#print('Python version:', sys.version_info)
#import IPython
#print('IPython version:', IPython.__version__)
#print('Requests version', requests.__version__)
#print('Pandas version:', pd.__version__)
#print('json version:', json.__version__)
#import matplotlib
#print('matplotlib version:', matplotlib.__version__)
#print('seaborn version:', sns.__version__)
#import scipy
#print('scipy version:', scipy.__version__)