Final Project Report: Pokémon Stats Analysis

Introduction

Pokémon is a global icon for children and adults alike. When I was in primary school, I played a game called Pokémon Emerald, which was my first contact with anything related to Pokémon. Nowadays, some of the main Pokémon characters, such as Pikachu, Zenigame (a.k.a. Squirtle) and Koduck (a.k.a. Psyduck), are very popular as funny memes on the Internet in China.

So when I came across a Pokémon dataset by coincidence, I thought it would be an interesting theme for my final project: Pokémon stats analysis and visualization using the Python skills I learned this semester. That's why I chose this dataset.

While completing the report, I found that some of the chart types I learned in our Python course were not well suited to visualizing this data, so I used another library I learned about online, Seaborn, as a supplement to matplotlib.pyplot and pandas.plotting. It let me plot new kinds of diagrams such as Swarmplot, PairGrid and Violinplot, which make the report more attractive and complete and simplify the visualization process.

Pokemon.csv covers 721 Pokémon and records their ID number, name, first and second type, generation, whether they are Legendary or not, and their basic stats: HP, Attack, Defense, Special Attack, Special Defense, Speed and the sum of all these basic stats.

These are the raw attributes used for calculating how much damage an attack will do in the games. This dataset is about the Pokémon video games (like Pokémon Emerald, mentioned above), NOT Pokémon cards or Pokémon Go. Using this data, we can explore which characteristics contribute to victory in a combat between Pokémon.

I downloaded this dataset from datafountain.cn. I then found that the same Pokémon dataset, with more detailed information, is provided by Alberto Barradas on Kaggle, so I guess that Kaggle is the original source of the data.

Either way, as the direct sources of my dataset and other relevant information, datafountain.cn and Kaggle helped me a lot in obtaining and understanding this dataset.

The data as described by Myles O'Neill is:

Attribute   Definition
#           ID for each Pokémon
Name        Name of each Pokémon
Type 1      Each Pokémon has a type, which determines its weakness/resistance to attacks
Type 2      Some Pokémon are dual type and have a second type
Total       Sum of all stats that come after this; a general guide to how strong a Pokémon is
HP          Hit points, or health; defines how much damage a Pokémon can withstand before fainting
Attack      The base modifier for normal attacks (e.g. scratch, punch)
Defense     The base damage resistance against normal attacks
Sp. Atk     Special attack, the base modifier for special attacks (e.g. fire blast, bubble beam)
Sp. Def     Special defense, the base damage resistance against special attacks
Speed       Determines which Pokémon attacks first each round

1 Preparation

1.1 Notebook Prep

First, import the packages that will be used below and apply some initial settings.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid", palette="pastel")
%matplotlib inline

1.2 Data Import and Preprocessing

Now, import the dataset and see what I'm dealing with.

In [3]:
# read the csv file and save it into a variable
pkm_raw = pd.read_csv('Pokemon.csv')

# see first 5 lines of the dataframe
pkm_raw.head()
Out[3]:
# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False
In [4]:
# see some basic information about the dataframe
pkm_raw.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB

We can see that there are 800 rows in total. Some IDs appear more than once (Venusaur and Mega Venusaur share ID 3 in the output above), so the 800 rows include alternate forms of the same Pokémon. The dtypes of all the columns suit the analysis that follows, but column 'Type 2' has missing values, since not all Pokémon have a second type. From here on I'll mainly focus on the differences between first types, between generations, and between Legendary and non-Legendary Pokémon, so I'll keep column 'Type 2' as it is and won't study it in depth.
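The row count, the number of distinct IDs and the number of missing second types can be checked with a few pandas calls. This is only a minimal sketch: `sample` is a hypothetical five-row dataframe copied from the head of the table above, so the numbers it prints describe the sample, not the full file.

```python
import numpy as np
import pandas as pd

# Hypothetical five-row sample copied from the first rows shown above
sample = pd.DataFrame({
    "#": [1, 2, 3, 3, 4],
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur",
             "VenusaurMega Venusaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", "Poison", "Poison", np.nan],
})

n_rows = len(sample)                        # number of rows (forms)
n_ids = sample["#"].nunique()               # number of distinct ID numbers
n_missing = sample["Type 2"].isna().sum()   # rows without a second type

print(n_rows, n_ids, n_missing)
```

On the full data the same calls would be applied to pkm_raw instead of the sample.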

Then use the .rename and .set_index methods to make the structure of the dataframe more intuitive.

In [5]:
# rename column "#" as "ID" and set "Type 1" and "Name" as index
pkm = pkm_raw.set_index(['Type 1', 'Name']).rename(columns={"#": "ID"})

# see first 5 lines of the dataframe
pkm.head()
Out[5]:
ID Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
Type 1 Name
Grass Bulbasaur 1 Poison 318 45 49 49 65 65 45 1 False
Ivysaur 2 Poison 405 60 62 63 80 80 60 1 False
Venusaur 3 Poison 525 80 82 83 100 100 80 1 False
VenusaurMega Venusaur 3 Poison 625 80 100 123 122 120 80 1 False
Fire Charmander 4 NaN 309 39 52 43 60 50 65 1 False
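One practical benefit of the two-level index is that rows can be selected by type, or by type and name together. A minimal sketch, assuming a hypothetical three-row sample (`sample`) copied from the rows above with only a few of the columns:

```python
import pandas as pd

# Hypothetical three-row sample with a subset of the columns
sample = pd.DataFrame({
    "#": [1, 2, 4],
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Fire"],
    "Total": [318, 405, 309],
})
indexed = sample.set_index(["Type 1", "Name"]).rename(columns={"#": "ID"})

# All rows whose first type is Grass (the result is indexed by Name)
grass = indexed.loc["Grass"]

# A single cell, addressed by (Type 1, Name) and the column name
ivysaur_total = indexed.loc[("Grass", "Ivysaur"), "Total"]

print(grass)
print(ivysaur_total)
```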

2 Data Analysis and Visualization

2.1 Basic Statistics

To begin with, let’s take a look at some basic statistics.

In [6]:
# show some basic statistics of the data
pkm.describe()
Out[6]:
ID Total HP Attack Defense Sp. Atk Sp. Def Speed Generation
count 800.000000 800.00000 800.000000 800.000000 800.000000 800.000000 800.000000 800.000000 800.00000
mean 362.813750 435.10250 69.258750 79.001250 73.842500 72.820000 71.902500 68.277500 3.32375
std 208.343798 119.96304 25.534669 32.457366 31.183501 32.722294 27.828916 29.060474 1.66129
min 1.000000 180.00000 1.000000 5.000000 5.000000 10.000000 20.000000 5.000000 1.00000
25% 184.750000 330.00000 50.000000 55.000000 50.000000 49.750000 50.000000 45.000000 2.00000
50% 364.500000 450.00000 65.000000 75.000000 70.000000 65.000000 70.000000 65.000000 3.00000
75% 539.250000 515.00000 80.000000 100.000000 90.000000 95.000000 90.000000 90.000000 5.00000
max 721.000000 780.00000 255.000000 190.000000 230.000000 194.000000 230.000000 180.000000 6.00000

The generated dataframe shows the count of non-NA/null observations, the mean, the standard deviation, the minimum and maximum, and the quartiles of each numeric column. Later I'll use these statistics to compare different Pokémon.

In [7]:
# boxplot

pkm[['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']].plot.box(figsize=(8,8));

The box plot illustrates the dispersion of the data. The distributions of HP, Attack, Defense, Sp. Def and Speed look almost symmetric, while the distribution of Sp. Atk shows positive skewness.

(Here I didn't use "Total" because it's the sum of other numeric stats so it's much larger, which may make the box figure of other stats less legible.)
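The visual impression of symmetry or skewness can also be backed up with a number. The sketch below uses the pandas `.skew()` method on a hypothetical toy dataframe (`toy`, whose values are made up for illustration and are not taken from Pokemon.csv): a value near 0 suggests a symmetric distribution, and a positive value suggests a longer right tail.

```python
import pandas as pd

# Made-up values for illustration only, not taken from Pokemon.csv
toy = pd.DataFrame({
    "symmetric": [10, 20, 30, 40, 50],
    "right_skewed": [10, 12, 14, 16, 100],
})

# Sample skewness of each column
skewness = toy.skew()
print(skewness)
```

On the real data the same method could be called on the six stat columns selected for the box plot.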

2.2 Top 10 Pokémon with Highest/Lowest Total Value

Total is the sum of HP, Attack, Defense, Sp. Atk, Sp. Def and Speed. Therefore, it is to some extent a measure of the overall ability of a Pokémon.
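That Total really is the sum of the six stats can be checked row by row. A minimal sketch, assuming a hypothetical three-row sample (`sample`) copied from the head of the table in 1.2:

```python
import pandas as pd

stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Three rows copied from the head of the table shown earlier
sample = pd.DataFrame(
    [[318, 45, 49, 49, 65, 65, 45],
     [405, 60, 62, 63, 80, 80, 60],
     [309, 39, 52, 43, 60, 50, 65]],
    columns=["Total"] + stat_cols,
    index=["Bulbasaur", "Ivysaur", "Charmander"],
)

# Compare the row-wise sum of the six stats with the Total column
total_matches = sample[stat_cols].sum(axis=1).eq(sample["Total"])
print(total_matches.all())
```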

In [8]:
# Top 10 Pokémon with highest Total value
total_top10_high = pkm_raw.sort_values(by = ['Total'], ascending = False).head(10)

total_top10_high
Out[8]:
# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
426 384 RayquazaMega Rayquaza Dragon Flying 780 105 180 100 180 100 115 3 True
164 150 MewtwoMega Mewtwo Y Psychic NaN 780 106 150 70 194 120 140 1 True
163 150 MewtwoMega Mewtwo X Psychic Fighting 780 106 190 100 154 100 130 1 True
422 382 KyogrePrimal Kyogre Water NaN 770 100 150 90 180 160 90 3 True
424 383 GroudonPrimal Groudon Ground Fire 770 100 180 160 150 90 90 3 True
552 493 Arceus Normal NaN 720 120 120 120 120 120 120 4 True
712 646 KyuremWhite Kyurem Dragon Ice 700 125 120 90 170 100 95 5 True
711 646 KyuremBlack Kyurem Dragon Ice 700 125 170 100 120 90 95 5 True
409 373 SalamenceMega Salamence Dragon Flying 700 95 145 130 120 90 120 3 False
413 376 MetagrossMega Metagross Steel Psychic 700 80 145 150 105 110 110 3 False
In [9]:
fig = plt.figure(figsize=(12,12))  

# show the ratio of the first type
ax1 = fig.add_subplot(2,2,1)  
total_top10_high['Type 1'].value_counts().plot.pie(colors=['#FFE4C4', '#FF8C00', '#DEB887', '#FFD700', '#F0E68C', '#FFFACD'])

# show the ratio of the second type
ax2 = fig.add_subplot(2,2,2)  
total_top10_high['Type 2'].value_counts().plot.pie(colors=['#FFB6C1', '#FFF5EE', '#FFE4E1', '#FA8072', '#FFF0F5'])

# show the ratio of generation
ax3 = fig.add_subplot(2,2,3)  
total_top10_high['Generation'].value_counts().plot.pie(colors=['#6495ED', '#4169E1', '#87CEFA', '#E0FFFF'])

# show the ratio of legendary Pokémon
ax4 = fig.add_subplot(2,2,4)  
total_top10_high['Legendary'].value_counts().plot.pie(colors=['#AFEEEE', '#48D1CC', '#F0FFF0', '#90EE90']);
In [10]:
# Top 10 Pokémon with lowest Total value
total_top10_low = pkm_raw.sort_values(by = ['Total']).head(10)

total_top10_low
Out[10]:
# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
206 191 Sunkern Grass NaN 180 30 30 30 30 30 30 2 False
322 298 Azurill Normal Fairy 190 50 20 40 20 40 20 3 False
446 401 Kricketot Bug NaN 194 37 25 41 25 41 25 4 False
288 265 Wurmple Bug NaN 195 45 45 35 20 30 20 3 False
16 13 Weedle Bug Poison 195 40 35 30 20 20 50 1 False
13 10 Caterpie Bug NaN 195 45 30 35 20 20 45 1 False
303 280 Ralts Psychic Fairy 198 28 25 25 45 35 40 3 False
732 664 Scatterbug Bug NaN 200 38 35 40 27 25 35 6 False
139 129 Magikarp Water NaN 200 20 10 55 15 20 80 1 False
381 349 Feebas Water NaN 200 20 15 20 10 55 80 3 False
In [81]:
fig = plt.figure(figsize=(12,12))  

# show the ratio of the first type
ax1 = fig.add_subplot(2,2,1)  
total_top10_low['Type 1'].value_counts().plot.pie(colors=['#FFE4C4', '#FF8C00', '#DEB887', '#FFD700', '#F0E68C', '#FFFACD'])

# show the ratio of the second type
ax2 = fig.add_subplot(2,2,2)  
total_top10_low['Type 2'].value_counts().plot.pie(colors=['#FFB6C1', '#FFF5EE', '#FFE4E1', '#FA8072', '#FFF0F5'])

# show the ratio of generation
ax3 = fig.add_subplot(2,2,3)  
total_top10_low['Generation'].value_counts().plot.pie(colors=['#6495ED', '#4169E1', '#87CEFA', '#E0FFFF', '#B0E0E6'])

# show the ratio of legendary Pokémon
ax4 = fig.add_subplot(2,2,4)  
total_top10_low['Legendary'].value_counts().plot.pie(colors=['#AFEEEE', '#48D1CC', '#F0FFF0', '#90EE90']);

As the two sets of four pie charts indicate, among the top 10 Pokémon with the highest Total value, Dragon is the most common first type (4 of 10), followed by Psychic (2 of 10). Half of them belong to Generation 3, and most of them (8 of 10) are Legendary. Among the top 10 Pokémon with the lowest Total value, Bug is the most common first type (5 of 10), followed by Water (2 of 10). The majority of them (7 of 10) belong to Generation 1 or 3, and none of them are Legendary.

My guess is that Legendary Pokémon whose first type is Dragon tend to be the most powerful in combat. Conversely, non-Legendary Pokémon whose first type is Bug are more likely to be weak in combat.

Obviously, the top 10 Pokémon with the highest/lowest Total value can only serve as a reference. The factors most relevant to winning a combat remain to be found in the further analysis below.
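One detail of the Type 2 pie charts above is worth keeping in mind: `value_counts()` leaves out missing values by default, so Pokémon without a second type do not appear in those charts at all. The sketch below shows the difference on the Type 2 values of the ten highest-Total rows, copied from the table above (`type2_top10` is a name introduced here for illustration).

```python
import numpy as np
import pandas as pd

# "Type 2" of the ten highest-Total rows, copied from the table above
type2_top10 = pd.Series(
    ["Flying", np.nan, "Fighting", np.nan, "Fire",
     np.nan, "Ice", "Ice", "Flying", "Psychic"],
    name="Type 2",
)

counted = type2_top10.value_counts()                  # NaN is dropped
counted_all = type2_top10.value_counts(dropna=False)  # NaN kept as its own group

print(counted.sum(), counted_all.sum())
```

Passing `dropna=False` before plotting would add a slice for the Pokémon that have no second type.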

2.3 About Type 1

Let's look at some information about the first type.

In [12]:
# count the number of different types of Pokémon
type1_count = pkm_raw["Type 1"].value_counts()

type1_count
Out[12]:
Water       112
Normal       98
Grass        70
Bug          69
Psychic      57
Fire         52
Electric     44
Rock         44
Ghost        32
Ground       32
Dragon       32
Dark         31
Poison       28
Steel        27
Fighting     27
Ice          24
Fairy        17
Flying        4
Name: Type 1, dtype: int64
In [13]:
# the number of kinds of first type
len(type1_count)
Out[13]:
18
  • Pie Chart
In [14]:
# pie chart
type1_pie = type1_count.plot.pie(colors=['#FFE4C4', '#FF8C00', '#DEB887', '#FAEBD7', '#D2B48C', '#FFDEAD', 
                             '#FFEBCD', '#FFA500', '#F5DEB3', '#DAA520', '#FFD700', '#FFFACD', 
                             '#F0E68C', '#EEE8AA', '#FFFFE0', '#F5F5DC', '#FFA07A', '#FFE4E1'],
                     figsize=(6,6))
type1_pie;

In this part we'll focus on Type 1, so ID, Type 2, Total, Legendary and Generation are unnecessary columns. Drop them to generate a new dataframe named pkm_stats_type1.

In [15]:
pkm_stats_type1 = pkm.drop(columns=['ID', 'Type 2', 'Total', 'Legendary', 'Generation'])
pkm_stats_type1
Out[15]:
HP Attack Defense Sp. Atk Sp. Def Speed
Type 1 Name
Grass Bulbasaur 45 49 49 65 65 45
Ivysaur 60 62 63 80 80 60
Venusaur 80 82 83 100 100 80
VenusaurMega Venusaur 80 100 123 122 120 80
Fire Charmander 39 52 43 60 50 65
... ... ... ... ... ... ... ...
Rock Diancie 50 100 150 100 150 50
DiancieMega Diancie 50 160 110 160 110 110
Psychic HoopaHoopa Confined 80 110 60 150 130 70
HoopaHoopa Unbound 80 160 60 170 130 80
Fire Volcanion 80 110 120 130 90 70

800 rows × 6 columns

  • Violinplot

Use the .melt method to gather HP, Attack, Defense, Sp. Atk, Sp. Def and Speed into a single "variable" column, with their numbers in a "value" column.

In [16]:
# reshape the dataframe into a "long-form" data
pkm_long = pkm_raw.melt(id_vars=["Name", "#", "Type 1", "Type 2", "Generation", "Legendary", "Total"])

pkm_long
Out[16]:
Name # Type 1 Type 2 Generation Legendary Total variable value
0 Bulbasaur 1 Grass Poison 1 False 318 HP 45
1 Ivysaur 2 Grass Poison 1 False 405 HP 60
2 Venusaur 3 Grass Poison 1 False 525 HP 80
3 VenusaurMega Venusaur 3 Grass Poison 1 False 625 HP 80
4 Charmander 4 Fire NaN 1 False 309 HP 39
... ... ... ... ... ... ... ... ... ...
4795 Diancie 719 Rock Fairy 6 True 600 Speed 50
4796 DiancieMega Diancie 719 Rock Fairy 6 True 700 Speed 110
4797 HoopaHoopa Confined 720 Psychic Ghost 6 True 600 Speed 70
4798 HoopaHoopa Unbound 720 Psychic Dark 6 True 680 Speed 80
4799 Volcanion 721 Fire Water 6 True 600 Speed 70

4800 rows × 9 columns
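The row count follows from the reshaping: each of the 800 original rows is repeated once per stat, and 800 × 6 = 4800. The same effect can be seen on a small scale in the sketch below, which assumes a hypothetical two-row sample (`wide`) copied from the head of the table in 1.2.

```python
import pandas as pd

stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Two rows copied from the head of the table shown earlier
wide = pd.DataFrame(
    [["Bulbasaur", "Grass", 45, 49, 49, 65, 65, 45],
     ["Charmander", "Fire", 39, 52, 43, 60, 50, 65]],
    columns=["Name", "Type 1"] + stat_cols,
)

# Every column not listed in id_vars becomes a (variable, value) pair
long = wide.melt(id_vars=["Name", "Type 1"])
print(long.shape)
```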

As mentioned in 2.2, Total is to some extent a measure of the overall ability of a Pokémon. For this reason, I'll use a Violinplot to depict the differences in Total between types and so compare their overall ability.

In [17]:
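# Violinplot

# "Total" is a per-Pokémon value, so plot it from pkm_raw (one row per Pokémon);
# in pkm_long every Total is repeated six times, once per stat
plt.figure(figsize=(12, 8))
type1_violinplot = sns.violinplot(x="Type 1", y="Total", data=pkm_raw)
plt.show()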

This violinplot shows that Dragon-type Pokémon seem to be the most outstanding: their Total values are concentrated at a higher level. Flying-type Pokémon also look better than the others, although only 4 Pokémon in the dataset have Flying as their first type.

  • Swarmplot

Then we can use the "long" dataframe to draw a Swarmplot, depicting how the distributions of the basic stats differ between types.

In [18]:
# swarmplot

plt.figure(figsize=(15, 10))
type1_swarmplot = sns.swarmplot(x="variable", y="value", hue="Type 1",
              data=pkm_long, dodge=True);
type1_swarmplot.legend(loc='best', bbox_to_anchor=(1, 0.7))
plt.show()

To compute the exact mean values of these basic stats, use the .pivot_table method to generate a "wide" dataframe, take the mean of each stat within each type, and finally use .unstack to reshape the result into a table.

In [19]:
pkm_wide1 = pkm_long.pivot_table(
    index="Generation",
    columns=["Type 1", "Name", "variable"],
    values="value"
)
type1_mean = pkm_wide1.stack(level="Name").mean().unstack()

type1_mean
Out[19]:
variable Attack Defense HP Sp. Atk Sp. Def Speed
Type 1
Bug 70.971014 70.724638 56.884058 53.869565 64.797101 61.681159
Dark 88.387097 70.225806 66.806452 74.645161 69.516129 76.161290
Dragon 112.125000 86.375000 83.312500 96.843750 88.843750 83.031250
Electric 69.090909 66.295455 59.795455 90.022727 73.704545 84.500000
Fairy 61.529412 65.705882 74.117647 78.529412 84.705882 48.588235
Fighting 96.777778 65.925926 69.851852 53.111111 64.703704 66.074074
Fire 84.769231 67.769231 69.903846 88.980769 72.211538 74.442308
Flying 78.750000 66.250000 70.750000 94.250000 72.500000 102.500000
Ghost 73.781250 81.187500 64.437500 79.343750 76.468750 64.343750
Grass 73.214286 70.800000 67.271429 77.500000 70.428571 61.928571
Ground 95.750000 84.843750 73.781250 56.468750 62.750000 63.906250
Ice 72.750000 71.416667 72.000000 77.541667 76.291667 63.458333
Normal 73.469388 59.846939 77.275510 55.816327 63.724490 71.551020
Poison 74.678571 68.821429 67.250000 60.428571 64.392857 63.571429
Psychic 71.456140 67.684211 70.631579 98.403509 86.280702 81.491228
Rock 92.863636 100.795455 65.363636 63.340909 75.477273 55.909091
Steel 92.703704 126.370370 65.222222 67.518519 80.629630 55.259259
Water 74.151786 72.946429 72.062500 74.812500 70.517857 65.964286
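A shorter route to the same kind of table is `groupby`, which should give the same per-type means without going through the wide dataframe. This is only a sketch on a hypothetical three-row sample (`sample`) copied from the head of the table in 1.2. Note that the columns come out in the order of `stat_cols` rather than alphabetically.

```python
import pandas as pd

stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Rows copied from the head of the table shown earlier
sample = pd.DataFrame(
    [["Bulbasaur", "Grass", 45, 49, 49, 65, 65, 45],
     ["Ivysaur", "Grass", 60, 62, 63, 80, 80, 60],
     ["Charmander", "Fire", 39, 52, 43, 60, 50, 65]],
    columns=["Name", "Type 1"] + stat_cols,
)

# Mean of every stat within each first type
type1_mean_sample = sample.groupby("Type 1")[stat_cols].mean()
print(type1_mean_sample)
```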
In [20]:
# pre-judge whether the inter-type difference is significant
type1_mean.describe()
Out[20]:
variable Attack Defense HP Sp. Atk Sp. Def Speed
count 18.000000 18.000000 18.000000 18.000000 18.000000 18.000000
mean 80.956622 75.776927 69.262080 74.523722 73.219136 69.131196
std 12.888090 15.956643 6.123653 15.142395 7.964463 12.853200
min 61.529412 59.846939 56.884058 53.111111 62.750000 48.588235
25% 72.866071 66.642644 65.724340 61.156656 65.976858 62.311012
50% 74.415179 70.475222 69.877849 76.156250 72.355769 65.154018
75% 91.624552 79.127232 72.046875 86.571514 76.424479 75.731545
max 112.125000 126.370370 83.312500 98.403509 88.843750 102.500000

We can see that for each numeric stat, the standard deviation of the type means and the difference between the maximum and minimum values are not small. This is a preliminary indication, not a formal significance test, that Pokémon of different first types differ noticeably.
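To make "not small" a little more concrete, the spread can be expressed as a range (max minus min) and as a ratio of the standard deviation to the mean. The sketch below does this with the numbers copied from the describe() output above; the column names `range` and `cv` are introduced here for illustration, and the ratio is only a rough descriptive measure.

```python
import pandas as pd

# Values copied from the describe() output above
summary = pd.DataFrame(
    {"mean": [80.956622, 75.776927, 69.262080, 74.523722, 73.219136, 69.131196],
     "std":  [12.888090, 15.956643, 6.123653, 15.142395, 7.964463, 12.853200],
     "min":  [61.529412, 59.846939, 56.884058, 53.111111, 62.750000, 48.588235],
     "max":  [112.125000, 126.370370, 83.312500, 98.403509, 88.843750, 102.500000]},
    index=["Attack", "Defense", "HP", "Sp. Atk", "Sp. Def", "Speed"],
)

summary["range"] = summary["max"] - summary["min"]  # best type minus worst type
summary["cv"] = summary["std"] / summary["mean"]    # spread relative to the mean

print(summary[["range", "cv"]].round(3))
```

By this measure the type means differ most for Defense and least for HP.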

  • Bar Plot

After getting a dataframe of the exact mean values of all the basic numeric stats, we can use bar plots to illustrate them.

In [21]:
type1_barplot1 = type1_mean.plot.bar(figsize=[20,8])
type1_barplot1;
In [22]:
# use the transpose of the table to draw another bar-plot
type1_barplot2 = type1_mean.T.plot.bar(figsize=[20,8])
type1_barplot2;

These two bar plots provide two dimensions of analysis:

  • For each type, what's their strength and what's their weakness?
  • For each numeric attribute, which type performs best and which one is the worst?

Because there are so many types and stats, the bar plots are not very legible. We still need other tools to show the results more precisely.

To begin with, consider the first question:

For each type, what's their strength and what's their weakness?

To answer it, use the .sort_values method to sort these statistics.

In [23]:
type1_mean.T.columns 

# to get a list of Type 1:'type1' and copy to the next cell
Out[23]:
Index(['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire',
       'Flying', 'Ghost', 'Grass', 'Ground', 'Ice', 'Normal', 'Poison',
       'Psychic', 'Rock', 'Steel', 'Water'],
      dtype='object', name='Type 1')
In [24]:
type1 = ['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire',
       'Flying', 'Ghost', 'Grass', 'Ground', 'Ice', 'Normal', 'Poison',
       'Psychic', 'Rock', 'Steel', 'Water']

# generate the ranks of all the basic stats
for types in type1:
    print('The rank of basic stats of', types, 'is:')
    print(type1_mean.T.unstack()[types].sort_values(ascending=False))
The rank of basic stats of Bug is:
variable
Attack     70.971014
Defense    70.724638
Sp. Def    64.797101
Speed      61.681159
HP         56.884058
Sp. Atk    53.869565
dtype: float64
The rank of basic stats of Dark is:
variable
Attack     88.387097
Speed      76.161290
Sp. Atk    74.645161
Defense    70.225806
Sp. Def    69.516129
HP         66.806452
dtype: float64
The rank of basic stats of Dragon is:
variable
Attack     112.12500
Sp. Atk     96.84375
Sp. Def     88.84375
Defense     86.37500
HP          83.31250
Speed       83.03125
dtype: float64
The rank of basic stats of Electric is:
variable
Sp. Atk    90.022727
Speed      84.500000
Sp. Def    73.704545
Attack     69.090909
Defense    66.295455
HP         59.795455
dtype: float64
The rank of basic stats of Fairy is:
variable
Sp. Def    84.705882
Sp. Atk    78.529412
HP         74.117647
Defense    65.705882
Attack     61.529412
Speed      48.588235
dtype: float64
The rank of basic stats of Fighting is:
variable
Attack     96.777778
HP         69.851852
Speed      66.074074
Defense    65.925926
Sp. Def    64.703704
Sp. Atk    53.111111
dtype: float64
The rank of basic stats of Fire is:
variable
Sp. Atk    88.980769
Attack     84.769231
Speed      74.442308
Sp. Def    72.211538
HP         69.903846
Defense    67.769231
dtype: float64
The rank of basic stats of Flying is:
variable
Speed      102.50
Sp. Atk     94.25
Attack      78.75
Sp. Def     72.50
HP          70.75
Defense     66.25
dtype: float64
The rank of basic stats of Ghost is:
variable
Defense    81.18750
Sp. Atk    79.34375
Sp. Def    76.46875
Attack     73.78125
HP         64.43750
Speed      64.34375
dtype: float64
The rank of basic stats of Grass is:
variable
Sp. Atk    77.500000
Attack     73.214286
Defense    70.800000
Sp. Def    70.428571
HP         67.271429
Speed      61.928571
dtype: float64
The rank of basic stats of Ground is:
variable
Attack     95.75000
Defense    84.84375
HP         73.78125
Speed      63.90625
Sp. Def    62.75000
Sp. Atk    56.46875
dtype: float64
The rank of basic stats of Ice is:
variable
Sp. Atk    77.541667
Sp. Def    76.291667
Attack     72.750000
HP         72.000000
Defense    71.416667
Speed      63.458333
dtype: float64
The rank of basic stats of Normal is:
variable
HP         77.275510
Attack     73.469388
Speed      71.551020
Sp. Def    63.724490
Defense    59.846939
Sp. Atk    55.816327
dtype: float64
The rank of basic stats of Poison is:
variable
Attack     74.678571
Defense    68.821429
HP         67.250000
Sp. Def    64.392857
Speed      63.571429
Sp. Atk    60.428571
dtype: float64
The rank of basic stats of Psychic is:
variable
Sp. Atk    98.403509
Sp. Def    86.280702
Speed      81.491228
Attack     71.456140
HP         70.631579
Defense    67.684211
dtype: float64
The rank of basic stats of Rock is:
variable
Defense    100.795455
Attack      92.863636
Sp. Def     75.477273
HP          65.363636
Sp. Atk     63.340909
Speed       55.909091
dtype: float64
The rank of basic stats of Steel is:
variable
Defense    126.370370
Attack      92.703704
Sp. Def     80.629630
Sp. Atk     67.518519
HP          65.222222
Speed       55.259259
dtype: float64
The rank of basic stats of Water is:
variable
Sp. Atk    74.812500
Attack     74.151786
Defense    72.946429
HP         72.062500
Sp. Def    70.517857
Speed      65.964286
dtype: float64

The result matches bar plot 1 above, but it is too long to read.
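One compact way to summarize such a long printout is to keep only the strongest and weakest stat of each type, using `idxmax` and `idxmin` along the rows. A minimal sketch, assuming a hypothetical three-row excerpt (`means`) copied from the table of per-type means above:

```python
import pandas as pd

cols = ["Attack", "Defense", "HP", "Sp. Atk", "Sp. Def", "Speed"]

# Three rows copied from the table of per-type means above
means = pd.DataFrame(
    [[70.971014, 70.724638, 56.884058, 53.869565, 64.797101, 61.681159],
     [71.456140, 67.684211, 70.631579, 98.403509, 86.280702, 81.491228],
     [92.703704, 126.370370, 65.222222, 67.518519, 80.629630, 55.259259]],
    index=["Bug", "Psychic", "Steel"],
    columns=cols,
)

strength_weakness = pd.DataFrame({
    "strongest": means.idxmax(axis=1),  # stat with the highest mean per type
    "weakest": means.idxmin(axis=1),    # stat with the lowest mean per type
})
print(strength_weakness)
```

Calling `idxmax(axis=0)` instead would address the second question, that is, which type has the highest mean for each stat.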

What else can we do to make it clearer and more legible?

  • Radar Map

Recall the mean values of the whole dataset calculated in 2.1:

Attribute Mean Value
Total 435.10250
HP 69.258750
Attack 79.001250
Defense 73.842500
Sp. Atk 72.820000
Sp. Def 71.902500
Speed 68.277500

Use a radar chart to show the ability of Pokémon with different first types and compare it with the mean values of the whole dataset.

In [27]:
# labels follow the column order of type1_mean
labels = np.array(['Attack', 'Defense', 'HP', 'Sp. Atk', 'Sp. Def', 'Speed'])
dataLength = 6 # the number of stats

# average of the whole data, listed in the same order as labels
data0    = [79.001250, 73.842500, 69.258750, 72.820000, 71.902500, 68.277500]

# one closed polygon per type: repeat the first value at the end
datatype = []
for i in range(18):
    values = type1_mean.iloc[i].to_numpy()
    datatype.append(np.concatenate((values, [values[0]])))

# one angle per stat, closed in the same way
angles = np.linspace(0, 2*np.pi, dataLength, endpoint=False)
angles = np.concatenate((angles, [angles[0]]))
data0  = np.concatenate((data0, [data0[0]]))
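Two details of this preparation are easy to get wrong: the overall means have to follow the same order as the labels, and both the angles and the values need their first entry repeated at the end so that the polygon closes. The sketch below isolates these two steps with NumPy only; `overall_mean`, `closed_angles` and `closed_values` are names introduced here for illustration, and the numbers are the overall means recalled above.

```python
import numpy as np

labels = ["Attack", "Defense", "HP", "Sp. Atk", "Sp. Def", "Speed"]

# Overall means recalled above, keyed by name so the order cannot drift
overall_mean = {"HP": 69.258750, "Attack": 79.001250, "Defense": 73.842500,
                "Sp. Atk": 72.820000, "Sp. Def": 71.902500, "Speed": 68.277500}
values = np.array([overall_mean[name] for name in labels])

# One angle per label, then repeat the first point to close the polygon
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)
closed_angles = np.append(angles, angles[0])
closed_values = np.append(values, values[0])

print(closed_values)
```

Looking the values up by name, rather than typing them in a fixed order, keeps them aligned with the labels even if the label order changes.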
In [78]:
fig = plt.figure(figsize=(30,65))

for i in range(18):
    ax = fig.add_subplot(6, 3, i+1, polar=True)
    ax.plot(angles, data0, 'o-', linewidth=3, label='Average') 
    ax.plot(angles, datatype[i], 'o-', linewidth=3, label=list(type1_mean.index)[i])
    ax.set_thetagrids(angles[:-1] * 180/np.pi, labels)  # angles repeats its first entry, so drop it to match the 6 labels
    ax.set_ylim(0,130)  
    ax.set_title(list(type1_mean.index)[i], fontsize=20)
    ax.legend(fontsize=20, bbox_to_anchor=(1.2, 1));