Analyzing Star Wars Data

The team at FiveThirtyEight wanted to answer a question about Star Wars fans: does the rest of America realize that “The Empire Strikes Back” is clearly the best of the bunch? They ran a survey using the online tool SurveyMonkey and received 835 total responses. We will analyze these responses to answer the question above.

In [1]:
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
star_wars = pd.read_csv("star_wars.csv", encoding="ISO-8859-1")

We need to specify an encoding because the data set contains some characters that cannot be decoded with the default UTF-8 encoding.
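As a small illustration of why the encoding argument matters, the sketch below uses a made-up string (not taken from the survey file) whose ISO-8859-1 bytes are not valid UTF-8. It only relies on Python's built-in `str.encode` and `bytes.decode`.

```python
# Made-up example text; "é" is a single byte (0xE9) in ISO-8859-1.
raw = "café".encode("ISO-8859-1")

# Decoding those bytes as UTF-8 fails because 0xE9 starts a multi-byte
# sequence that is never completed.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding the bytes were written in recovers the text.
latin1_text = raw.decode("ISO-8859-1")

print(utf8_ok, latin1_text)
```

The same mismatch is what `pd.read_csv` would run into without the `encoding` argument.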

Data Overview

In [2]:
# Let's look at the first rows of the data
star_wars.head(10)
Out[2]:
RespondentID Have you seen any of the 6 films in the Star Wars franchise? Do you consider yourself to be a fan of the Star Wars film franchise? Which of the following Star Wars films have you seen? Please select all that apply. Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. ... Unnamed: 28 Which character shot first? Are you familiar with the Expanded Universe? Do you consider yourself to be a fan of the Expanded Universe?ξ Do you consider yourself to be a fan of the Star Trek franchise? Gender Age Household Income Education Location (Census Region)
0 NaN Response Response Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi Star Wars: Episode I The Phantom Menace ... Yoda Response Response Response Response Response Response Response Response Response
1 3.292880e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 3 ... Very favorably I don't understand this question Yes No No Male 18-29 NaN High school degree South Atlantic
2 3.292880e+09 No NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN Yes Male 18-29 $0 - $24,999 Bachelor degree West South Central
3 3.292765e+09 Yes No Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith NaN NaN NaN 1 ... Unfamiliar (N/A) I don't understand this question No NaN No Male 18-29 $0 - $24,999 High school degree West North Central
4 3.292763e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Very favorably I don't understand this question No NaN Yes Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
5 3.292731e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Somewhat favorably Greedo Yes No No Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
6 3.292719e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 1 ... Very favorably Han Yes No Yes Male 18-29 $25,000 - $49,999 Bachelor degree Middle Atlantic
7 3.292685e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 6 ... Very favorably Han Yes No No Male 18-29 NaN High school degree East North Central
8 3.292664e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 4 ... Very favorably Han No NaN Yes Male 18-29 NaN High school degree South Atlantic
9 3.292654e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Somewhat favorably Han No NaN No Male 18-29 $0 - $24,999 Some college or Associate degree South Atlantic

10 rows × 38 columns

In [3]:
# Let's look at the data types
star_wars.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1187 entries, 0 to 1186
Data columns (total 38 columns):
RespondentID                                                                                                                                     1186 non-null float64
Have you seen any of the 6 films in the Star Wars franchise?                                                                                     1187 non-null object
Do you consider yourself to be a fan of the Star Wars film franchise?                                                                            837 non-null object
Which of the following Star Wars films have you seen? Please select all that apply.                                                              674 non-null object
Unnamed: 4                                                                                                                                       572 non-null object
Unnamed: 5                                                                                                                                       551 non-null object
Unnamed: 6                                                                                                                                       608 non-null object
Unnamed: 7                                                                                                                                       759 non-null object
Unnamed: 8                                                                                                                                       739 non-null object
Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.    836 non-null object
Unnamed: 10                                                                                                                                      837 non-null object
Unnamed: 11                                                                                                                                      836 non-null object
Unnamed: 12                                                                                                                                      837 non-null object
Unnamed: 13                                                                                                                                      837 non-null object
Unnamed: 14                                                                                                                                      837 non-null object
Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.                                   830 non-null object
Unnamed: 16                                                                                                                                      832 non-null object
Unnamed: 17                                                                                                                                      832 non-null object
Unnamed: 18                                                                                                                                      824 non-null object
Unnamed: 19                                                                                                                                      826 non-null object
Unnamed: 20                                                                                                                                      815 non-null object
Unnamed: 21                                                                                                                                      827 non-null object
Unnamed: 22                                                                                                                                      821 non-null object
Unnamed: 23                                                                                                                                      813 non-null object
Unnamed: 24                                                                                                                                      828 non-null object
Unnamed: 25                                                                                                                                      831 non-null object
Unnamed: 26                                                                                                                                      822 non-null object
Unnamed: 27                                                                                                                                      815 non-null object
Unnamed: 28                                                                                                                                      827 non-null object
Which character shot first?                                                                                                                      829 non-null object
Are you familiar with the Expanded Universe?                                                                                                     829 non-null object
Do you consider yourself to be a fan of the Expanded Universe?ξ                                                                               214 non-null object
Do you consider yourself to be a fan of the Star Trek franchise?                                                                                 1069 non-null object
Gender                                                                                                                                           1047 non-null object
Age                                                                                                                                              1047 non-null object
Household Income                                                                                                                                 859 non-null object
Education                                                                                                                                        1037 non-null object
Location (Census Region)                                                                                                                         1044 non-null object
dtypes: float64(1), object(37)
memory usage: 352.5+ KB

Conclusion

The data has several columns, including:

  • RespondentID - An anonymized ID for the respondent (person taking the survey)
  • Gender - The respondent's gender
  • Age - The respondent's age
  • Household Income - The respondent's income
  • Education - The respondent's education level
  • Location (Census Region) - The respondent's location
  • Have you seen any of the 6 films in the Star Wars franchise? - Has a Yes or No response
  • Do you consider yourself to be a fan of the Star Wars film franchise? - Has a Yes or No response

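RespondentID is meant to be a unique ID for each respondent, but info() shows 1186 non-null values across 1187 rows. The row without an ID is row 0, which holds sub-header text (such as "Response") rather than an actual answer. We will therefore remove any row with a missing RespondentID.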

In [4]:
# Let's look at the column names
star_wars.columns
Out[4]:
Index(['RespondentID',
       'Have you seen any of the 6 films in the Star Wars franchise?',
       'Do you consider yourself to be a fan of the Star Wars film franchise?',
       'Which of the following Star Wars films have you seen? Please select all that apply.',
       'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8',
       'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.',
       'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13',
       'Unnamed: 14',
       'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.',
       'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',
       'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23',
       'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
       'Unnamed: 28', 'Which character shot first?',
       'Are you familiar with the Expanded Universe?',
       'Do you consider yourself to be a fan of the Expanded Universe?ξ',
       'Do you consider yourself to be a fan of the Star Trek franchise?',
       'Gender', 'Age', 'Household Income', 'Education',
       'Location (Census Region)'],
      dtype='object')
In [5]:
# Selecting rows with a genuine respondent ID
star_wars = star_wars[star_wars['RespondentID'].notna()]
In [6]:
# Verifying Data
star_wars.head(10)
Out[6]:
RespondentID Have you seen any of the 6 films in the Star Wars franchise? Do you consider yourself to be a fan of the Star Wars film franchise? Which of the following Star Wars films have you seen? Please select all that apply. Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. ... Unnamed: 28 Which character shot first? Are you familiar with the Expanded Universe? Do you consider yourself to be a fan of the Expanded Universe?ξ Do you consider yourself to be a fan of the Star Trek franchise? Gender Age Household Income Education Location (Census Region)
1 3.292880e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 3 ... Very favorably I don't understand this question Yes No No Male 18-29 NaN High school degree South Atlantic
2 3.292880e+09 No NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN Yes Male 18-29 $0 - $24,999 Bachelor degree West South Central
3 3.292765e+09 Yes No Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith NaN NaN NaN 1 ... Unfamiliar (N/A) I don't understand this question No NaN No Male 18-29 $0 - $24,999 High school degree West North Central
4 3.292763e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Very favorably I don't understand this question No NaN Yes Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
5 3.292731e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Somewhat favorably Greedo Yes No No Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
6 3.292719e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 1 ... Very favorably Han Yes No Yes Male 18-29 $25,000 - $49,999 Bachelor degree Middle Atlantic
7 3.292685e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 6 ... Very favorably Han Yes No No Male 18-29 NaN High school degree East North Central
8 3.292664e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 4 ... Very favorably Han No NaN Yes Male 18-29 NaN High school degree South Atlantic
9 3.292654e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Somewhat favorably Han No NaN No Male 18-29 $0 - $24,999 Some college or Associate degree South Atlantic
10 3.292640e+09 Yes No NaN Star Wars: Episode II Attack of the Clones NaN NaN NaN NaN 1 ... Very favorably I don't understand this question No NaN No Male 18-29 $25,000 - $49,999 Some college or Associate degree Pacific

10 rows × 38 columns

Conclusion

The star_wars dataframe now contains only rows where RespondentID is not NaN.
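A quick sanity check for this kind of filter can be sketched on a toy frame. The ID values and the `toy` and `clean` names below are made up for illustration; only `notna`, `isna` and `is_unique` are real pandas calls.

```python
import numpy as np
import pandas as pd

# Toy frame: the first row mimics the sub-header row that has no ID.
# The ID values are invented for illustration.
toy = pd.DataFrame({
    "RespondentID": [np.nan, 1001.0, 1002.0],
    "Gender": ["Response", "Male", "Male"],
})

# Keep only rows with a non-missing ID.
clean = toy[toy["RespondentID"].notna()]

# After filtering there should be no missing IDs and no duplicates.
remaining_missing = int(clean["RespondentID"].isna().sum())
ids_unique = bool(clean["RespondentID"].is_unique)

print(len(toy), len(clean), remaining_missing, ids_unique)
```

Running the same two checks on the real dataframe would confirm that the filter behaved as intended.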

Cleaning and Mapping Yes/No Columns

The columns Have you seen any of the 6 films in the Star Wars franchise? and Do you consider yourself to be a fan of the Star Wars film franchise? contain Yes/No values. They can also be NaN when a respondent chose not to answer the question.

In [7]:
# Analyzing user response for the view count
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'][:20]
Out[7]:
1     Yes
2      No
3     Yes
4     Yes
5     Yes
6     Yes
7     Yes
8     Yes
9     Yes
10    Yes
11    Yes
12     No
13    Yes
14    Yes
15    Yes
16    Yes
17    Yes
18    Yes
19    Yes
20    Yes
Name: Have you seen any of the 6 films in the Star Wars franchise?, dtype: object
In [8]:
# Analyzing user response for fan count
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'][:20]
Out[8]:
1     Yes
2     NaN
3      No
4     Yes
5     Yes
6     Yes
7     Yes
8     Yes
9     Yes
10     No
11    NaN
12    NaN
13     No
14    Yes
15    Yes
16    Yes
17    Yes
18    Yes
19    Yes
20    Yes
Name: Do you consider yourself to be a fan of the Star Wars film franchise?, dtype: object

Conclusion

We can see that both columns contain the values Yes, No, or NaN.

Converting these values to Booleans makes them easier to work with, because we can select the rows that are True or False without doing a string comparison.
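The benefit can be sketched on a toy frame. The column names `seen_any` and `is_fan` are hypothetical stand-ins for the two long survey questions, and the data is made up.

```python
import pandas as pd

# Toy stand-ins for the two Yes/No survey columns (names are hypothetical).
toy = pd.DataFrame({
    "seen_any": ["Yes", "No", "Yes", "Yes"],
    "is_fan": ["Yes", None, "No", "Yes"],
})
yes_no = {"Yes": True, "No": False}

toy["seen_any"] = toy["seen_any"].map(yes_no)
toy["is_fan"] = toy["is_fan"].map(yes_no)

# A column with no missing answers can be used directly as a mask.
viewers = toy[toy["seen_any"]]

# A column that still holds NaN needs an explicit comparison, so that
# missing answers are treated as "not True".
fans = toy[toy["is_fan"] == True]

print(len(viewers), len(fans))
```

Note that the missing answer stays missing after the mapping, which is why the second selection compares against True rather than using the column as a mask.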

In [9]:
# Converting Yes/No values to Boolean values
yes_no = {
    "Yes": True,
    "No": False
}

# Work on a copy so the assignments below do not write to a slice of the original frame
star_wars = star_wars.copy()
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'] = star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].map(yes_no)

star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'] = star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].map(yes_no)
In [10]:
# Verifying the conversion
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'][:20]
Out[10]:
1      True
2     False
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12    False
13     True
14     True
15     True
16     True
17     True
18     True
19     True
20     True
Name: Have you seen any of the 6 films in the Star Wars franchise?, dtype: bool
In [11]:
# Verifying the conversion
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'][:20]
Out[11]:
1      True
2       NaN
3     False
4      True
5      True
6      True
7      True
8      True
9      True
10    False
11      NaN
12      NaN
13    False
14     True
15     True
16     True
17     True
18     True
19     True
20     True
Name: Do you consider yourself to be a fan of the Star Wars film franchise?, dtype: object

Modifying Movie Column Names

The next six columns represent a single checkbox question. The respondent checked off a series of boxes in response to the question, Which of the following Star Wars films have you seen? Please select all that apply.

The columns for this question are:

  • Which of the following Star Wars films have you seen? Please select all that apply. - Whether or not the respondent saw Star Wars: Episode I The Phantom Menace.
  • Unnamed: 4 - Whether or not the respondent saw Star Wars: Episode II Attack of the Clones.
  • Unnamed: 5 - Whether or not the respondent saw Star Wars: Episode III Revenge of the Sith.
  • Unnamed: 6 - Whether or not the respondent saw Star Wars: Episode IV A New Hope.
  • Unnamed: 7 - Whether or not the respondent saw Star Wars: Episode V The Empire Strikes Back.
  • Unnamed: 8 - Whether or not the respondent saw Star Wars: Episode VI Return of the Jedi.
In [12]:
# Analyzing movie column values
star_wars['Which of the following Star Wars films have you seen? Please select all that apply.']
Out[12]:
1       Star Wars: Episode I  The Phantom Menace
2                                            NaN
3       Star Wars: Episode I  The Phantom Menace
4       Star Wars: Episode I  The Phantom Menace
5       Star Wars: Episode I  The Phantom Menace
6       Star Wars: Episode I  The Phantom Menace
7       Star Wars: Episode I  The Phantom Menace
8       Star Wars: Episode I  The Phantom Menace
9       Star Wars: Episode I  The Phantom Menace
10                                           NaN
11                                           NaN
12                                           NaN
13      Star Wars: Episode I  The Phantom Menace
14      Star Wars: Episode I  The Phantom Menace
15      Star Wars: Episode I  The Phantom Menace
16      Star Wars: Episode I  The Phantom Menace
17                                           NaN
18      Star Wars: Episode I  The Phantom Menace
19      Star Wars: Episode I  The Phantom Menace
20      Star Wars: Episode I  The Phantom Menace
21      Star Wars: Episode I  The Phantom Menace
22      Star Wars: Episode I  The Phantom Menace
23      Star Wars: Episode I  The Phantom Menace
24      Star Wars: Episode I  The Phantom Menace
25      Star Wars: Episode I  The Phantom Menace
26                                           NaN
27      Star Wars: Episode I  The Phantom Menace
28      Star Wars: Episode I  The Phantom Menace
29      Star Wars: Episode I  The Phantom Menace
30      Star Wars: Episode I  The Phantom Menace
                          ...                   
1157    Star Wars: Episode I  The Phantom Menace
1158                                         NaN
1159                                         NaN
1160                                         NaN
1161    Star Wars: Episode I  The Phantom Menace
1162    Star Wars: Episode I  The Phantom Menace
1163    Star Wars: Episode I  The Phantom Menace
1164                                         NaN
1165    Star Wars: Episode I  The Phantom Menace
1166    Star Wars: Episode I  The Phantom Menace
1167    Star Wars: Episode I  The Phantom Menace
1168    Star Wars: Episode I  The Phantom Menace
1169                                         NaN
1170    Star Wars: Episode I  The Phantom Menace
1171                                         NaN
1172    Star Wars: Episode I  The Phantom Menace
1173                                         NaN
1174    Star Wars: Episode I  The Phantom Menace
1175    Star Wars: Episode I  The Phantom Menace
1176    Star Wars: Episode I  The Phantom Menace
1177    Star Wars: Episode I  The Phantom Menace
1178    Star Wars: Episode I  The Phantom Menace
1179                                         NaN
1180                                         NaN
1181    Star Wars: Episode I  The Phantom Menace
1182    Star Wars: Episode I  The Phantom Menace
1183    Star Wars: Episode I  The Phantom Menace
1184                                         NaN
1185    Star Wars: Episode I  The Phantom Menace
1186    Star Wars: Episode I  The Phantom Menace
Name: Which of the following Star Wars films have you seen? Please select all that apply., Length: 1186, dtype: object

Now we will modify the values in these columns: if the respondent saw the movie we mark the value as True, and if the value is NaN (the box was not checked) we mark it as False.
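Since any non-missing value in a checkbox column means the box was ticked, the same True/False result can also be sketched with `notna()`. This is an alternative shown on made-up toy data with hypothetical column names, not the approach used in the cells of this notebook, which rely on an explicit dictionary.

```python
import numpy as np
import pandas as pd

# Toy checkbox columns: the film title when ticked, NaN when not.
toy = pd.DataFrame({
    "col_ep1": ["Star Wars: Episode I  The Phantom Menace", np.nan, np.nan],
    "col_ep5": [
        "Star Wars: Episode V The Empire Strikes Back",
        np.nan,
        "Star Wars: Episode V The Empire Strikes Back",
    ],
})

# True where a title is present, False where the cell is NaN.
seen = toy.notna()

print(seen)
print(seen.sum())  # number of respondents who ticked each box
```

An explicit dictionary has the advantage of documenting exactly which strings are expected in each column.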

In [13]:
# Mapping movie column as boolean values
movie_mapping = {
    "Star Wars: Episode I  The Phantom Menace": True,
    np.nan: False,
    "Star Wars: Episode II  Attack of the Clones": True,
    "Star Wars: Episode III  Revenge of the Sith": True,
    "Star Wars: Episode IV  A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True
}

for col in star_wars.columns[3:9]:
    star_wars[col] = star_wars[col].map(movie_mapping)
In [14]:
# Verifying conversion
star_wars.loc[3:9].head(10)
Out[14]:
RespondentID Have you seen any of the 6 films in the Star Wars franchise? Do you consider yourself to be a fan of the Star Wars film franchise? Which of the following Star Wars films have you seen? Please select all that apply. Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. ... Unnamed: 28 Which character shot first? Are you familiar with the Expanded Universe? Do you consider yourself to be a fan of the Expanded Universe?ξ Do you consider yourself to be a fan of the Star Trek franchise? Gender Age Household Income Education Location (Census Region)
3 3.292765e+09 True False True True True False False False 1 ... Unfamiliar (N/A) I don't understand this question No NaN No Male 18-29 $0 - $24,999 High school degree West North Central
4 3.292763e+09 True True True True True True True True 5 ... Very favorably I don't understand this question No NaN Yes Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
5 3.292731e+09 True True True True True True True True 5 ... Somewhat favorably Greedo Yes No No Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
6 3.292719e+09 True True True True True True True True 1 ... Very favorably Han Yes No Yes Male 18-29 $25,000 - $49,999 Bachelor degree Middle Atlantic
7 3.292685e+09 True True True True True True True True 6 ... Very favorably Han Yes No No Male 18-29 NaN High school degree East North Central
8 3.292664e+09 True True True True True True True True 4 ... Very favorably Han No NaN Yes Male 18-29 NaN High school degree South Atlantic
9 3.292654e+09 True True True True True True True True 5 ... Somewhat favorably Han No NaN No Male 18-29 $0 - $24,999 Some college or Associate degree South Atlantic

7 rows × 38 columns

In [15]:
# Renaming the column to seen_1,seen_2... and so on
star_wars = star_wars.rename(columns={
        "Which of the following Star Wars films have you seen? Please select all that apply.": "seen_1",
        "Unnamed: 4": "seen_2",
        "Unnamed: 5": "seen_3",
        "Unnamed: 6": "seen_4",
        "Unnamed: 7": "seen_5",
        "Unnamed: 8": "seen_6"
        })
In [16]:
# Verifying data
star_wars.head(10)
Out[16]:
RespondentID Have you seen any of the 6 films in the Star Wars franchise? Do you consider yourself to be a fan of the Star Wars film franchise? seen_1 seen_2 seen_3 seen_4 seen_5 seen_6 Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. ... Unnamed: 28 Which character shot first? Are you familiar with the Expanded Universe? Do you consider yourself to be a fan of the Expanded Universe?ξ Do you consider yourself to be a fan of the Star Trek franchise? Gender Age Household Income Education Location (Census Region)
1 3.292880e+09 True True True True True True True True 3 ... Very favorably I don't understand this question Yes No No Male 18-29 NaN High school degree South Atlantic
2 3.292880e+09 False NaN False False False False False False NaN ... NaN NaN NaN NaN Yes Male 18-29 $0 - $24,999 Bachelor degree West South Central
3 3.292765e+09 True False True True True False False False 1 ... Unfamiliar (N/A) I don't understand this question No NaN No Male 18-29 $0 - $24,999 High school degree West North Central
4 3.292763e+09 True True True True True True True True 5 ... Very favorably I don't understand this question No NaN Yes Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
5 3.292731e+09 True True True True True True True True 5 ... Somewhat favorably Greedo Yes No No Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
6 3.292719e+09 True True True True True True True True 1 ... Very favorably Han Yes No Yes Male 18-29 $25,000 - $49,999 Bachelor degree Middle Atlantic
7 3.292685e+09 True True True True True True True True 6 ... Very favorably Han Yes No No Male 18-29 NaN High school degree East North Central
8 3.292664e+09 True True True True True True True True 4 ... Very favorably Han No NaN Yes Male 18-29 NaN High school degree South Atlantic
9 3.292654e+09 True True True True True True True True 5 ... Somewhat favorably Han No NaN No Male 18-29 $0 - $24,999 Some college or Associate degree South Atlantic
10 3.292640e+09 True False False True False False False False 1 ... Very favorably I don't understand this question No NaN No Male 18-29 $25,000 - $49,999 Some college or Associate degree Pacific

10 rows × 38 columns

Modifying Ranking Columns

We have given the movie columns meaningful names. Now let's analyze the next six columns.

These columns ask the respondent to rank the Star Wars movies in order of preference, from most favorite to least favorite. 1 means the film was the most favorite, and 6 means it was the least favorite. Each of the following columns can contain the value 1, 2, 3, 4, 5, 6, or NaN:

  • Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. - How much the respondent liked Star Wars: Episode I The Phantom Menace
  • Unnamed: 10 - How much the respondent liked Star Wars: Episode II Attack of the Clones
  • Unnamed: 11 - How much the respondent liked Star Wars: Episode III Revenge of the Sith
  • Unnamed: 12 - How much the respondent liked Star Wars: Episode IV A New Hope
  • Unnamed: 13 - How much the respondent liked Star Wars: Episode V The Empire Strikes Back
  • Unnamed: 14 - How much the respondent liked Star Wars: Episode VI Return of the Jedi

These columns hold numeric rankings, but info() shows that they are stored with the object data type. We will convert them to float, which can also represent the NaN values.
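A minimal sketch of the conversion on made-up values, assuming the rankings arrive as strings with some missing entries (the `rankings` series below is hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up rankings stored as strings, with one missing answer.
rankings = pd.Series(["3", np.nan, "1", "5"], dtype=object)

# float can hold NaN, so the missing answer survives the conversion.
as_float = rankings.astype(float)

print(as_float.dtype)
print(as_float.mean())  # NaN is skipped by default
```

Once the values are floats, aggregations such as `mean()` work directly and ignore the missing answers.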

In [17]:
# Converting data type from object to float
star_wars[star_wars.columns[9:15]] = star_wars[star_wars.columns[9:15]].astype(float)

Now we will rename the ranking columns to give them more meaningful names.

In [18]:
# Renaming columns
star_wars = star_wars.rename(columns={
        "Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.": "ranking_1",
        "Unnamed: 10": "ranking_2",
        "Unnamed: 11": "ranking_3",
        "Unnamed: 12": "ranking_4",
        "Unnamed: 13": "ranking_5",
        "Unnamed: 14": "ranking_6"
        })

#Verifying Data
star_wars.head()
Out[18]:
RespondentID Have you seen any of the 6 films in the Star Wars franchise? Do you consider yourself to be a fan of the Star Wars film franchise? seen_1 seen_2 seen_3 seen_4 seen_5 seen_6 ranking_1 ... Unnamed: 28 Which character shot first? Are you familiar with the Expanded Universe? Do you consider yourself to be a fan of the Expanded Universe?ξ Do you consider yourself to be a fan of the Star Trek franchise? Gender Age Household Income Education Location (Census Region)
1 3.292880e+09 True True True True True True True True 3.0 ... Very favorably I don't understand this question Yes No No Male 18-29 NaN High school degree South Atlantic
2 3.292880e+09 False NaN False False False False False False NaN ... NaN NaN NaN NaN Yes Male 18-29 $0 - $24,999 Bachelor degree West South Central
3 3.292765e+09 True False True True True False False False 1.0 ... Unfamiliar (N/A) I don't understand this question No NaN No Male 18-29 $0 - $24,999 High school degree West North Central
4 3.292763e+09 True True True True True True True True 5.0 ... Very favorably I don't understand this question No NaN Yes Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
5 3.292731e+09 True True True True True True True True 5.0 ... Somewhat favorably Greedo Yes No No Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central

5 rows × 38 columns
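
The rename mapping does not have to be typed by hand. As a sketch, assuming the six ranking columns are known by position, the mapping can be built with a dict comprehension. The frame and the shortened first column name below are hypothetical stand-ins:

```python
import pandas as pd

# Hypothetical stand-in for the six ranking columns
old_names = ["Please rank the films", "Unnamed: 10", "Unnamed: 11",
             "Unnamed: 12", "Unnamed: 13", "Unnamed: 14"]
toy = pd.DataFrame([[3, 2, 1, 4, 6, 5]], columns=old_names)

# Pair each old name with ranking_1 ... ranking_6 by position
rename_map = {old: "ranking_{}".format(i)
              for i, old in enumerate(old_names, start=1)}
toy = toy.rename(columns=rename_map)
print(list(toy.columns))
```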

Now let's analyze the ranking columns and plot them.

Analyzing Rankings

In [19]:
# Find mean values of ranking columns
mean=star_wars[star_wars.columns[9:15]].mean()
In [20]:
# Verifying average ranking mean values
mean
Out[20]:
ranking_1    3.732934
ranking_2    4.087321
ranking_3    4.341317
ranking_4    3.272727
ranking_5    2.513158
ranking_6    3.047847
dtype: float64
In [21]:
# Plot average ranking mean values
sns.set_style('white')
x_tile = ["Episode I", "Episode II", "Episode III", "Episode IV", "Episode V", "Episode VI"]
fig, ax = plt.subplots(figsize=(5, 5))
x = np.arange(len(mean.index))
y = mean.values
ax.bar(x, y)
# Bars are centered on x (matplotlib 2.0+), so the ticks go at x as well
ax.set_xticks(x)
ax.set_xticklabels(x_tile, rotation=90)
ax.set_title("Average ranking of all 6 Star Wars movies")
ax.set_ylabel('Ranking')
ax.set_xlabel('Movies')
# Hide the tick marks
ax.tick_params(bottom=False, top=False, left=False, right=False)
for sp in ax.spines:
    ax.spines[sp].set_visible(False)
plt.show()
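
The same bar chart recipe is repeated several times in this notebook. One possible way to avoid the repetition is a small helper; `bar_chart` below is a hypothetical name introduced for illustration, and the values passed to it are the rounded means from Out[20]:

```python
import matplotlib.pyplot as plt
import numpy as np

def bar_chart(values, labels, title, ylabel):
    """Draw a bare bar chart and return its Axes (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(5, 5))
    positions = np.arange(len(values))
    ax.bar(positions, values)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=90)
    ax.set_title(title)
    ax.set_ylabel(ylabel)
    # Hide tick marks and the surrounding box
    ax.tick_params(bottom=False, top=False, left=False, right=False)
    for spine in ax.spines.values():
        spine.set_visible(False)
    return ax

ax = bar_chart([3.73, 4.09, 4.34, 3.27, 2.51, 3.05],
               ["I", "II", "III", "IV", "V", "VI"],
               "Average ranking", "Ranking")
```

Returning the Axes lets the caller add labels or annotations before calling `plt.show()`.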

Conclusion

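Because 1 is the best rank, a lower average means a better-liked film. The plot shows that the "original" movies (Episodes IV to VI) are ranked better than the newer ones (Episodes I to III), with Episode V, "The Empire Strikes Back", ranked best at an average of 2.51.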
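
Since a lower mean is better, the best-ranked film can be read off with `idxmin` rather than from the plot. A small sketch using the averages copied from Out[20]:

```python
import pandas as pd

# Average rankings copied from Out[20]
mean_rank = pd.Series({
    "ranking_1": 3.732934, "ranking_2": 4.087321, "ranking_3": 4.341317,
    "ranking_4": 3.272727, "ranking_5": 2.513158, "ranking_6": 3.047847,
})

best = mean_rank.idxmin()          # column with the lowest (best) mean rank
ordered = mean_rank.sort_values()  # best to worst
print(best)
print(ordered)
```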

Analyzing View counts

In [22]:
# Calculating value count for seen movies
total_seen=star_wars.iloc[:,3:9].sum()
In [23]:
# Verifying Data
total_seen
Out[23]:
seen_1    673
seen_2    571
seen_3    550
seen_4    607
seen_5    758
seen_6    738
dtype: int64
In [24]:
# Plot view count of the movies
sns.set_style('white')
fig, ax = plt.subplots(figsize=(5, 5))
x = np.arange(len(total_seen.index))
y = total_seen.values
ax.bar(x, y)
# Bars are centered on x (matplotlib 2.0+), so the ticks go at x as well
ax.set_xticks(x)
ax.set_xticklabels(x_tile, rotation=90)
ax.set_title("View count of the movies")
ax.set_ylabel('Count')
ax.set_xlabel('Movies')
# Hide the tick marks
ax.tick_params(bottom=False, top=False, left=False, right=False)
for sp in ax.spines:
    ax.spines[sp].set_visible(False)
plt.show()

Conclusion

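From the above plot we can observe that:

  • Episodes V and VI were seen by the most respondents (758 and 738), followed by Episode I (673); Episodes II and III were seen by the fewest (571 and 550). This broadly matches the rankings: the most-watched movies, Episodes V and VI, are also the best ranked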
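
How closely view counts and rankings move together can be checked with a rank correlation. The sketch below uses the numbers from Out[23] and Out[20] and computes Spearman's coefficient as the Pearson correlation of the ranks; the film labels in the index are shorthand introduced here:

```python
import pandas as pd

films = ["I", "II", "III", "IV", "V", "VI"]
# View counts from Out[23] and mean rankings from Out[20]
views = pd.Series([673, 571, 550, 607, 758, 738], index=films)
mean_rank = pd.Series([3.732934, 4.087321, 4.341317,
                       3.272727, 2.513158, 3.047847], index=films)

# Spearman correlation = Pearson correlation of the rank-transformed values
rho = views.rank().corr(mean_rank.rank())
print(round(rho, 3))
```

The coefficient comes out at about -0.94: films seen by more respondents tend to have a lower, that is better, average rank. With only six films this is a description of the sample, not a significance test.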

Rankings based on Gender

In [25]:
# Filtering data based on gender
males = star_wars[star_wars["Gender"] == "Male"]
females = star_wars[star_wars["Gender"] == "Female"]
In [26]:
# Calculating mean values of ranking genderwise
males_mean=males[males.columns[9:15]].mean()
females_mean=females[females.columns[9:15]].mean()
In [27]:
# Plot average ranking of movies for Males
sns.set_style('white')
fig, ax = plt.subplots(figsize=(5, 5))
x = np.arange(len(males_mean.index))
y = males_mean.values
# Pass x and y by keyword and draw on the existing axes
sns.barplot(x=x, y=y, ax=ax)
ax.set_xticks(x)
ax.set_xticklabels(x_tile, rotation=90)
ax.set_title("Average ranking of all 6 Star Wars movies by males")
ax.set_ylabel('Ranking')
# Hide the tick marks
ax.tick_params(bottom=False, top=False, left=False, right=False)
for sp in ax.spines:
    ax.spines[sp].set_visible(False)
plt.show()

In [28]:
# Plot average ranking of movies for Females
sns.set_style('white')
fig, ax = plt.subplots(figsize=(5, 5))
x = np.arange(len(females_mean.index))
y = females_mean.values
# Pass x and y by keyword and draw on the existing axes
sns.barplot(x=x, y=y, ax=ax)
ax.set_xticks(x)
ax.set_xticklabels(x_tile, rotation=90)
ax.set_title("Average ranking of all 6 Star Wars movies by females")
ax.set_ylabel('Ranking')
# Hide the tick marks
ax.tick_params(bottom=False, top=False, left=False, right=False)
for sp in ax.spines:
    ax.spines[sp].set_visible(False)
plt.show()

In [29]:
# Plot average ranking of movies for both genders
sns.set_style('white')
plt.figure(figsize=(10, 6))
width = 0.25
plt.bar(np.arange(1,len(females_mean.index)+1)+width,females_mean.values,width,label="Female",color="red")
plt.bar(np.arange(1,len(males_mean.index)+1)-width,males_mean.values,width,label="Male",color="blue")

plt.xticks(np.arange(1,len(females_mean.index)+1), x_tile,fontsize=10)
for i,d in enumerate(males_mean.values):
    plt.text(x=i+1-width, y=d+0.10,s='{:.2f}'.format(d),fontdict=dict(fontsize=8),bbox=dict(facecolor='white', alpha=0.5))

for i,d in enumerate(females_mean.values):
    plt.text(x=i+1+width, y=d+0.10,s='{:.2f}'.format(d),fontdict=dict(fontsize=8),bbox=dict(facecolor='gray', alpha=0.5))    
    
plt.legend(title='Gender')
plt.ylabel('Average Ranking',fontsize=12)
plt.ylim(0,5)
plt.title('Ranking of Star Wars movies by gender',fontsize=18)
plt.show()

Conclusion

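From the above plot we can observe that (remembering that 1 is the best rank, so a taller bar means a less-liked movie):

  • Episodes I, II and III (the newer prequel movies) have a higher average rank number among males than among females, so males like them less than females do
  • Episodes IV, V and VI (the original movies) have a higher average rank number among females than among males, so females like them less than males do
  • Both genders rank the original movies better than the prequels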
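
Filtering into `males` and `females` and averaging each frame separately works, but `groupby` gives the same per-gender means in one step. A minimal sketch on made-up responses (the values are hypothetical, not the survey data):

```python
import pandas as pd

# Hypothetical toy responses; the last row has no gender recorded
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", None],
    "ranking_1": [4.0, 3.0, 5.0, 2.0, 1.0],
    "ranking_5": [1.0, 2.0, 2.0, 3.0, 6.0],
})

# Rows with a missing Gender are left out of the groups by default
by_gender = toy.groupby("Gender")[["ranking_1", "ranking_5"]].mean()
print(by_gender)
```

The result has one row per gender, so `by_gender.T.plot.bar()` would draw the grouped bars directly.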

View Count based on Gender

In [30]:
# Calculating movie view count for males 
males_seen=males.iloc[:,3:9].sum()
males_seen
Out[30]:
seen_1    361
seen_2    323
seen_3    317
seen_4    342
seen_5    392
seen_6    387
dtype: int64
In [31]:
# Calculating movie view count for females 
females_seen=females.iloc[:,3:9].sum()
females_seen
Out[31]:
seen_1    298
seen_2    237
seen_3    222
seen_4    255
seen_5    353
seen_6    338
dtype: int64
In [32]:
# Plot view count of the movies for Males
sns.set_style('white')
fig, ax = plt.subplots(figsize=(5, 5))
x = np.arange(len(males_seen.index))
y = males_seen.values
ax.bar(x, y)
# Bars are centered on x (matplotlib 2.0+), so the ticks go at x as well
ax.set_xticks(x)
ax.set_xticklabels(x_tile, rotation=90)
ax.set_title("Movies seen by males", fontsize=18)
ax.set_ylabel('No. of males', fontsize=12)
# Hide the tick marks
ax.tick_params(bottom=False, top=False, left=False, right=False)
for sp in ax.spines:
    ax.spines[sp].set_visible(False)
plt.show()
In [33]:
# Plot view count of the movies for Females
sns.set_style('white')
fig, ax = plt.subplots(figsize=(5, 5))
x_tile = ["Episode I", "Episode II", "Episode III", "Episode IV", "Episode V", "Episode VI"]
x = np.arange(len(females_seen.index))
y = females_seen.values
ax.bar(x, y)
# Bars are centered on x (matplotlib 2.0+), so the ticks go at x as well
ax.set_xticks(x)
ax.set_xticklabels(x_tile, rotation=90)
ax.set_title("Movies seen by females", fontsize=18)
ax.set_ylabel('No. of females', fontsize=12)
for sp in ax.spines:
    ax.spines[sp].set_visible(False)
plt.show()
In [34]:
# Plot view count of the movies for both genders
sns.set_style('white')
plt.figure(figsize=(10, 6))
width = 0.25
x_tile = ["Episode I", "Episode II", "Episode III", "Episode IV", "Episode V", "Episode VI"]
plt.bar(np.arange(1,len(females_seen.index)+1)+width,females_seen.values,width,label="Female",color="red")
plt.bar(np.arange(1,len(males_seen.index)+1)-width,males_seen.values,width,label="Male",color="blue")

plt.xticks(np.arange(1,len(females_seen.index)+1), x_tile,fontsize=10)
# Label each bar with its count
for i,d in enumerate(males_seen.values):
    plt.text(x=i+1-width, y=d+10,s=str(d),fontdict=dict(fontsize=8),bbox=dict(facecolor='white', alpha=0.5))

for i,d in enumerate(females_seen.values):
    plt.text(x=i+1+width, y=d+10,s=str(d),fontdict=dict(fontsize=8),bbox=dict(facecolor='gray', alpha=0.3))

plt.legend(title='Gender')
plt.ylabel('View Count',fontsize=12)
plt.ylim(0,450)
plt.title('View count of Star Wars movies by gender',fontsize=24)
plt.show()

Conclusion

We can observe from the above plot that:

  • More males than females have seen each of the six movies
  • Among females, the most viewed movies are Episodes V and VI from the original trilogy (353 and 338 viewers), followed by Episode I (298)
  • Episode V is the most watched movie for both genders
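
Raw counts depend on how many males and females answered the survey, so it can help to look at the female share of viewers for each movie. A small sketch using the counts from Out[30] and Out[31]:

```python
import pandas as pd

films = ["seen_1", "seen_2", "seen_3", "seen_4", "seen_5", "seen_6"]
males_seen = pd.Series([361, 323, 317, 342, 392, 387], index=films)    # Out[30]
females_seen = pd.Series([298, 237, 222, 255, 353, 338], index=films)  # Out[31]

# Fraction of each movie's viewers (with a recorded gender) who are female
female_share = females_seen / (males_seen + females_seen)
print(female_share.round(3))
```

The share stays below one half for every movie and is highest for Episode V and lowest for Episode III.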

Ranking based on the Education

In [35]:
# Analyzing data based on education
star_wars['Education'].value_counts(dropna=False)
Out[35]:
Some college or Associate degree    328
Bachelor degree                     321
Graduate degree                     275
NaN                                 150
High school degree                  105
Less than high school degree          7
Name: Education, dtype: int64
In [36]:
# Calculating pivot data for average ranking grouped by education
pivot_data= star_wars.pivot_table(values=['ranking_1','ranking_2','ranking_3','ranking_4','ranking_5','ranking_6'],index='Education',dropna=True,aggfunc='mean')
pivot_data.reset_index(inplace=True)

#Verifying pivot data
pivot_data
Out[36]:
Education ranking_1 ranking_2 ranking_3 ranking_4 ranking_5 ranking_6
0 Bachelor degree 3.828244 4.290076 4.521073 3.114504 2.309160 2.931298
1 Graduate degree 3.822222 4.225664 4.500000 3.199115 2.323009 2.920354
2 High school degree 3.802817 3.746479 4.126761 3.211268 2.873239 3.239437
3 Less than high school degree 5.000000 5.333333 3.666667 2.666667 1.000000 3.333333
4 Some college or Associate degree 3.551181 3.885827 4.102362 3.503937 2.783465 3.173228
In [37]:
# Plot pivot data for ranking based on education
pivot_data.plot.bar(width=0.8)
deg=['Bachelor','Graduate','High school','< High school','Some college']
plt.xticks(np.arange(0,5),deg,fontsize=8,rotation=0)
plt.legend(fontsize=7)
plt.ylabel('Average ranking')
plt.xlabel('Education')
plt.title('Ranking based on Education')
plt.show()

Conclusion

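From the above plot and the table in Out[36] we can observe that (1 is the best rank, so a lower average is better):

  • Episode V has the best average rank in every education group
  • Episode III has the worst average rank among respondents with a Bachelor degree, a Graduate degree, a High school degree, or Some college or Associate degree
  • The "Less than high school degree" group has at most 7 respondents, so its averages (Episode V at 1.00, Episode II at 5.33) should not be over-interpreted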
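
The best and worst movie per education group can be extracted from the pivot table with `idxmin` and `idxmax` along the columns. The sketch below rebuilds the table from the values printed in Out[36], with Education as the index:

```python
import pandas as pd

# Values copied from Out[36]
pivot = pd.DataFrame(
    {
        "ranking_1": [3.828244, 3.822222, 3.802817, 5.000000, 3.551181],
        "ranking_2": [4.290076, 4.225664, 3.746479, 5.333333, 3.885827],
        "ranking_3": [4.521073, 4.500000, 4.126761, 3.666667, 4.102362],
        "ranking_4": [3.114504, 3.199115, 3.211268, 2.666667, 3.503937],
        "ranking_5": [2.309160, 2.323009, 2.873239, 1.000000, 2.783465],
        "ranking_6": [2.931298, 2.920354, 3.239437, 3.333333, 3.173228],
    },
    index=["Bachelor degree", "Graduate degree", "High school degree",
           "Less than high school degree", "Some college or Associate degree"],
)

best = pivot.idxmin(axis=1)   # lowest mean rank = most liked
worst = pivot.idxmax(axis=1)  # highest mean rank = least liked
print(pd.DataFrame({"best": best, "worst": worst}))
```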

View Count based on the Location

In [38]:
#Unique values
star_wars['Location (Census Region)'].value_counts(dropna=False)
Out[38]:
East North Central    181
Pacific               175
South Atlantic        170
NaN                   143
Middle Atlantic       122
West South Central    110
West North Central     93
Mountain               79
New England            75
East South Central     38
Name: Location (Census Region), dtype: int64
In [39]:
#Setting a pivot table of view counts by region:
pivot_location= star_wars.pivot_table(values=['seen_1','seen_2','seen_3','seen_4','seen_5','seen_6'],index='Location (Census Region)',dropna=True,aggfunc='sum')
pivot_location.reset_index(inplace=True)
pivot_location
Out[39]:
Location (Census Region) seen_1 seen_2 seen_3 seen_4 seen_5 seen_6
0 East North Central 102.0 89.0 89.0 95.0 128.0 121.0
1 East South Central 24.0 21.0 21.0 27.0 31.0 29.0
2 Middle Atlantic 79.0 69.0 70.0 76.0 83.0 85.0
3 Mountain 57.0 47.0 46.0 54.0 61.0 59.0
4 New England 50.0 44.0 43.0 48.0 55.0 54.0
5 Pacific 120.0 99.0 92.0 105.0 123.0 124.0
6 South Atlantic 104.0 82.0 79.0 93.0 125.0 120.0
7 West North Central 61.0 51.0 49.0 49.0 67.0 63.0
8 West South Central 62.0 58.0 50.0 49.0 70.0 69.0
In [40]:
#Plotting pivot
pivot_location.plot.bar(figsize=(15,6),width=0.8)
deg=['East North Central','East South Central','Middle Atlantic','Mountain','New England','Pacific','South Atlantic','West North Central','West South Central']
plt.xticks(np.arange(0,9),deg,fontsize=8,rotation=0)
plt.legend(fontsize=12)
plt.ylabel('View Count',fontsize=15)
plt.xlabel('Location (Census Region)',fontsize=15)
plt.title('View Count based on location',fontsize=18)
plt.show()

Conclusion

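From the above plot we can observe that:

  • East North Central, Pacific and South Atlantic have the highest view counts, but they also have the most respondents (181, 175 and 170), so the raw counts partly reflect sample size
  • East South Central has the lowest view counts and also the fewest respondents (38)
  • Episode V is the most viewed movie in seven of the nine regions; in Middle Atlantic and Pacific, Episode VI is slightly ahead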
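
Because the regions differ a lot in size, dividing the view count by the number of respondents gives a fairer comparison. A sketch for Episode V, using the respondent counts from Out[38] and the `seen_5` column from Out[39]:

```python
import pandas as pd

regions = ["East North Central", "East South Central", "Middle Atlantic",
           "Mountain", "New England", "Pacific", "South Atlantic",
           "West North Central", "West South Central"]
respondents = pd.Series([181, 38, 122, 79, 75, 175, 170, 93, 110], index=regions)  # Out[38]
seen_5 = pd.Series([128, 31, 83, 61, 55, 123, 125, 67, 70], index=regions)         # Out[39]

# Share of each region's respondents who have seen Episode V
share = (seen_5 / respondents).sort_values(ascending=False)
print(share.round(3))
```

On this measure East South Central, the region with the smallest raw count, has the highest share of respondents who have seen Episode V, and West South Central has the lowest. With only 38 respondents in East South Central, the estimate is noisy.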

Analyzing the most loved and hated character

In [41]:
#Now renaming columns 15-28:
name_mapping = {'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.':'Han Solo',
                'Unnamed: 16':'Luke Skywalker','Unnamed: 17':'Princess Leia','Unnamed: 18':'Anakin','Unnamed: 19':'Obi wan Kenobi',
                'Unnamed: 20':'Palpatine','Unnamed: 21':'Darth Vader','Unnamed: 22':'Lando','Unnamed: 23':'Boba Fett','Unnamed: 24':'C-3PO',
                'Unnamed: 25':'R2 D2','Unnamed: 26':'Jar Jar Binks','Unnamed: 27':'Padme','Unnamed: 28':'Yoda'}
star_wars=star_wars.rename(columns=(name_mapping)).copy()
In [42]:
#Lets check for NAN values on character columns:
star_wars[star_wars.columns[15:29]].isna().sum()
Out[42]:
Han Solo          357
Luke Skywalker    355
Princess Leia     355
Anakin            363
Obi wan Kenobi    361
Palpatine         372
Darth Vader       360
Lando             366
Boba Fett         374
C-3PO             359
R2 D2             356
Jar Jar Binks     365
Padme             372
Yoda              360
dtype: int64

Conclusion

As the output shows, the number of missing values is about the same in every character column (355 to 374). This suggests that largely the same respondents skipped the whole block of character questions, so dropping those rows affects every column about equally.
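
Whether the same respondents skipped every character question can be checked directly by counting the rows where all columns are missing and comparing with the rows where any column is missing. A minimal sketch on made-up answers (hypothetical data, two columns only):

```python
import numpy as np
import pandas as pd

# Hypothetical toy answers for two character columns
toy = pd.DataFrame({
    "Han Solo": ["Very favorably", np.nan, "Somewhat favorably", np.nan],
    "Yoda": ["Very favorably", np.nan, np.nan, np.nan],
})

all_missing = toy.isna().all(axis=1).sum()  # skipped the whole block
any_missing = toy.isna().any(axis=1).sum()  # skipped at least one question
print(all_missing, any_missing)
```

If the two numbers are close on the real data, most missing values come from respondents who skipped the block entirely; `dropna(axis=0)` removes the `any_missing` rows.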

In [43]:
#Lets drop NAN values on character columns:
character_star_wars=star_wars[star_wars.columns[15:29]].dropna(axis=0)
In [44]:
#Verify the data
character_star_wars.head(10)
Out[44]:
Han Solo Luke Skywalker Princess Leia Anakin Obi wan Kenobi Palpatine Darth Vader Lando Boba Fett C-3PO R2 D2 Jar Jar Binks Padme Yoda
1 Very favorably Very favorably Very favorably Very favorably Very favorably Very favorably Very favorably Unfamiliar (N/A) Unfamiliar (N/A) Very favorably Very favorably Very favorably Very favorably Very favorably
3 Somewhat favorably Somewhat favorably Somewhat favorably Somewhat favorably Somewhat favorably Unfamiliar (N/A) Unfamiliar (N/A) Unfamiliar (N/A) Unfamiliar (N/A) Unfamiliar (N/A) Unfamiliar (N/A) Unfamiliar (N/A) Unfamiliar (N/A) Unfamiliar (N/A)
4 Very favorably Very favorably Very favorably Very favorably Very favorably Somewhat favorably Very favorably Somewhat favorably Somewhat unfavorably Very favorably Very favorably Very favorably Very favorably Very favorably
5 Very favorably Somewhat favorably Somewhat favorably Somewhat unfavorably Very favorably Very unfavorably Somewhat favorably Neither favorably nor unfavorably (neutral) Very favorably Somewhat favorably Somewhat favorably Very unfavorably Somewhat favorably Somewhat favorably
6 Very favorably Very favorably Very favorably Very favorably Very favorably Neither favorably nor unfavorably (neutral) Very favorably Neither favorably nor unfavorably (neutral) Somewhat favorably Somewhat favorably Somewhat favorably Somewhat favorably Neither favorably nor unfavorably (neutral) Very favorably
7 Very favorably Very favorably Somewhat favorably Somewhat favorably Very favorably Very favorably Very favorably Very favorably Very favorably Somewhat favorably Very favorably Somewhat unfavorably Somewhat favorably Very favorably
8 Very favorably Somewhat favorably Very favorably Neither favorably nor unfavorably (neutral) Very favorably Very unfavorably Somewhat unfavorably Neither favorably nor unfavorably (neutral) Somewhat favorably Somewhat favorably Somewhat favorably Very unfavorably Somewhat unfavorably Very favorably
9 Very favorably Somewhat unfavorably Somewhat favorably Somewhat favorably Somewhat favorably Very favorably Very favorably Very favorably Very favorably Neither favorably nor unfavorably (neutral) Somewhat favorably Very unfavorably Somewhat unfavorably Somewhat favorably
10 Neither favorably nor unfavorably (neutral) Very favorably Very favorably Very favorably Very favorably Somewhat unfavorably Very favorably Somewhat unfavorably Somewhat unfavorably Very favorably Very favorably Very favorably Somewhat unfavorably Very favorably
13 Somewhat favorably Very favorably Somewhat favorably Somewhat favorably Somewhat favorably Somewhat favorably Very favorably Neither favorably nor unfavorably (neutral) Somewhat favorably Very favorably Very favorably Very favorably Somewhat favorably Very favorably

Mapping character column values

We can convert the ratings to integers to make the calculations easier. We will map the values as:

  • Very favorably:6
  • Somewhat favorably:5
  • Neither favorably nor unfavorably (neutral):4
  • Somewhat unfavorably:3
  • Unfamiliar (N/A):2
  • Very unfavorably:1
In [45]:
#Mapping character column data
mapping = {'Very favorably':6,'Somewhat favorably':5,'Neither favorably nor unfavorably (neutral)':4,'Somewhat unfavorably':3,'Unfamiliar (N/A)':2,'Very unfavorably':1}

for c in character_star_wars:
    character_star_wars[c] = character_star_wars[c].map(mapping)
In [49]:
# Verifying data
character_star_wars.head(10)
Out[49]:
Han Solo Luke Skywalker Princess Leia Anakin Obi wan Kenobi Palpatine Darth Vader Lando Boba Fett C-3PO R2 D2 Jar Jar Binks Padme Yoda
1 6 6 6 6 6 6 6 2 2 6 6 6 6 6
3 5 5 5 5 5 2 2 2 2 2 2 2 2 2
4 6 6 6 6 6 5 6 5 3 6 6 6 6 6
5 6 5 5 3 6 1 5 4 6 5 5 1 5 5
6 6 6 6 6 6 4 6 4 5 5 5 5 4 6
7 6 6 5 5 6 6 6 6 6 5 6 3 5 6
8 6 5 6 4 6 1 3 4 5 5 5 1 3 6
9 6 3 5 5 5 6 6 6 6 4 5 1 3 5
10 4 6 6 6 6 3 6 3 3 6 6 6 3 6
13 5 6 5 5 5 5 6 4 5 6 6 6 5 6
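
In the mapping above, "Unfamiliar (N/A)" is scored as 2, between "Very unfavorably" and "Somewhat unfavorably", so it lowers the average of characters that many respondents do not know. One alternative, shown here only as a sketch on made-up answers, is to leave it out of the mapping so that it becomes NaN and is ignored by `mean()`. The 1 to 5 scale below is an assumption for illustration, not the scale used in this notebook:

```python
import pandas as pd

# Assumed 5-point scale; "Unfamiliar (N/A)" is deliberately not mapped
scale = {
    "Very favorably": 5,
    "Somewhat favorably": 4,
    "Neither favorably nor unfavorably (neutral)": 3,
    "Somewhat unfavorably": 2,
    "Very unfavorably": 1,
}

# Hypothetical answers for one character
toy = pd.Series(["Very favorably", "Unfamiliar (N/A)",
                 "Very unfavorably", "Somewhat favorably"])

scores = toy.map(scale)   # unmapped values become NaN
print(scores.mean())      # NaN is skipped, so only three answers count
```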
In [47]:
# Find average rating of each character
character_mean=character_star_wars.mean().sort_values(ascending=False)
In [48]:
# Plot the graph
sns.set_style('white')
plt.figure(figsize=(10, 6))
character_mean.plot.barh()

plt.box(False) #remove box
plt.yticks(fontsize=14)

plt.xlabel('Ratings',fontsize=12)
plt.ylabel('Characters',fontsize=12)
plt.title('Average Rating of Character',fontsize=24)
plt.show()

Conclusion

  • Han Solo is the most liked character, with the highest average rating
  • Yoda is the second most liked character
  • Jar Jar Binks is the least liked character
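
An average over coded answers depends on the codes chosen. A complementary view, sketched here on made-up answers (hypothetical data, two characters only), is the net favorability: the share of favorable answers minus the share of unfavorable ones.

```python
import pandas as pd

# Hypothetical answers, four respondents
toy = pd.DataFrame({
    "Han Solo": ["Very favorably", "Somewhat favorably",
                 "Very favorably", "Unfamiliar (N/A)"],
    "Jar Jar Binks": ["Very unfavorably", "Somewhat unfavorably",
                      "Somewhat favorably", "Very unfavorably"],
})

favorable = ["Very favorably", "Somewhat favorably"]
unfavorable = ["Very unfavorably", "Somewhat unfavorably"]

# isin gives a boolean frame; its column mean is the share of matching answers
net = toy.isin(favorable).mean() - toy.isin(unfavorable).mean()
print(net.sort_values(ascending=False))
```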

Overall Conclusions:

  • 'The Empire Strikes Back' (Episode V) is the favourite movie in the Star Wars saga
  • The original movies (Episodes IV to VI) are ranked better by respondents than the newer ones (Episodes I to III)
  • Star Wars has a good distribution of both male and female viewers, with somewhat more males for every movie
  • Han Solo is the most liked character and Jar Jar Binks is the least liked character