While waiting for Star Wars: The Force Awakens to come out, the team at FiveThirtyEight became interested in answering some questions about Star Wars fans. In particular, they wondered: does the rest of America realize that “The Empire Strikes Back” is clearly the best of the bunch?
The team needed to collect data addressing this question. To do this, they surveyed Star Wars fans using the online tool SurveyMonkey. They received 835 total responses, which we'll be cleaning and exploring.
import numpy as np
import pandas as pd
star_wars = pd.read_csv('star_wars.csv', encoding="ISO-8859-1")
star_wars.head(10)
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Response | Response | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | Star Wars: Episode I The Phantom Menace | ... | Yoda | Response | Response | Response | Response | Response | Response | Response | Response | Response |
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
6 | 3.292719e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 1 | ... | Very favorably | Han | Yes | No | Yes | Male | 18-29 | $25,000 - $49,999 | Bachelor degree | Middle Atlantic |
7 | 3.292685e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 6 | ... | Very favorably | Han | Yes | No | No | Male | 18-29 | NaN | High school degree | East North Central |
8 | 3.292664e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 4 | ... | Very favorably | Han | No | NaN | Yes | Male | 18-29 | NaN | High school degree | South Atlantic |
9 | 3.292654e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Somewhat favorably | Han | No | NaN | No | Male | 18-29 | $0 - $24,999 | Some college or Associate degree | South Atlantic |
10 rows × 38 columns
We can notice some strange values. The RespondentID column is supposed to be a unique ID for each respondent, but it's blank in some rows. There are also questions in the survey where the respondent had to check one or more boxes. This type of data is difficult to represent in columnar format.
# reviewing column names
star_wars.columns
Index(['RespondentID', 'Have you seen any of the 6 films in the Star Wars franchise?', 'Do you consider yourself to be a fan of the Star Wars film franchise?', 'Which of the following Star Wars films have you seen? Please select all that apply.', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14', 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Which character shot first?', 'Are you familiar with the Expanded Universe?', 'Do you consider yourself to be a fan of the Expanded Universe?Âæ', 'Do you consider yourself to be a fan of the Star Trek franchise?', 'Gender', 'Age', 'Household Income', 'Education', 'Location (Census Region)'], dtype='object')
Some columns have strange names, which we will handle later on. For now, we'll remove any rows where RespondentID is NaN.
# removing NaN values
star_wars = star_wars[star_wars['RespondentID'].notnull()]
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 rows × 38 columns
Three columns represent Yes/No questions:
- Have you seen any of the 6 films in the Star Wars franchise?
- Do you consider yourself to be a fan of the Star Wars film franchise?
- Do you consider yourself to be a fan of the Star Trek franchise?
They can also be NaN where a respondent chose not to answer a question. We can convert those values to Booleans, which makes the data easier to analyze down the road because we can select the rows that are True or False without having to do a string comparison.
# exploring columns
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].value_counts()
Yes    936
No     250
Name: Have you seen any of the 6 films in the Star Wars franchise?, dtype: int64
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].value_counts()
Yes    552
No     284
Name: Do you consider yourself to be a fan of the Star Wars film franchise?, dtype: int64
star_wars['Do you consider yourself to be a fan of the Star Trek franchise?'].value_counts()
No     641
Yes    427
Name: Do you consider yourself to be a fan of the Star Trek franchise?, dtype: int64
# converting 'Yes' and 'No' to Boolean values
new_values = {'Yes': True, 'No': False}
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'] = star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].map(new_values)
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'] = star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].map(new_values)
star_wars['Do you consider yourself to be a fan of the Star Trek franchise?'] = star_wars['Do you consider yourself to be a fan of the Star Trek franchise?'].map(new_values)
# exploring the new values
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].value_counts()
True     936
False    250
Name: Have you seen any of the 6 films in the Star Wars franchise?, dtype: int64
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].value_counts()
True     552
False    284
Name: Do you consider yourself to be a fan of the Star Wars film franchise?, dtype: int64
star_wars['Do you consider yourself to be a fan of the Star Trek franchise?'].value_counts()
False    641
True     427
Name: Do you consider yourself to be a fan of the Star Trek franchise?, dtype: int64
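The counts above do not show what happened to unanswered questions. The following is a minimal sketch, on made-up toy values rather than the survey data, of how `Series.map` with a Yes/No dictionary leaves missing answers as NaN, and how `value_counts(dropna=False)` makes them visible.

```python
import numpy as np
import pandas as pd

# Toy stand-in for one Yes/No survey column; the values are made up.
answers = pd.Series(["Yes", "No", np.nan, "Yes"])

yes_no = {"Yes": True, "No": False}
converted = answers.map(yes_no)

# value_counts() skips NaN by default; dropna=False keeps it in the tally.
counts = converted.value_counts(dropna=False)
print(counts)

# Number of respondents who left the question blank.
n_missing = int(converted.isna().sum())
print(n_missing)
```

Because NaN survives the conversion, a filter such as `== False` selects only respondents who actually answered No, not those who skipped the question.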
The next six columns represent a single checkbox question. The respondent checked off a series of boxes in response to the question Which of the following Star Wars films have you seen? Please select all that apply.
The columns for this question are:
- Which of the following Star Wars films have you seen? Please select all that apply. - Whether or not the respondent saw Star Wars: Episode I The Phantom Menace.
- Unnamed: 4 - Whether or not the respondent saw Star Wars: Episode II Attack of the Clones.
- Unnamed: 5 - Whether or not the respondent saw Star Wars: Episode III Revenge of the Sith.
- Unnamed: 6 - Whether or not the respondent saw Star Wars: Episode IV A New Hope.
- Unnamed: 7 - Whether or not the respondent saw Star Wars: Episode V The Empire Strikes Back.
- Unnamed: 8 - Whether or not the respondent saw Star Wars: Episode VI Return of the Jedi.
For each of these columns, if the value in a cell is the name of the movie, that means the respondent saw the movie. If the value is NaN, the respondent either didn't answer or didn't see the movie. We'll assume that they didn't see the movie.
We'll convert each of these columns to a Boolean and then rename the column something more intuitive.
# every movie title maps to True; a missing value maps to False
new_values2 = {
    'Star Wars: Episode I The Phantom Menace': True,
    'Star Wars: Episode II Attack of the Clones': True,
    'Star Wars: Episode III Revenge of the Sith': True,
    'Star Wars: Episode IV A New Hope': True,
    'Star Wars: Episode V The Empire Strikes Back': True,
    'Star Wars: Episode VI Return of the Jedi': True,
    np.nan: False
}
for col in star_wars.columns[3:9]:
    star_wars[col] = star_wars[col].map(new_values2)
# note: star_wars[3:9] slices rows by position, so all columns are shown
star_wars[3:9].head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 3.292763e+09 | True | True | True | True | True | True | True | True | 5 | ... | Very favorably | I don't understand this question | No | NaN | True | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | True | True | True | True | True | True | True | True | 5 | ... | Somewhat favorably | Greedo | Yes | No | False | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
6 | 3.292719e+09 | True | True | True | True | True | True | True | True | 1 | ... | Very favorably | Han | Yes | No | True | Male | 18-29 | $25,000 - $49,999 | Bachelor degree | Middle Atlantic |
7 | 3.292685e+09 | True | True | True | True | True | True | True | True | 6 | ... | Very favorably | Han | Yes | No | False | Male | 18-29 | NaN | High school degree | East North Central |
8 | 3.292664e+09 | True | True | True | True | True | True | True | True | 4 | ... | Very favorably | Han | No | NaN | True | Male | 18-29 | NaN | High school degree | South Atlantic |
5 rows × 38 columns
# renaming the columns
star_wars = star_wars.rename(columns={'Which of the following Star Wars films have you seen? Please select all that apply.': 'seen_1',
'Unnamed: 4': 'seen_2',
'Unnamed: 5': 'seen_3',
'Unnamed: 6': 'seen_4',
'Unnamed: 7': 'seen_5',
'Unnamed: 8': 'seen_6'
})
star_wars.columns[3:9]
Index(['seen_1', 'seen_2', 'seen_3', 'seen_4', 'seen_5', 'seen_6'], dtype='object')
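Since a cell in these columns holds the film title when the box was checked and NaN otherwise, the same Booleans could also be derived without a mapping dictionary. This is a sketch of that alternative on a made-up toy frame; it is not the approach used above, and it relies on the same assumption that a missing value means the film was not seen.

```python
import numpy as np
import pandas as pd

# Toy stand-in for two of the checkbox columns; the rows are made up.
toy = pd.DataFrame({
    "seen_1": [
        "Star Wars: Episode I The Phantom Menace",
        np.nan,
        "Star Wars: Episode I The Phantom Menace",
    ],
    "seen_2": [
        np.nan,
        np.nan,
        "Star Wars: Episode II Attack of the Clones",
    ],
})

# "Cell is not missing" is treated as "respondent has seen the film".
seen = toy.notna()
print(seen)

# Summing Boolean columns counts the True values per film.
print(seen.sum())
```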
The next six columns ask the respondent to rank the Star Wars movies from most favorite to least favorite. 1 means the film was the most favorite, and 6 means it was the least favorite. Each of the following columns can contain the value 1, 2, 3, 4, 5, 6 or NaN:
- Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. - How much the respondent liked Star Wars: Episode I The Phantom Menace.
- Unnamed: 10 - How much the respondent liked Star Wars: Episode II Attack of the Clones.
- Unnamed: 11 - How much the respondent liked Star Wars: Episode III Revenge of the Sith.
- Unnamed: 12 - How much the respondent liked Star Wars: Episode IV A New Hope.
- Unnamed: 13 - How much the respondent liked Star Wars: Episode V The Empire Strikes Back.
- Unnamed: 14 - How much the respondent liked Star Wars: Episode VI Return of the Jedi.
We'll convert each column to a numeric type and rename the columns.
# converting columns to numeric type
star_wars[star_wars.columns[9:15]] = star_wars[star_wars.columns[9:15]].astype(float)
# renaming the columns
star_wars = star_wars.rename(columns={'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.': 'ranking_1',
'Unnamed: 10': 'ranking_2',
'Unnamed: 11': 'ranking_3',
'Unnamed: 12': 'ranking_4',
'Unnamed: 13': 'ranking_5',
'Unnamed: 14': 'ranking_6'})
star_wars.columns[9:15]
Index(['ranking_1', 'ranking_2', 'ranking_3', 'ranking_4', 'ranking_5', 'ranking_6'], dtype='object')
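The `astype(float)` call above works when every non-missing cell is a clean number. If a column might contain stray text, `pd.to_numeric` with `errors="coerce"` is one possible alternative. The sketch below uses a made-up column, and the "n/a" entry is a hypothetical bad value, not something observed in the survey data.

```python
import pandas as pd

# Toy ranking column read in as strings; "n/a" is a made-up bad entry.
raw = pd.Series(["3", "1", None, "n/a", "6"], dtype=object)

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
ranking = pd.to_numeric(raw, errors="coerce")
print(ranking.dtype)
print(ranking.tolist())
```

The trade-off is that coercion hides bad entries silently, so it is worth counting the NaN values before and after the conversion.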
# calculating the mean of each ranking column
ranking_mean = star_wars[star_wars.columns[9:15]].mean()
ranking_mean
ranking_1    3.732934
ranking_2    4.087321
ranking_3    4.341317
ranking_4    3.272727
ranking_5    2.513158
ranking_6    3.047847
dtype: float64
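The column names alone do not say which film is which. As a small sketch, the means printed above can be relabeled with the film titles and sorted so that the best-liked film comes first. The `titles` dictionary is a helper introduced here for illustration; it follows the episode order of the ranking columns.

```python
import pandas as pd

# Mean rankings copied from the output above.
ranking_mean = pd.Series({
    "ranking_1": 3.732934,
    "ranking_2": 4.087321,
    "ranking_3": 4.341317,
    "ranking_4": 3.272727,
    "ranking_5": 2.513158,
    "ranking_6": 3.047847,
})

# Hypothetical helper: column name -> film title, in episode order.
titles = {
    "ranking_1": "Episode I The Phantom Menace",
    "ranking_2": "Episode II Attack of the Clones",
    "ranking_3": "Episode III Revenge of the Sith",
    "ranking_4": "Episode IV A New Hope",
    "ranking_5": "Episode V The Empire Strikes Back",
    "ranking_6": "Episode VI Return of the Jedi",
}

# A rank of 1 is best, so ascending order lists the best-liked film first.
by_preference = ranking_mean.rename(index=titles).sort_values()
print(by_preference)
```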
# creating a bar chart to plot the different means
import matplotlib.pyplot as plt
%matplotlib inline
ranking_mean.plot(kind='bar', title='Ranking Star Wars Movies', colormap='ocean')
<matplotlib.axes._subplots.AxesSubplot at 0x2affcede160>
The lower the mean, the higher the respondents ranked the movie.
From this bar chart, we can deduce the following:
- ranking_5 (Episode V The Empire Strikes Back) is the respondents' favorite movie
- ranking_3 (Episode III Revenge of the Sith) is the respondents' least favorite movie
For the seen columns, we'll figure out how many people have seen each movie by taking the sum of each column.
sum_seen = star_wars[star_wars.columns[3:9]].sum()
sum_seen.plot(kind='bar', title='Number of Respondents Who Have Seen Each Star Wars Movie', colormap='ocean')
<matplotlib.axes._subplots.AxesSubplot at 0x2affef8fa90>
We can deduce from this bar chart that the first movie (seen_1) and the last two (seen_5 and seen_6) have been seen by the most respondents. This may partly explain why Episodes V and VI rank higher than the other movies, although Episode I is widely seen and still ranks in the lower half.
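One way to check the link between viewership and ranking more directly is a rank correlation between the per-film seen counts and the per-film mean rankings. The sketch below is only an illustration on made-up data for three films; `seen_vs_ranking` is a hypothetical helper, and with only six films in the real survey any such correlation would be suggestive at best.

```python
import numpy as np
import pandas as pd

def seen_vs_ranking(df, seen_cols, ranking_cols):
    """Rank correlation between seen counts and mean rankings per film."""
    seen_counts = pd.Series(df[seen_cols].sum().to_numpy())
    mean_ranks = pd.Series(df[ranking_cols].mean().to_numpy())
    # Spearman correlation computed as the Pearson correlation of the ranks.
    return seen_counts.rank().corr(mean_ranks.rank())

# Made-up toy data for three films and four respondents.
toy = pd.DataFrame({
    "seen_1": [True, True, True, False],
    "seen_2": [True, False, False, False],
    "seen_3": [True, True, False, False],
    "ranking_1": [1.0, 1.0, 2.0, np.nan],
    "ranking_2": [3.0, 3.0, 3.0, np.nan],
    "ranking_3": [2.0, 2.0, 1.0, np.nan],
})

rho = seen_vs_ranking(
    toy,
    ["seen_1", "seen_2", "seen_3"],
    ["ranking_1", "ranking_2", "ranking_3"],
)
print(rho)
```

A negative value means that films seen by more respondents tend to have a lower, that is better, mean ranking.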
We'll now examine how certain segments of the survey population responded. There are several columns that segment our data into two groups:
- Do you consider yourself to be a fan of the Star Wars film franchise? - True or False
- Do you consider yourself to be a fan of the Star Trek franchise? - True or False (converted from Yes/No earlier)
- Gender - Male or Female
# splitting our data in two groups
fan_star_wars = star_wars[star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'] == True]
no_fan_star_wars = star_wars[star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'] == False]
# the Star Trek column was converted to Booleans earlier, so compare against True/False
star_trek_fan = star_wars[star_wars['Do you consider yourself to be a fan of the Star Trek franchise?'] == True]
star_trek_no_fan = star_wars[star_wars['Do you consider yourself to be a fan of the Star Trek franchise?'] == False]
males = star_wars[star_wars['Gender'] == 'Male']
females = star_wars[star_wars['Gender'] == 'Female']
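Splitting the frame by hand works, but the same per-segment means can also be obtained in one step with `groupby`. This is a sketch of that alternative on a made-up toy frame, not the approach used in this notebook; the column names mirror the renamed survey columns.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey: one segment column and two ranking columns.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", np.nan],
    "ranking_1": [1.0, 2.0, 3.0, 1.0],
    "ranking_2": [2.0, 1.0, 2.0, 2.0],
})
ranking_cols = ["ranking_1", "ranking_2"]

# groupby drops rows whose key is NaN by default, so respondents
# who did not state a gender are left out of both groups.
by_gender = toy.groupby("Gender")[ranking_cols].mean()
print(by_gender)
```

The result has one row per segment and one column per film, which makes it easy to compare groups side by side.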
# calculating the mean ranking values for Star Wars fans and non Star Wars fans
mean_star_wars_fan = fan_star_wars[fan_star_wars.columns[9:15]].mean()
mean_star_wars_no_fan = no_fan_star_wars[no_fan_star_wars.columns[9:15]].mean()
# creating a bar chart for both groups
cols = ['ranking_1', 'ranking_2', 'ranking_3',
'ranking_4', 'ranking_5', 'ranking_6']
fan = [fan_star_wars, no_fan_star_wars]
pos = np.arange(len(cols))
bar_width = 0.35
plt.bar(pos, mean_star_wars_fan, bar_width, color='green', label='Fan')
plt.bar(pos + bar_width, mean_star_wars_no_fan, bar_width, color='darkblue', label='No Fan')
plt.xticks(pos, cols, rotation=90)
plt.ylabel('Ranking')
plt.title('Ranking Star Wars Movies')
plt.legend(loc='upper right')
plt.show()
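Bar heights can be hard to compare by eye, so it may help to put the two groups into one table and look at the differences as numbers. The sketch below uses made-up means; `side_by_side` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def side_by_side(first, second, labels):
    """Combine two per-film Series into one table, one column per group."""
    return pd.DataFrame({labels[0]: first, labels[1]: second})

# Made-up mean rankings for two films.
fan_means = pd.Series({"ranking_1": 4.0, "ranking_2": 4.5})
non_fan_means = pd.Series({"ranking_1": 3.0, "ranking_2": 3.5})

comparison = side_by_side(fan_means, non_fan_means, ("Fan", "No Fan"))

# Positive values mean fans gave the film a worse (higher) mean rank.
comparison["difference"] = comparison["Fan"] - comparison["No Fan"]
print(comparison)
```

Calling `comparison[["Fan", "No Fan"]].plot(kind="bar")` on such a table should also draw the two groups as grouped bars, which would avoid positioning the bars by hand.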
The bar chart shows the following trends:
# calculating the total number of Star Wars fans and non Star Wars fans
star_wars_fan = star_wars[star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'] == True]
star_wars_no_fan = star_wars[star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'] == False]
sum_star_wars_fan = star_wars_fan[star_wars_fan.columns[3:9]].sum()
sum_star_wars_no_fan = star_wars_no_fan[star_wars_no_fan.columns[3:9]].sum()
# creating a bar chart for both groups
cols = ['seen_1', 'seen_2', 'seen_3',
'seen_4', 'seen_5', 'seen_6']
pos = np.arange(len(cols))
bar_width = 0.35
plt.bar(pos, sum_star_wars_fan, bar_width, color='green', label='Fan')
plt.bar(pos + bar_width, sum_star_wars_no_fan, bar_width, color='darkblue', label='No Fan')
plt.xticks(pos, cols, rotation=90)
plt.ylabel('Number of Respondents')
plt.title('Number of Respondents Who Have Seen Each Star Wars Movie')
plt.legend(loc='upper left')
plt.show()
From the bar chart above, we can deduce the following:
# calculating the mean ranking values for Star Trek fans and non Star Trek Fans
mean_star_trek_fan = star_trek_fan[star_trek_fan.columns[9:15]].mean()
mean_star_trek_no_fan = star_trek_no_fan[star_trek_no_fan.columns[9:15]].mean()
# creating a bar chart for both groups
cols = ['ranking_1', 'ranking_2', 'ranking_3',
'ranking_4', 'ranking_5', 'ranking_6']
pos = np.arange(len(cols))
bar_width = 0.35
plt.bar(pos, mean_star_trek_fan, bar_width, color='green', label='Fan')
plt.bar(pos + bar_width, mean_star_trek_no_fan, bar_width, color='darkblue', label='No Fan')
plt.xticks(pos, cols, rotation=90)
plt.ylabel('Ranking')
plt.title('Ranking Star Wars Movies')
plt.legend(loc='upper right')
plt.show()
We can observe the following trends:
# calculating the number of Star Trek fans and non Star Trek fans
star_trek_fan = star_wars[star_wars['Do you consider yourself to be a fan of the Star Trek franchise?'] == True]
star_trek_no_fan = star_wars[star_wars['Do you consider yourself to be a fan of the Star Trek franchise?'] == False]
sum_star_trek_fan = star_trek_fan[star_trek_fan.columns[3:9]].sum()
sum_star_trek_no_fan = star_trek_no_fan[star_trek_no_fan.columns[3:9]].sum()
# creating a bar chart for both groups
cols = ['seen_1', 'seen_2', 'seen_3',
'seen_4', 'seen_5', 'seen_6']
pos = np.arange(len(cols))
bar_width = 0.35
plt.bar(pos, sum_star_trek_fan, bar_width, color='green', label='Fan')
plt.bar(pos + bar_width, sum_star_trek_no_fan, bar_width, color='darkblue', label='No Fan')
plt.xticks(pos, cols, rotation=90)
plt.ylabel('Number of Respondents')
plt.title('Number of Respondents Who Have Seen Each Star Wars Movie')
plt.legend(loc='upper left')
plt.show()
From the bar chart above, we can deduce the following:
# calculating the mean ranking values for males and females
mean_males = males[males.columns[9:15]].mean()
mean_females = females[females.columns[9:15]].mean()
# creating a bar chart for both groups
cols = ['ranking_1', 'ranking_2', 'ranking_3',
'ranking_4', 'ranking_5', 'ranking_6']
pos = np.arange(len(cols))
bar_width = 0.35
plt.bar(pos, mean_males, bar_width, color='green', label='Male')
plt.bar(pos + bar_width, mean_females, bar_width, color='darkblue', label='Female')
plt.xticks(pos, cols, rotation=90)
plt.ylabel('Ranking')
plt.title('Ranking Star Wars Movies by Gender')
plt.legend(loc='upper right')
plt.show()
We can see the same trend in the rankings by male and by female respondents. The following differences appear in the bar chart:
# calculating the sum of each `seen` column
seen_male = males[males.columns[3:9]].sum()
seen_female = females[females.columns[3:9]].sum()
# creating a bar chart for both groups
cols = ['seen_1', 'seen_2', 'seen_3',
'seen_4', 'seen_5', 'seen_6']
pos = np.arange(len(cols))
bar_width = 0.35
plt.bar(pos, seen_male, bar_width, color='green', label='Male')
plt.bar(pos + bar_width, seen_female, bar_width, color='darkblue', label='Female')
plt.xticks(pos, cols, rotation=90)
plt.ylabel('Number of Respondents')
plt.title('Number of Respondents Who Have Seen Each Star Wars Movie')
plt.legend(loc='upper center')
plt.show()
From the bar chart above, we can deduce the following:
In conclusion, the movies that more respondents have seen tend to rank higher, regardless of gender or of whether the respondent is a fan of Star Wars or Star Trek. The last two movies (Episodes V and VI) are by far the most seen and the most popular of the six, generally followed by the first movie. Fans of Star Wars and Star Trek have seen more of the movies, and appreciate them more, than non-fans.
Here are some potential next steps: