# Recall that pandas is frequently imported with the alias pd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('../data/gapminder_gni.csv')
df.head()
| | country | year | pop | continent | lifeExp | gdpPercap | gniPercap |
---|---|---|---|---|---|---|---|
0 | Afghanistan | 1962 | 10267083.0 | Asia | 31.997 | 853.100710 | NaN |
1 | Afghanistan | 1967 | 11537966.0 | Asia | 34.020 | 836.197138 | NaN |
2 | Afghanistan | 1972 | 13079460.0 | Asia | 36.088 | 739.981106 | NaN |
3 | Afghanistan | 1977 | 14880372.0 | Asia | 38.438 | 786.113360 | NaN |
4 | Afghanistan | 1982 | 12881816.0 | Asia | 39.854 | 978.011439 | NaN |
The following block of code contains three errors that are preventing it from running properly. What are the errors? How would you fix them?
# Add a colon, add indentation, match loop variable
for number in [2, 3, 5]:
    print(number)
2 3 5
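Each of the three fixes named in the comment corresponds to a different kind of error. The sketch below is illustrative only: the faulty cell itself is not reproduced in this document, so the three broken strings are hypothetical reconstructions, and `error_name` is a helper invented for this example.

```python
# Hypothetical broken variants, one per fix named in the comment above.
broken_missing_colon = "for number in [2, 3, 5]\n    print(number)\n"
broken_no_indent = "for number in [2, 3, 5]:\nprint(number)\n"
broken_wrong_name = "for number in [2, 3, 5]:\n    print(Number)\n"
fixed = "for number in [2, 3, 5]:\n    print(number)\n"

def error_name(source):
    """Compile and run a snippet; return the exception class name, or None."""
    try:
        exec(compile(source, "<example>", "exec"), {})
    except Exception as exc:
        return type(exc).__name__
    return None

print(error_name(broken_missing_colon))  # SyntaxError
print(error_name(broken_no_indent))      # IndentationError
print(error_name(broken_wrong_name))     # NameError
```

The first two are caught before any code runs, while the mismatched variable name only fails once the loop body executes.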
Below are a few examples showing the different types of quantities you might aggregate using a for loop. These loops are partially filled out. Finish them and test that they work!
Add up the lengths of all the words, accumulating the sum in total.
total = 0
words = ['red', 'green', 'blue']
for w in words:
    total = total + len(w)
print(total)
12
Collect the length of each word in a list named lengths.
lengths = []
words = ['red', 'green', 'blue']
for w in words:
    lengths.append(len(w))
print(lengths)
[3, 5, 4]
Concatenate all the words into a single string named result.
words = ['red', 'green', 'blue']
result = ''
for w in words:
    result += w
print(result)
redgreenblue
Build the acronym 'RGB' for the input ['red', 'green', 'blue']. For this one, write the entire loop yourself!
words = ['red', 'green', 'blue']
up = ''
# YOUR CODE HERE
for each in words:
    up += each[0].upper()
print(up)
RGB
# ANOTHER OPTION, USING LIST COMPREHENSION
''.join([each[0].upper() for each in words])
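The first three aggregations can also be written without an explicit loop. This is a minimal sketch using only Python built-ins (`sum`, a list comprehension, and `str.join`); it is an alternative to the loop-based solutions, not a replacement for practicing them.

```python
words = ['red', 'green', 'blue']

# Sum of word lengths, using a generator expression inside sum()
total = sum(len(w) for w in words)

# List of word lengths, using a list comprehension
lengths = [len(w) for w in words]

# All words concatenated, using str.join with an empty separator
result = ''.join(words)

print(total)    # 12
print(lengths)  # [3, 5, 4]
print(result)   # redgreenblue
```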
Say our year column contains wrong information and we need to add one year to each value. Use a vectorized operation to get this done.
# YOUR CODE HERE
df['year'] + 1
0       1963
1       1968
2       1973
3       1978
4       1983
        ...
1315    1988
1316    1993
1317    1998
1318    2003
1319    2008
Name: year, Length: 1320, dtype: int64
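Note that a vectorized expression like the one above returns a new Series and leaves the DataFrame untouched. The sketch below shows this on a small stand-in DataFrame (`toy` is a name made up for this example; its years are the first three from the table above), and how to assign the result back if the correction should persist.

```python
import pandas as pd

# Small stand-in for the workshop data, so the example runs on its own.
toy = pd.DataFrame({'year': [1962, 1967, 1972]})

shifted = toy['year'] + 1       # new Series; toy is unchanged
print(toy['year'].tolist())     # [1962, 1967, 1972]
print(shifted.tolist())         # [1963, 1968, 1973]

toy['year'] = toy['year'] + 1   # assign back to make the change stick
print(toy['year'].tolist())     # [1963, 1968, 1973]
```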
What is the data type of the output of describe()?
type(df.describe())
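The answer depends on what describe() is called on. As a minimal sketch on a stand-in DataFrame (`toy` is a hypothetical name; the values are the first three Afghanistan rows from the table above), calling it on a DataFrame gives back a DataFrame, while calling it on a single column gives back a Series.

```python
import pandas as pd

toy = pd.DataFrame({
    'year': [1962, 1967, 1972],
    'lifeExp': [31.997, 34.020, 36.088],
})

summary = toy.describe()
print(type(summary).__name__)                 # DataFrame
print(type(toy['year'].describe()).__name__)  # Series

# Because the summary is itself a DataFrame, it can be indexed like one.
print(summary.loc['mean', 'year'])            # 1967.0
```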
Let's say you have a list of countries you want to compare life expectancy for using a single lineplot. We will create a function for this.
We have set up the list and function for you. Your goal is to:
1. Write a for loop over country_list.
2. Set country_data to the subset of the country you are looping over in the list.
3. In the label= parameter of plt.plot(), fill in the loop variable name.
Run the cell when you're done: if you've succeeded, you should see a single line plot with life expectancy for all of the countries in country_list.
💡 Tip: If you have time left, try to add labels and title to the plot using plt.xlabel(), plt.ylabel(), and plt.title(). See this resource for more information!
# YOUR CODE HERE
country_list = ['Afghanistan', 'Canada', 'Zimbabwe']
def plot_life_expectancy(df, countries):
    for country in countries:
        country_data = df[df['country'] == country]
        plt.plot(country_data['year'], country_data['lifeExp'], label=country)
    plt.xlabel('Years')
    plt.ylabel('Life Expectancy')
    plt.title('Life Expectancy by Country')
    plt.legend()
    plt.show()
plot_life_expectancy(df, country_list)
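One possible variation, sketched under assumptions: the same loop can draw on an explicit Axes object and return it, which makes the plot easier to inspect or extend afterwards. The function name `plot_life_expectancy_ax` and the `toy` data are made up for this example, and the `Agg` backend line is only there so the sketch runs without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data with the same column names as the workshop DataFrame.
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year': [1962, 1967, 1962, 1967],
    'lifeExp': [40.0, 42.0, 60.0, 61.0],
})

def plot_life_expectancy_ax(df, countries):
    """Draw one line per country on a single Axes and return that Axes."""
    fig, ax = plt.subplots()
    for country in countries:
        country_data = df[df['country'] == country]
        ax.plot(country_data['year'], country_data['lifeExp'], label=country)
    ax.set_xlabel('Year')
    ax.set_ylabel('Life Expectancy')
    ax.set_title('Life Expectancy by Country')
    ax.legend()
    return ax

ax = plot_life_expectancy_ax(toy, ['A', 'B'])
print([line.get_label() for line in ax.get_lines()])  # ['A', 'B']
```

Returning the Axes lets the caller check that one line was drawn per country, or add further annotations before showing the figure.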