Python Fundamentals: Iteration and Visualization

SOLUTIONS

In [1]:

# Recall that pandas is frequently imported with the alias pd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:

df = pd.read_csv('../data/gapminder_gni.csv')
df.head()

Out[2]:

	country	year	pop	continent	lifeExp	gdpPercap	gniPercap
0	Afghanistan	1962	10267083.0	Asia	31.997	853.100710	NaN
1	Afghanistan	1967	11537966.0	Asia	34.020	836.197138	NaN
2	Afghanistan	1972	13079460.0	Asia	36.088	739.981106	NaN
3	Afghanistan	1977	14880372.0	Asia	38.438	786.113360	NaN
4	Afghanistan	1982	12881816.0	Asia	39.854	978.011439	NaN

🥊 Challenge 1: Fixing Loop Syntax

The following block of code contains three errors that are preventing it from running properly. What are the errors? How would you fix them?

In [4]:

# Fixes: add the missing colon, indent the loop body, and use the same loop variable name in the body
for number in [2, 3, 5]:
    print(number)

2
3
5
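The original broken cell is not reproduced in these solutions, so the snippet below is only a hypothetical reconstruction based on the three fixes named in the solution comment. It is a sketch of how such errors surface: a missing colon is caught when Python parses the code, while a mismatched variable name would only fail later, at run time, with a NameError. The names `broken`, `syntax_ok`, and `collected` are ours.

```python
# Hypothetical broken cell (an assumption, not the original), stored as a
# string so that this example itself still runs
broken = "for number in [2, 3, 5]\nprint(Number)\n"

try:
    compile(broken, '<broken cell>', 'exec')
    syntax_ok = True
except SyntaxError:
    # Python stops at the first syntax error it finds (here, the missing colon)
    syntax_ok = False

# Fixed version: colon added, body indented, loop variable spelled consistently
collected = []
for number in [2, 3, 5]:
    collected.append(number)

print(syntax_ok, collected)
```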

🥊 Challenge 2: Aggregation Practice

Below are a few examples showing the different types of quantities you might aggregate using a for loop. These loops are partially filled out. Finish them and test that they work!

Find the total length of the strings in the given list. Store this quantity in a variable called total.

In [5]:

total = 0
words = ['red', 'green', 'blue']

for w in words:
    total = total + len(w)

print(total)
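The same total can also be computed without an explicit loop. A minimal sketch, using Python's built-in `sum()` with a generator expression; the name `total_builtin` is ours, introduced only to compare the two results:

```python
words = ['red', 'green', 'blue']

# Accumulate with an explicit loop, as in the solution
total = 0
for w in words:
    total = total + len(w)

# Equivalent one-liner: sum() adds up the lengths produced by the generator
total_builtin = sum(len(w) for w in words)

print(total, total_builtin)
```

Both approaches give the same number; the loop makes the accumulation pattern explicit, which is the point of this challenge.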

Find the length of each word in the list, and store these lengths in another list called lengths.

In [2]:

lengths = []
words = ['red', 'green', 'blue']

for w in words:
    lengths.append(len(w))

print(lengths)

[3, 5, 4]

Concatenate all words into a single string called result.

In [7]:

words = ['red', 'green', 'blue']
result = ''

for w in words:
    result += w

print(result)

redgreenblue

Create an acronym, as a single string, representing the list of words. The acronym should consist of the first letter of each word, capitalized. For example, your loop should output 'RGB' for the input ['red', 'green', 'blue']. For this one, write the entire loop yourself!

In [9]:

words = ['red', 'green', 'blue']

up = ''
# YOUR CODE HERE
for each in words:
    up += each[0].upper()

print(up)

RGB

In [ ]:

# ANOTHER OPTION, USING LIST COMPREHENSION

''.join([each[0].upper() for each in words])

🥊 Challenge 3: Get Vectorized

Say our year column contains wrong information and we need to add one year to each value. Use a vectorized operation to get this done.

In [9]:

# YOUR CODE HERE
df['year'] + 1

Out[9]:

0       1963
1       1968
2       1973
3       1978
4       1983
        ... 
1315    1988
1316    1993
1317    1998
1318    2003
1319    2008
Name: year, Length: 1320, dtype: int64
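Note that `df['year'] + 1` returns a new Series and leaves the DataFrame unchanged. The sketch below illustrates this on a small, made-up DataFrame (`toy` is a hypothetical stand-in for the gapminder data) and compares the vectorized expression with the equivalent loop:

```python
import pandas as pd

# Hypothetical stand-in for the 'year' column of the gapminder data
toy = pd.DataFrame({'year': [1962, 1967, 1972]})

# Vectorized: one expression applied to the whole column; returns a new Series
shifted = toy['year'] + 1

# Loop equivalent, shown only for comparison
shifted_loop = []
for y in toy['year']:
    shifted_loop.append(y + 1)

# The original column is unchanged until the result is assigned back
unchanged = toy['year'].tolist()
toy['year'] = shifted

print(unchanged, toy['year'].tolist())
```

To make the correction permanent in the real data, the result would need to be assigned back in the same way, for example `df['year'] = df['year'] + 1`.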

🥊 Challenge 4: Check the Data Type

What is the data type of the output of describe()?

In [ ]:

type(df.describe())
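Calling `describe()` on a DataFrame returns another DataFrame, so its statistics can be selected with the usual indexing tools. A minimal sketch on made-up values (`toy`, `summary`, and `col_summary` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy data; the real notebook uses the gapminder DataFrame
toy = pd.DataFrame({'lifeExp': [31.997, 34.020, 36.088]})

# describe() on a DataFrame returns a DataFrame of summary statistics
summary = toy.describe()
print(type(summary))

# describe() on a single column (a Series) returns a Series instead
col_summary = toy['lifeExp'].describe()
print(type(col_summary))

# Because summary is a DataFrame, .loc can pick out one statistic
mean_life_exp = summary.loc['mean', 'lifeExp']
```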

🥊 Challenge 5: Loops and Plots

Let's say you have a list of countries whose life expectancy you want to compare in a single line plot. We will create a function for this.

We have set up the list and function for you. Your goal is to:

1. Add three country names from the DataFrame to country_list.
2. Add two parameters to the function: one for a DataFrame, and one for the list of countries.
3. Within the function block, loop over the list of countries.
4. Within the for loop, use a comparison operator to subset the DataFrame, setting country_data to the rows for the country you are currently looping over.
5. In the label= parameter of plt.plot(), fill in the loop variable name.

Run the cell when you're done: if you've succeeded, you should see a single line plot with life expectancy for all of the countries in country_list.

💡 Tip: If you have time left, try adding axis labels and a title to the plot using plt.xlabel(), plt.ylabel(), and plt.title(). See this resource for more information!

In [19]:

# YOUR CODE HERE

country_list = ['Afghanistan', 'Canada', 'Zimbabwe']

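def plot_life_expectancy(df, countries):
    # Draw one line per country on the same axes
    for country in countries:
        # Boolean mask keeps only the rows for the current country
        country_data = df[df['country'] == country]
        plt.plot(country_data['year'], country_data['lifeExp'], label=country)
    # Label the axes and add a title and legend once, after all lines are drawn
    plt.xlabel('Year')
    plt.ylabel('Life Expectancy')
    plt.title('Life Expectancy by Country')
    plt.legend()
    plt.show()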

plot_life_expectancy(df, country_list)
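As a possible variation, the same loop can draw on an explicit Axes object and return it, which makes the plot easier to customize or inspect afterwards. This is only a sketch under stated assumptions: the function name `plot_life_expectancy_ax` and the toy data are hypothetical, and the Agg backend is selected so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for gapminder_gni.csv
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year': [1962, 1967, 1962, 1967],
    'lifeExp': [40.0, 42.0, 60.0, 61.5],
})

def plot_life_expectancy_ax(df, countries):
    # Create the figure and axes explicitly instead of relying on pyplot state
    fig, ax = plt.subplots()
    for country in countries:
        country_data = df[df['country'] == country]
        ax.plot(country_data['year'], country_data['lifeExp'], label=country)
    ax.set_xlabel('Year')
    ax.set_ylabel('Life Expectancy')
    ax.set_title('Life Expectancy by Country')
    ax.legend()
    return ax

ax = plot_life_expectancy_ax(toy, ['A', 'B'])
line_labels = [line.get_label() for line in ax.get_lines()]
print(line_labels)
```

Returning the Axes lets the caller adjust the figure further (for example, changing limits or saving it) without editing the function.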