My First Data Science Project

Get the Data

Let's get the data from the Wikipedia article "List of helicopter prison escapes".

Helicopter Escapes!

We begin by importing some helper functions.

In [11]:
from helper import *

Let's download the table, drop the last column of every row, and print the first three rows.

In [19]:
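url = 'https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes'
data = data_from_url(url)
# Drop the last column of every row, updating the list in place.
for index, row in enumerate(data):
    data[index] = row[:-1]
# Show the first three rows.
for row in data[:3]:
    print(row)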
['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro']
['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"]
['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']

Now replace each date with just its year, and print the first three rows again to make sure nothing awful has happened.

In [20]:
for row in data:
    row[0]=fetch_year(row[0])
for row in data[:3]:
    print(row)
[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro']
[1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"]
[1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']
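
The body of fetch_year is not shown in this notebook, since it comes from the helper module. As a rough idea of what such a function could do, here is a minimal sketch that assumes the year is the first four-digit number in the date string. The name fetch_year_sketch is hypothetical and is not the helper's actual implementation.

```python
import re

def fetch_year_sketch(date_string):
    """Return the first four-digit number in date_string as an int, or None.

    Hypothetical stand-in for the helper's fetch_year, which is not shown here.
    """
    match = re.search(r"\b(\d{4})\b", date_string)
    return int(match.group(1)) if match else None

# Dates taken from the first three rows printed earlier.
sample_dates = ["August 19, 1971", "October 31, 1973", "May 24, 1978"]
sample_years = [fetch_year_sketch(d) for d in sample_dates]
print(sample_years)  # [1971, 1973, 1978]
```

Returning None for strings without a year keeps the sketch from raising on malformed rows, but whether the real helper behaves that way is unknown.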

Now we create a list whose entries have the form [year, number of escape attempts], skipping years with no attempts.

In [23]:
min_year = min(row[0] for row in data)
max_year = max(row[0] for row in data)
years = list(range(min_year, max_year + 1))
attempts_per_year = []
for year in years:
    # Count the rows whose year column matches this year.
    total = sum(1 for row in data if row[0] == year)
    # Keep only years with at least one attempt.
    if total >= 1:
        attempts_per_year.append([year, total])
print(attempts_per_year)
[[1971, 1], [1973, 1], [1978, 1], [1981, 2], [1983, 1], [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1], [1992, 2], [1993, 1], [1996, 1], [1997, 1], [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2005, 2], [2006, 1], [2007, 3], [2009, 3], [2010, 1], [2012, 1], [2013, 2], [2014, 1], [2016, 1], [2018, 1], [2020, 1]]
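
The nested loop above scans every row once per year. As an alternative, the same [year, count] pairs can probably be built in a single pass with collections.Counter from the standard library. The sketch below uses toy rows that hold only the year column, and the function name count_attempts_per_year is made up for illustration.

```python
from collections import Counter

def count_attempts_per_year(rows):
    """Return [[year, count], ...] sorted by year, assuming row[0] is the year."""
    counts = Counter(row[0] for row in rows)
    return [[year, counts[year]] for year in sorted(counts)]

# Toy rows for illustration only; each row holds just the year column.
toy_rows = [[1971], [1973], [1978], [1981], [1981]]
toy_result = count_attempts_per_year(toy_rows)
print(toy_result)  # [[1971, 1], [1973, 1], [1978, 1], [1981, 2]]
```

Years with no attempts never enter the Counter, so no explicit filter is needed.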

We visualize the number of attempts per year as a bar plot.

In [24]:
%matplotlib inline
barplot(attempts_per_year)

The years with the most attempted helicopter escapes are 1986, 2001, 2007, and 2009, each with three attempts.
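
Rather than reading the busiest years off the plot, they can also be computed. Here is a minimal sketch that works on [year, total] pairs. The sample is a subset of the attempts_per_year output printed earlier, and busiest_years is a hypothetical name.

```python
def busiest_years(pairs):
    """Return (highest count, list of years that reach it) from [year, total] pairs."""
    most = max(total for _, total in pairs)
    return most, [year for year, total in pairs if total == most]

# Subset of the attempts_per_year output printed earlier.
attempts_sample = [[1985, 2], [1986, 3], [1987, 1], [2001, 3], [2007, 3], [2009, 3]]
top_count, top_years = busiest_years(attempts_sample)
print(top_count, top_years)  # 3 [1986, 2001, 2007, 2009]
```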

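Finally, we count the number of attempts per country. Here df is assumed to be a DataFrame of the same table with a "Country" column, presumably provided by the helper module, since it is not defined in the cells above.

In [26]: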
countries_frequency=df["Country"].value_counts()
print_pretty_table(countries_frequency)
Country          Number of Occurrences
France           15
United States    8
Greece           4
Belgium          4
Canada           4
United Kingdom   2
Brazil           2
Australia        2
Russia           1
Netherlands      1
Italy            1
Mexico           1
Puerto Rico      1
Chile            1
Ireland          1
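
Similar per-country counts could be produced without a DataFrame by counting the country column of the data rows directly. The sketch below assumes the country sits at index 2, as in the rows printed earlier, and uses the first three rows with the escapee column left out. The name count_countries is hypothetical.

```python
from collections import Counter

def count_countries(rows, country_index=2):
    """Return (country, count) pairs, most frequent first."""
    return Counter(row[country_index] for row in rows).most_common()

# First three rows printed earlier, without the escapee column.
sample_rows = [
    [1971, 'Santa Martha Acatitla', 'Mexico', 'Yes'],
    [1973, 'Mountjoy Jail', 'Ireland', 'Yes'],
    [1978, 'United States Penitentiary, Marion', 'United States', 'No'],
]
country_counts = count_countries(sample_rows)
for country, count in country_counts:
    print(country, count)
```

With only three rows every country appears once. Run on the full data list, the counts should match the table above, provided df and data hold the same rows.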