My First Data Science Project

Helicopter Escapes

We begin by importing some helper functions.

In [4]:

from helper import *

Get the Data

Now, let's get the data from the Wikipedia article "List of helicopter prison escapes".

In [5]:

url = 'https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes'
data = data_from_url(url)
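
The helper's data_from_url is not shown in this notebook, so its internals are unknown. As a rough sketch of the general idea (turning an HTML table into a list of lists), here is a standard-library parser run on a tiny hand-written sample. The class name TableRows and the sample_html string are made up for illustration and are not part of the helper module.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows only contain <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


# Hypothetical two-column sample, not the real Wikipedia markup
sample_html = """
<table>
  <tr><th>Date</th><th>Country</th></tr>
  <tr><td>August 19, 1971</td><td>Mexico</td></tr>
  <tr><td>October 31, 1973</td><td>Ireland</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample_html)
print(parser.rows)
```

A real page has nested tags, footnote markers, and several tables, so an actual helper would need more care than this.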

Let's print the first three rows:

In [6]:

for item in data[0:3]:
    print(item)

['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro', "Joel David Kaplan was a New York businessman who had been arrested for murder in 1962 in Mexico City and was incarcerated at the Santa Martha Acatitla prison in the Iztapalapa borough of Mexico City. Joel's sister, Judy Kaplan, arranged the means to help Kaplan escape, and on August 19, 1971, a helicopter landed in the prison yard. The guards mistakenly thought this was an official visit. In two minutes, Kaplan and his cellmate Carlos Antonio Contreras, a Venezuelan counterfeiter, were able to board the craft and were piloted away, before any shots were fired.[9] Both men were flown to Texas and then different planes flew Kaplan to California and Castro to Guatemala.[3] The Mexican government never initiated extradition proceedings against Kaplan.[9] The escape is told in a book, The 10-Second Jailbreak: The Helicopter Escape of Joel David Kaplan.[4] It also inspired the 1975 action movie Breakout, which starred Charles Bronson and Robert Duvall.[9]"]
['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon", 'On October 31, 1973 an IRA member hijacked a helicopter and forced the pilot to land in the exercise yard of Dublin\'s Mountjoy Jail\'s D Wing at 3:40\xa0p.m., October 31, 1973. Three members of the IRA were able to escape: JB O\'Hagan, Seamus Twomey and Kevin Mallon. Another prisoner who also was in the prison was quoted as saying, "One shamefaced screw apologised to the governor and said he thought it was the new Minister for Defence (Paddy Donegan) arriving. I told him it was our Minister of Defence leaving." The Mountjoy helicopter escape became Republican lore and was immortalized by "The Helicopter Song", which contains the lines "It\'s up like a bird and over the city. There\'s three men a\'missing I heard the warder say".[1]']
['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson', "43-year-old Barbara Ann Oswald hijacked a Saint Louis-based charter helicopter and forced the pilot to land in the yard at USP Marion. While landing the aircraft, the pilot, Allen Barklage, who was a Vietnam War veteran, struggled with Oswald and managed to wrestle the gun away from her. Barklage then shot and killed Oswald, thwarting the escape.[10] A few months later Oswald's daughter hijacked TWA Flight 541 in an effort to free Trapnell."]

Next, let's remove the column with the long text that provides the background information on the prison breaks.

In [7]:

# Drop the last column (the long description) from every row
for index, row in enumerate(data):
    data[index] = row[:-1]

print(data[0:3])

[['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]

Using the provided fetch_year function, reduce each date to just its year:

In [8]:

for row in data:
    row[0] = fetch_year(row[0])
    
print(data[0:3])

[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
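
The body of fetch_year is not shown here. A minimal sketch of how such a function could work, assuming the date string always contains a four-digit year, is to pull that number out with a regular expression. The name fetch_year_sketch is hypothetical and chosen so it does not shadow the helper.

```python
import re


def fetch_year_sketch(date_string):
    """Return the first four-digit number in date_string as an int, or None."""
    match = re.search(r"\b(\d{4})\b", date_string)
    return int(match.group(1)) if match else None


print(fetch_year_sketch("August 19, 1971"))
print(fetch_year_sketch("October 31, 1973"))
```

Returning None for strings without a year is a design choice of this sketch; the real helper may behave differently.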

Next, we are going to create a table that contains each year and a placeholder for the number of attempts in that year.

In [9]:

min_year = min(data, key=lambda x: x[0])[0]
max_year = max(data, key=lambda x: x[0])[0]

# Every year in the range, including years without any attempt
years = list(range(min_year, max_year + 1))

# One [year, count] pair per year, with every count starting at 0
attempts_per_year = []
for year in years:
    attempts_per_year.append([year, 0])

print(attempts_per_year)

[[1971, 0], [1972, 0], [1973, 0], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 0], [1979, 0], [1980, 0], [1981, 0], [1982, 0], [1983, 0], [1984, 0], [1985, 0], [1986, 0], [1987, 0], [1988, 0], [1989, 0], [1990, 0], [1991, 0], [1992, 0], [1993, 0], [1994, 0], [1995, 0], [1996, 0], [1997, 0], [1998, 0], [1999, 0], [2000, 0], [2001, 0], [2002, 0], [2003, 0], [2004, 0], [2005, 0], [2006, 0], [2007, 0], [2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 0], [2014, 0], [2015, 0], [2016, 0], [2017, 0], [2018, 0], [2019, 0], [2020, 0]]

Next, we will fill in the attempt count for each year in the list.

In [10]:

# For each escape attempt, find the matching year and increment its count
for row in data:
    for ya in attempts_per_year:
        y = ya[0]
        if row[0] == y:
            ya[1] += 1

print(attempts_per_year)

[[1971, 1], [1972, 0], [1973, 1], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 1], [1979, 0], [1980, 0], [1981, 2], [1982, 0], [1983, 1], [1984, 0], [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1], [1992, 2], [1993, 1], [1994, 0], [1995, 0], [1996, 1], [1997, 1], [1998, 0], [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2004, 0], [2005, 2], [2006, 1], [2007, 3], [2008, 0], [2009, 3], [2010, 1], [2011, 0], [2012, 1], [2013, 2], [2014, 1], [2015, 0], [2016, 1], [2017, 0], [2018, 1], [2019, 0], [2020, 1]]
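
The nested loop compares every row against every year. As an alternative sketch, a dictionary keyed by year gives the same [year, count] pairs in a single pass over the rows. The sample_rows list below is a small made-up stand-in for data, in which only the year in column 0 matters.

```python
# Hypothetical sample rows: only the year (column 0) is used here
sample_rows = [[1971], [1973], [1978], [1981], [1981]]

min_year = min(row[0] for row in sample_rows)
max_year = max(row[0] for row in sample_rows)

# Start every year at zero so years without attempts still appear
counts = {year: 0 for year in range(min_year, max_year + 1)}
for row in sample_rows:
    counts[row[0]] += 1

# Back to the same list-of-lists shape used in the notebook
attempts_sketch = [[year, n] for year, n in counts.items()]
print(attempts_sketch)
```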

Using the provided barplot function, we create a bar graph of the number of attempts per year to better visualize the data.

In [11]:

%matplotlib inline
barplot(attempts_per_year)
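
The provided barplot function draws the chart with matplotlib, and its code is not shown here. When no plotting library is at hand, a plain-text bar chart is a rough substitute. The function text_bars below is a hypothetical illustration, not the helper's implementation, and the four sample pairs are copied from the output above.

```python
# Four [year, count] pairs taken from the attempts_per_year output
sample_attempts = [[1985, 2], [1986, 3], [1987, 1], [1988, 1]]


def text_bars(pairs):
    """Return one line per year, with one '#' per attempt."""
    return [f"{year} {'#' * count}" for year, count in pairs]


for line in text_bars(sample_attempts):
    print(line)
```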

The years with the most helicopter prison break attempts were 1986, 2001, 2007, and 2009, with three attempts each.
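
The busiest years can also be read off the counts programmatically instead of from the chart. The sketch below copies the non-zero counts from the attempts_per_year output into a dictionary (the name attempts is introduced here for illustration) and picks out the years that share the maximum.

```python
# Non-zero counts copied from the attempts_per_year output above
attempts = {
    1971: 1, 1973: 1, 1978: 1, 1981: 2, 1983: 1, 1985: 2, 1986: 3, 1987: 1,
    1988: 1, 1989: 2, 1990: 1, 1991: 1, 1992: 2, 1993: 1, 1996: 1, 1997: 1,
    1999: 1, 2000: 2, 2001: 3, 2002: 2, 2003: 1, 2005: 2, 2006: 1, 2007: 3,
    2009: 3, 2010: 1, 2012: 1, 2013: 2, 2014: 1, 2016: 1, 2018: 1, 2020: 1,
}

most = max(attempts.values())
busiest_years = sorted(year for year, n in attempts.items() if n == most)

print(most, busiest_years)  # 3 [1986, 2001, 2007, 2009]
```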

Next, we will create a frequency table showing the number of prison break attempts by helicopter by country.

In [12]:

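# df is not defined in this notebook; it is assumed to be a DataFrame
# supplied by the helper module imported at the top
countries_frequency = df["Country"].value_counts()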

In [13]:

print_pretty_table(countries_frequency)

Country	Number of Occurrences
France	15
United States	8
Belgium	4
Greece	4
Canada	4
Australia	2
Brazil	2
United Kingdom	2
Russia	1
Ireland	1
Netherlands	1
Italy	1
Puerto Rico	1
Mexico	1
Chile	1
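
The same kind of frequency table can be built without a DataFrame. The sketch below uses collections.Counter from the standard library on the country column (index 2). The sample_rows list is a made-up stand-in with placeholder prison names, in the shape [year, prison, country, succeeded].

```python
from collections import Counter

# Hypothetical sample rows: [year, prison, country, succeeded]
sample_rows = [
    [2001, "Prison A", "France", "Yes"],
    [2003, "Prison B", "United States", "No"],
    [2005, "Prison C", "France", "Yes"],
    [2007, "Prison D", "Belgium", "No"],
]

country_counts = Counter(row[2] for row in sample_rows)

# most_common() sorts by count, highest first
for country, n in country_counts.most_common():
    print(f"{country}\t{n}")
```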

Next, let's try to answer the question: In which countries do helicopter prison breaks have a higher chance of success?

In [14]:

# First, get a list of the countries
countries = []
for row in data:
    country = row[2]
    if country not in countries:
        countries.append(country)
        
print(countries)

['Mexico', 'Ireland', 'United States', 'France', 'Canada', 'Australia', 'Brazil', 'Italy', 'United Kingdom', 'Puerto Rico', 'Chile', 'Netherlands', 'Greece', 'Belgium', 'Russia']

In [15]:

# Next, add placeholder fields for the number of attempts and the number
# of successes

countries_success = []
for country in countries:
    countries_success.append([country, 0, 0])

print(countries_success)

[['Mexico', 0, 0], ['Ireland', 0, 0], ['United States', 0, 0], ['France', 0, 0], ['Canada', 0, 0], ['Australia', 0, 0], ['Brazil', 0, 0], ['Italy', 0, 0], ['United Kingdom', 0, 0], ['Puerto Rico', 0, 0], ['Chile', 0, 0], ['Netherlands', 0, 0], ['Greece', 0, 0], ['Belgium', 0, 0], ['Russia', 0, 0]]

In [16]:

# Next, populate the number of attempts and the number of successes

for row in data:
    for item in countries_success:
        if row[2] == item[0]:
            item[1] += 1          # one more attempt for this country
            if row[3] == "Yes":
                item[2] += 1      # the attempt succeeded

print(countries_success)

[['Mexico', 1, 1], ['Ireland', 1, 1], ['United States', 8, 6], ['France', 15, 11], ['Canada', 4, 3], ['Australia', 2, 1], ['Brazil', 2, 2], ['Italy', 1, 1], ['United Kingdom', 2, 1], ['Puerto Rico', 1, 1], ['Chile', 1, 1], ['Netherlands', 1, 0], ['Greece', 4, 2], ['Belgium', 4, 2], ['Russia', 1, 1]]
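
The three cells above (collect the countries, add empty counters, fill them in) can also be folded into one pass. This sketch keeps a dictionary keyed by country and converts it to the same [country, attempts, successes] shape at the end. The sample_rows list is hypothetical and uses placeholder prison names.

```python
# Hypothetical sample rows: [year, prison, country, succeeded]
sample_rows = [
    [2001, "Prison A", "France", "Yes"],
    [2003, "Prison B", "France", "No"],
    [2005, "Prison C", "Greece", "Yes"],
]

tallies = {}
for row in sample_rows:
    country, succeeded = row[2], row[3]
    # setdefault creates the [attempts, successes] pair on first sight
    entry = tallies.setdefault(country, [0, 0])
    entry[0] += 1
    if succeeded == "Yes":
        entry[1] += 1

success_sketch = [[country, a, s] for country, (a, s) in tallies.items()]
print(success_sketch)  # [['France', 2, 1], ['Greece', 1, 1]]
```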

Looking at the list of lists above, we can see that some countries have a 100% success rate, but each of these countries has only one or two attempts. Without a chart, a statement for each country might be the most effective way to communicate the results:

In [20]:

for name, attempts, successes in countries_success:
    rate = (successes / attempts) * 100
    print(f"In {name}, there were {attempts} attempts.  Of these, there was "
          f"{successes} success(es).  The success rate is {rate}%\n")

In Mexico, there were 1 attempts.  Of these, there was 1 success(es).  The success rate is 100.0%

In Ireland, there were 1 attempts.  Of these, there was 1 success(es).  The success rate is 100.0%

In United States, there were 8 attempts.  Of these, there was 6 success(es).  The success rate is 75.0%

In France, there were 15 attempts.  Of these, there was 11 success(es).  The success rate is 73.33333333333333%

In Canada, there were 4 attempts.  Of these, there was 3 success(es).  The success rate is 75.0%

In Australia, there were 2 attempts.  Of these, there was 1 success(es).  The success rate is 50.0%

In Brazil, there were 2 attempts.  Of these, there was 2 success(es).  The success rate is 100.0%

In Italy, there were 1 attempts.  Of these, there was 1 success(es).  The success rate is 100.0%

In United Kingdom, there were 2 attempts.  Of these, there was 1 success(es).  The success rate is 50.0%

In Puerto Rico, there were 1 attempts.  Of these, there was 1 success(es).  The success rate is 100.0%

In Chile, there were 1 attempts.  Of these, there was 1 success(es).  The success rate is 100.0%

In Netherlands, there were 1 attempts.  Of these, there was 0 success(es).  The success rate is 0.0%

In Greece, there were 4 attempts.  Of these, there was 2 success(es).  The success rate is 50.0%

In Belgium, there were 4 attempts.  Of these, there was 2 success(es).  The success rate is 50.0%

In Russia, there were 1 attempts.  Of these, there was 1 success(es).  The success rate is 100.0%
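
Since a 100% rate based on one or two attempts says little, one possible follow-up is to look only at countries with several attempts and round the rates. The sketch below copies the countries_success values from the output above. The cutoff MIN_ATTEMPTS = 4 is an arbitrary assumption for illustration, not something the data prescribes.

```python
# Values copied from the countries_success output above
countries_success = [
    ['Mexico', 1, 1], ['Ireland', 1, 1], ['United States', 8, 6],
    ['France', 15, 11], ['Canada', 4, 3], ['Australia', 2, 1],
    ['Brazil', 2, 2], ['Italy', 1, 1], ['United Kingdom', 2, 1],
    ['Puerto Rico', 1, 1], ['Chile', 1, 1], ['Netherlands', 1, 0],
    ['Greece', 4, 2], ['Belgium', 4, 2], ['Russia', 1, 1],
]

MIN_ATTEMPTS = 4  # hypothetical cutoff for "enough attempts to compare"

reliable = [
    (name, attempts, successes, round(successes / attempts * 100, 1))
    for name, attempts, successes in countries_success
    if attempts >= MIN_ATTEMPTS
]
# Highest success rate first; ties keep their original order
reliable.sort(key=lambda r: r[3], reverse=True)

for name, attempts, successes, rate in reliable:
    print(f"{name}: {successes}/{attempts} succeeded ({rate}%)")
```

With this cutoff, five countries remain, and even then the samples are small enough that the differences should be read with caution.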
