This project marks the beginning of the path in becoming a Data Analyst/Scientist.
Throughout this journey, projects will showcase the skills learned up to that point.
This project will utilize Jupyter Notebooks and Python techniques at a novice level.
Being a guided project, helper functions and code are imported to assist in retreiving and visualizing data in the form of frequency tables and barplots.
An analysis of helicopter prison escapes will be conducted in this notebook.
To start, we would like to know the frequency of escapes per year and per country.
Lettuce begin by importing helper functions.
from helper import *
The data we will be using is sourced from Wikipedia. Using the helper function data_from_url
allows for easy extraction of the data into the variable data
.
List of helicopter prison escapes: [https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes]
url = 'https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes'
data = data_from_url(url)
Let's print the first three rows.
for row in data[:3]:
print(row)
['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro', "Joel David Kaplan was a New York businessman who had been arrested for murder in 1962 in Mexico City and was incarcerated at the Santa Martha Acatitla prison in the Iztapalapa borough of Mexico City. Joel's sister, Judy Kaplan, arranged the means to help Kaplan escape, and on August 19, 1971, a helicopter landed in the prison yard. The guards mistakenly thought this was an official visit. In two minutes, Kaplan and his cellmate Carlos Antonio Contreras, a Venezuelan counterfeiter, were able to board the craft and were piloted away, before any shots were fired.[9] Both men were flown to Texas and then different planes flew Kaplan to California and Castro to Guatemala.[3] The Mexican government never initiated extradition proceedings against Kaplan.[9] The escape is told in a book, The 10-Second Jailbreak: The Helicopter Escape of Joel David Kaplan.[4] It also inspired the 1975 action movie Breakout, which starred Charles Bronson and Robert Duvall.[9]"] ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon", 'On October 31, 1973 an IRA member hijacked a helicopter and forced the pilot to land in the exercise yard of Dublin\'s Mountjoy Jail\'s D Wing at 3:40\xa0p.m., October 31, 1973. Three members of the IRA were able to escape: JB O\'Hagan, Seamus Twomey and Kevin Mallon. Another prisoner who also was in the prison was quoted as saying, "One shamefaced screw apologised to the governor and said he thought it was the new Minister for Defence (Paddy Donegan) arriving. I told him it was our Minister of Defence leaving." The Mountjoy helicopter escape became Republican lore and was immortalized by "The Helicopter Song", which contains the lines "It\'s up like a bird and over the city. There\'s three men a\'missing I heard the warder say".[1]'] ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson', "43-year-old Barbara Ann Oswald hijacked a Saint Louis-based charter helicopter and forced the pilot to land in the yard at USP Marion. While landing the aircraft, the pilot, Allen Barklage, who was a Vietnam War veteran, struggled with Oswald and managed to wrestle the gun away from her. Barklage then shot and killed Oswald, thwarting the escape.[10] A few months later Oswald's daughter hijacked TWA Flight 541 in an effort to free Trapnell."]
Printing the first few rows returned a lot of text. Let's improve readability by removing the final column.
Iterate over the dataset data
and remove the final column for each row.
index = 0
for row in data:
data[index] = row[:-1] # for every iteration, each row will be replaced by the original row minus the final column "[:-1]"
index += 1
print(data[:3])
[['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
The date can also be modified to consider only the year.
Using the helper function, fetch_year_row()
, the string date is converted into an integer year.
The first three rows shall be printed to see if the function executed correctly.
for row in data:
row[0] = fetch_year(row[0])
print(data[:3])
[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
The guided code below will be used to make a range of all years from first registered prison escape attempt to the last.
Print the list years
to ensure our code functioned.
min_year = min(data, key=lambda x: x[0])[0] # This guided code sets the max and min parameters for the range of years
max_year = max(data, key=lambda x: x[0])[0]
years = []
for y in range(min_year, max_year + 1): # In the range setup above, the years, y, are appended to the empty list
years.append(y)
print(years)
[1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
Utilizing for
loops, we will create a sort of frequency table using the previous years
list.
Create a template for a list of lists concerning every year that had a prison escape by helicopter and the no. of attempts during each year.
attempts_per_year = []
for i in years: # Accessing the newly created list years from the previous cell
attempts_per_year.append([i,0])
print(attempts_per_year)
[[1971, 0], [1972, 0], [1973, 0], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 0], [1979, 0], [1980, 0], [1981, 0], [1982, 0], [1983, 0], [1984, 0], [1985, 0], [1986, 0], [1987, 0], [1988, 0], [1989, 0], [1990, 0], [1991, 0], [1992, 0], [1993, 0], [1994, 0], [1995, 0], [1996, 0], [1997, 0], [1998, 0], [1999, 0], [2000, 0], [2001, 0], [2002, 0], [2003, 0], [2004, 0], [2005, 0], [2006, 0], [2007, 0], [2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 0], [2014, 0], [2015, 0], [2016, 0], [2017, 0], [2018, 0], [2019, 0], [2020, 0]]
Now iterate over both the modified dataset data
and the list of years
.
If the year is found in both, then a count will start for that year and continue until the datasets have been iterated over.
for row in data:
for year in attempts_per_year:
y = year[0]
if row[0] == y:
year[1] += 1
print(attempts_per_year)
[[1971, 1], [1972, 0], [1973, 1], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 1], [1979, 0], [1980, 0], [1981, 2], [1982, 0], [1983, 1], [1984, 0], [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1], [1992, 2], [1993, 1], [1994, 0], [1995, 0], [1996, 1], [1997, 1], [1998, 0], [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2004, 0], [2005, 2], [2006, 1], [2007, 3], [2008, 0], [2009, 3], [2010, 1], [2011, 0], [2012, 1], [2013, 2], [2014, 1], [2015, 0], [2016, 1], [2017, 0], [2018, 1], [2019, 0], [2020, 1]]
The list attempts_per_year
now show the frequency of the number of escapes per given year.
Let us use the guided code below and matplotlib packages to visualize this data.
%matplotlib inline
barplot(attempts_per_year)
The years 2009, 2007, 2001, and 1986 are tied for most attempts at a prison escape by helicopter.
Now let us again use the guided code below to visualize the frequency of escapes by country.
countries_frequency = df["Country"].value_counts()
print_pretty_table(countries_frequency)
Country | Number of Occurrences |
---|---|
France | 15 |
United States | 8 |
Belgium | 4 |
Greece | 4 |
Canada | 4 |
Australia | 2 |
United Kingdom | 2 |
Brazil | 2 |
Puerto Rico | 1 |
Chile | 1 |
Italy | 1 |
Netherlands | 1 |
Ireland | 1 |
Russia | 1 |
Mexico | 1 |
Jupyter notebook was utilized to showcase a beginner's introduction to the Python programming language.
Using guided code and functions, the extraction of helicopter prison escapes data and visualization thereof was done to answer questions like "Which countries have the most prison escapes by helicopter?". (It's France)
In which countries do helicopter prison breaks have higher chance of 'success'? (not attempt) How does the number of escapees affect success?