Project 1: Guided Project

Helicopter Prison Breaks

This is a guided project from Dataquest that explores the data available on helicopter prison breaks.

First conclusion: yes, they do happen.

This project aims to answer the following questions:

  1. In which year did the most helicopter prison break attempts occur?
  2. In which countries do the most attempted helicopter prison breaks occur?
  3. In which countries do helicopter prison breaks have a higher chance of success?
  4. How does the number of escapees affect the success?
  5. Which escapees have done it more than once?

We begin by importing some helper functions:

In [147]:
from helper import *

Get the Data

We start by getting the data. For this project, the data is extracted from the Wikipedia article "List of helicopter prison escapes".

In [148]:
url = 'https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes'
data = data_from_url(url)

We proceed then to do a little data exploration:

In [149]:
for i in data[:3]:
    print(i)
    
['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro', "Joel David Kaplan was a New York businessman who had been arrested for murder in 1962 in Mexico City and was incarcerated at the Santa Martha Acatitla prison in the Iztapalapa borough of Mexico City. Joel's sister, Judy Kaplan, arranged the means to help Kaplan escape, and on August 19, 1971, a helicopter landed in the prison yard. The guards mistakenly thought this was an official visit. In two minutes, Kaplan and his cellmate Carlos Antonio Contreras, a Venezuelan counterfeiter, were able to board the craft and were piloted away, before any shots were fired.[9] Both men were flown to Texas and then different planes flew Kaplan to California and Castro to Guatemala.[3] The Mexican government never initiated extradition proceedings against Kaplan.[9] The escape is told in a book, The 10-Second Jailbreak: The Helicopter Escape of Joel David Kaplan.[4] It also inspired the 1975 action movie Breakout, which starred Charles Bronson and Robert Duvall.[9]"]
['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon", 'On October 31, 1973 an IRA member hijacked a helicopter and forced the pilot to land in the exercise yard of Dublin\'s Mountjoy Jail\'s D Wing at 3:40\xa0p.m., October 31, 1973. Three members of the IRA were able to escape: JB O\'Hagan, Seamus Twomey and Kevin Mallon. Another prisoner who also was in the prison was quoted as saying, "One shamefaced screw apologised to the governor and said he thought it was the new Minister for Defence (Paddy Donegan) arriving. I told him it was our Minister of Defence leaving." The Mountjoy helicopter escape became Republican lore and was immortalized by "The Helicopter Song", which contains the lines "It\'s up like a bird and over the city. There\'s three men a\'missing I heard the warder say".[1]']
['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson', "43-year-old Barbara Ann Oswald hijacked a Saint Louis-based charter helicopter and forced the pilot to land in the yard at USP Marion. While landing the aircraft, the pilot, Allen Barklage, who was a Vietnam War veteran, struggled with Oswald and managed to wrestle the gun away from her. Barklage then shot and killed Oswald, thwarting the escape.[10] A few months later Oswald's daughter hijacked TWA Flight 541 in an effort to free Trapnell."]

From the printed data, we can see that the "Description" column takes up a lot of space and is not relevant to the questions asked in this project, so let's remove it for now.

In [150]:
# Keep only the first five fields of each row (drop the description)
for index, row in enumerate(data):
    data[index] = row[:5]

print(data[:3])
[['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
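
An equivalent way to drop the last field is a list comprehension that builds a new list instead of overwriting rows in place. This is a minimal sketch; `sample` is a hypothetical stand-in for `data`, with the long description text replaced by a placeholder string.

```python
# Hypothetical stand-in for `data`: six fields per row, the last one being the description.
sample = [
    ['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes',
     'Joel David Kaplan Carlos Antonio Contreras Castro', 'description omitted'],
    ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes',
     "JB O'Hagan Seamus TwomeyKevin Mallon", 'description omitted'],
]

# Slice every row down to its first five fields.
trimmed = [row[:5] for row in sample]

print(trimmed)
```

Because the comprehension creates new rows, the original `sample` list is left unchanged, which can be handy if the descriptions are needed again later.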

Looking at the date fields, we notice that they are complete dates. Only the year is needed for the questions we want to answer, so we convert each date to contain only the year.

In [151]:
for j in data:
    j[0]=fetch_year(j[0])

print(data[:5])
[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson'], [1981, 'Fleury-Mérogis, Essonne, Ile de France', 'France', 'Yes', 'Gérard DupréDaniel Beaumont'], [1981, 'Orsainville Prison, Quebec City', 'Canada', 'No', 'Marina Paquet (hijacker)Giles Arseneault (prisoner)']]
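
The implementation of `fetch_year` lives in the helper module and is not shown in this notebook. As an illustration only, a function with similar behaviour could be sketched with a regular expression; the name `extract_year` is hypothetical and is not the helper's actual code.

```python
import re

def extract_year(date_string):
    """Return the first four-digit number in date_string as an int, or None if there is none."""
    match = re.search(r'\b(\d{4})\b', date_string)
    return int(match.group(1)) if match else None

print(extract_year('August 19, 1971'))
```

Returning `None` when no year is found makes malformed dates easy to spot before counting.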

Now that we have wrangled our data into a more usable format, we can begin to answer the questions.

The first question we will answer is:

Q1: In which year did the most helicopter prison break attempts occur?

To answer this, we must count the number of occurrences per year.

In [152]:
# Oldest year in our records
min_year = min(data, key=lambda x: x[0])[0]

# Latest year in our records
max_year = max(data, key=lambda x: x[0])[0]

attemps_per_year = []

for y in range(min_year, max_year + 1):   # loop over each year from min to max
    attempt = 0                            # number of attempts recorded for year y
    for row in data:                       # loop over every record
        if row[0] == y:                    # count the attempts that happened in year y
            attempt += 1
    attemps_per_year.append([y, attempt])

print(attemps_per_year)
[[1971, 1], [1972, 0], [1973, 1], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 1], [1979, 0], [1980, 0], [1981, 2], [1982, 0], [1983, 1], [1984, 0], [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1], [1992, 2], [1993, 1], [1994, 0], [1995, 0], [1996, 1], [1997, 1], [1998, 0], [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2004, 0], [2005, 2], [2006, 1], [2007, 3], [2008, 0], [2009, 3], [2010, 1], [2011, 0], [2012, 1], [2013, 2], [2014, 1], [2015, 0], [2016, 1], [2017, 0], [2018, 1], [2019, 0], [2020, 1]]
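
The same per-year count can also be sketched with `collections.Counter` from the standard library, which avoids the nested loop. The list `sample_years` below is an assumption for illustration: it holds only the years of the first five rows printed earlier, not the full dataset.

```python
from collections import Counter

# Years of the first five rows shown earlier (illustrative subset, not the full data).
sample_years = [1971, 1973, 1978, 1981, 1981]

counts = Counter(sample_years)

# A Counter returns 0 for missing keys, so years without attempts are filled in automatically.
per_year = [[year, counts[year]] for year in range(min(counts), max(counts) + 1)]

print(per_year)
```

With the real data, `sample_years` would be replaced by `[row[0] for row in data]`.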

The output above is a list of [year, number of occurrences] pairs, which is hard to read at a glance. To make it easier to answer Q1, we use a bar plot.

In [153]:
%matplotlib inline
barplot(attemps_per_year)

With the visualization above, it becomes easier to see which years have the greatest number of attempts.

A1: 1986, 2001, 2007, and 2009, with three attempts each.
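
The answer can also be checked without the plot. The sketch below copies the per-year counts from the printed output above into a list (named `attempts_per_year` here) and picks out every year that ties for the maximum.

```python
# Per-year counts copied from the printed output above.
attempts_per_year = [
    [1971, 1], [1972, 0], [1973, 1], [1974, 0], [1975, 0], [1976, 0], [1977, 0],
    [1978, 1], [1979, 0], [1980, 0], [1981, 2], [1982, 0], [1983, 1], [1984, 0],
    [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1],
    [1992, 2], [1993, 1], [1994, 0], [1995, 0], [1996, 1], [1997, 1], [1998, 0],
    [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2004, 0], [2005, 2],
    [2006, 1], [2007, 3], [2008, 0], [2009, 3], [2010, 1], [2011, 0], [2012, 1],
    [2013, 2], [2014, 1], [2015, 0], [2016, 1], [2017, 0], [2018, 1], [2019, 0],
    [2020, 1],
]

# Highest number of attempts in any single year.
max_attempts = max(count for _, count in attempts_per_year)

# Every year that reaches that maximum (there can be ties).
busiest_years = [year for year, count in attempts_per_year if count == max_attempts]

print(max_attempts, busiest_years)
```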

Now we are ready to answer Q2.

Q2: In which countries do the most attempted helicopter prison breaks occur?

For this, we could repeat the loop we used for the years, replacing the year with the country.

Instead, we introduce a new pandas method, value_counts(), which counts the number of occurrences of each unique value in a column.

The helper module already defines df for us from the same data using the code below, so we don't need to redefine it.

df = pd.read_html("https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes")[1]
df = df[["Date", "Prison name", "Country", "Succeeded", "Escapee(s)"]]

In [154]:
#this counts the number of occurrences of each unique value in the "Country" column
countries_frequency = df["Country"].value_counts()

#we can print the result just to check
print(countries_frequency)
France            15
United States      8
Canada             4
Greece             4
Belgium            4
United Kingdom     2
Australia          2
Brazil             2
Mexico             1
Italy              1
Chile              1
Puerto Rico        1
Ireland            1
Netherlands        1
Russia             1
Name: Country, dtype: int64

The result above is enough to answer Q2.

A2: The country with the highest number of attempts is France.
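
To put France's lead in perspective, the counts can be turned into shares of all recorded attempts. This is a small sketch in plain Python; the dictionary `country_counts` is copied from the printed output above.

```python
# Attempts per country, copied from the value_counts() output above.
country_counts = {
    'France': 15, 'United States': 8, 'Canada': 4, 'Greece': 4, 'Belgium': 4,
    'United Kingdom': 2, 'Australia': 2, 'Brazil': 2, 'Mexico': 1, 'Italy': 1,
    'Chile': 1, 'Puerto Rico': 1, 'Ireland': 1, 'Netherlands': 1, 'Russia': 1,
}

total = sum(country_counts.values())

# Share of all attempts per country, in percent.
shares = {country: 100 * n / total for country, n in country_counts.items()}

# Country with the largest number of attempts.
top_country = max(country_counts, key=country_counts.get)

print(top_country, total, f"{shares[top_country]:.1f}%")
```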

We don't have a theory yet for why this is the case, but we could look at the availability of resources or at security measures in France to see if they explain why there are so many attempts there. NOTE: this could become another project if we can find the data.

For a cleaner presentation, let's use a prettier table, which is also defined in the helper module.

In [155]:
print_pretty_table(countries_frequency)
Country          Number of Occurrences
France           15
United States    8
Canada           4
Greece           4
Belgium          4
United Kingdom   2
Australia        2
Brazil           2
Mexico           1
Italy            1
Chile            1
Puerto Rico      1
Ireland          1
Netherlands      1
Russia           1

Now let's try to answer Q3

Q3: In which countries do helicopter prison breaks have a higher chance of success?

To answer this, we must define what we mean by "higher chance of success".

Do we mean higher probability or higher odds?

There's a link HERE that explains the difference.

For the purpose of this project, I'm going to define the chance of success as total successes / total attempts, which is the probability of success. There are other ways to interpret this, but let's assume this for simplicity.
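
To make the distinction concrete: probability divides successes by all attempts, while odds divide successes by failures. The sketch below uses France's tally (11 successes, 4 failures, taken from the summary computed further down); the function names are illustrative and not part of the helper module.

```python
def probability(successes, failures):
    """Successes as a fraction of all attempts."""
    return successes / (successes + failures)

def odds(successes, failures):
    """Successes per failure; infinite when there are no failures."""
    if failures == 0:
        return float('inf')
    return successes / failures

# France: 11 successful and 4 failed attempts.
print(probability(11, 4))
print(odds(11, 4))
```

Both measures rank countries in the same order, but odds are unbounded, which is one reason the probability is used in the rest of this notebook.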

In [156]:
df2 = df['Country'].unique()
df2
Out[156]:
array(['Mexico', 'Ireland', 'United States', 'France', 'Canada',
       'Australia', 'Brazil', 'Italy', 'United Kingdom', 'Puerto Rico',
       'Chile', 'Netherlands', 'Greece', 'Belgium', 'Russia'],
      dtype=object)
In [186]:
#First we get the list of unique countries

uniquecountries = df['Country'].unique()

success = []    #[country, success rate] pairs, used for the bar plot
summary2 = []   #[country, successes, failures, success rate] rows

#Now we loop over the unique countries and count the
#successful vs unsuccessful attempts
for m in uniquecountries:
    yes = 0
    no = 0
    for k in data:   #loop over the full data
        if k[2] == m and k[3] == 'Yes':   #count successes for this country
            yes += 1
        elif k[2] == m and k[3] == 'No':  #count failures for this country
            no += 1
    successrate = round((yes / (yes + no)) * 100)  #success rate in percent
    summary2.append([m, yes, no, successrate])     #full summary row for this country
    success.append([m, successrate])               #country and its success rate
    
print(summary2)

        
[['Mexico', 1, 0, 100], ['Ireland', 1, 0, 100], ['United States', 6, 2, 75], ['France', 11, 4, 73], ['Canada', 3, 1, 75], ['Australia', 1, 1, 50], ['Brazil', 2, 0, 100], ['Italy', 1, 0, 100], ['United Kingdom', 1, 1, 50], ['Puerto Rico', 1, 0, 100], ['Chile', 1, 0, 100], ['Netherlands', 0, 1, 0], ['Greece', 2, 2, 50], ['Belgium', 2, 2, 50], ['Russia', 1, 0, 100]]

The result above is a summary, for each country, of the number of successful attempts, the number of unsuccessful attempts, and the success rate (successes as a percentage of total attempts).

However, this would be better presented as a bar plot.

In [188]:
barplot(success)

So from the bar plot above we can answer Q3.

A3: The countries with the highest chance of success are Mexico, Ireland, Brazil, Italy, Puerto Rico, Chile, and Russia, each at 100%.

However, each of these countries has only 1 or 2 attempts, so the high rate rests on very little data and is not enough to draw a conclusion.

Looking at France, for instance, the rate is 73% over a total of 15 attempts. Intuitively, I would say I'd take my chances there.

However, I wouldn't rely on that; instead I would explore a different measure of "chance" or look for more data.
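
One simple way to act on this caveat is to compare only countries with a minimum number of attempts. The sketch below reuses the summary printed above; the cut-off of 4 attempts is an arbitrary assumption chosen for illustration.

```python
# [country, successes, failures, success rate in %], copied from the output above.
summary2 = [
    ['Mexico', 1, 0, 100], ['Ireland', 1, 0, 100], ['United States', 6, 2, 75],
    ['France', 11, 4, 73], ['Canada', 3, 1, 75], ['Australia', 1, 1, 50],
    ['Brazil', 2, 0, 100], ['Italy', 1, 0, 100], ['United Kingdom', 1, 1, 50],
    ['Puerto Rico', 1, 0, 100], ['Chile', 1, 0, 100], ['Netherlands', 0, 1, 0],
    ['Greece', 2, 2, 50], ['Belgium', 2, 2, 50], ['Russia', 1, 0, 100],
]

MIN_ATTEMPTS = 4  # assumption: arbitrary threshold for "enough data"

# Keep only countries with at least MIN_ATTEMPTS attempts in total.
filtered = [row for row in summary2 if row[1] + row[2] >= MIN_ATTEMPTS]

# Sort by success rate, highest first (ties keep their original order).
filtered.sort(key=lambda row: row[3], reverse=True)

for country, yes, no, rate in filtered:
    print(country, yes + no, str(rate) + '%')
```

With this threshold, the single-attempt countries drop out and the comparison is between the United States, Canada, France, Greece, and Belgium.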

Now we proceed to answer Q4

Q4: How does the number of escapees affect the success?

How I would prefer to do this is to get a count of escapees per data point, then loop from the minimum to the maximum count and calculate the success rate for each.

However, I couldn't find a way, for now, to efficiently count the individual names in the df["Escapee(s)"] column, since many names run together without a separator (for example "Gérard DupréDaniel Beaumont"). When I progress further in the course, I will come back and revamp this.

That said, I will just do a spot check.

For this I use the groupby() function to see the success results for each "Escapee(s)" entry.

In [220]:
df.groupby("Escapee(s)")["Succeeded"].value_counts().unstack().fillna(0)
Out[220]:
Succeeded No Yes
Escapee(s)
Abdelhamid CarnousEmile Forma-SariJean-Philippe Lecase 0.0 1.0
Alexey Shestakov 0.0 1.0
Alexin JismyFabrice Michel 0.0 1.0
André BellaïcheGianluigi EspositoLuciano Cipollari 0.0 1.0
Ashraf Sekkaki plus three other criminals 0.0 1.0
Ben Kramer 1.0 0.0
Benjamin Hudon-BarbeauDanny Provençal 0.0 1.0
Brian Lawrence 1.0 0.0
David McMillan 1.0 0.0
Eric AlboreoFranck PerlettoMichel Valero 0.0 1.0
Eric Ferdinand 0.0 1.0
Four members of the Manuel Rodriguez Patriotic Front 0.0 1.0
Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson 1.0 0.0
Gérard DupréDaniel Beaumont 0.0 1.0
Hubert SellesJean-Claude MorettiMohamed Bessame 0.0 1.0
JB O'Hagan Seamus TwomeyKevin Mallon 0.0 1.0
James Rodney LeonardWilliam Douglas BallewJesse Glenn Smith 0.0 1.0
Joel David Kaplan Carlos Antonio Contreras Castro 0.0 1.0
John Killick 0.0 1.0
José Carlos dos Reis Encina, a.k.a. "Escadinha" 0.0 1.0
Kristel A. 1.0 0.0
Mahoney Danny Francis MitchellRandy Lackey 0.0 1.0
Marina Paquet (hijacker)Giles Arseneault (prisoner) 1.0 0.0
Michel Vaujour 1.0 1.0
Nordin Benallal 1.0 0.0
Orlando Cartagena Jose Rodriguez Victor Diaz Hector Diaz Jose Tapia 0.0 1.0
Panagiotis Vlastos 1.0 0.0
Pascal Payet 0.0 2.0
Pola RoupaNikos Maziotis 1.0 0.0
Ralph BrownFreddie Gonzales 0.0 1.0
Robert FordDavid Thomas 0.0 1.0
Rédoine Faïd 0.0 1.0
Samantha Lopez 0.0 1.0
Steven Whitsett 0.0 1.0
Sydney DraperJohn Kendall 0.0 1.0
Vasilis PaleokostasAlket Rizai 0.0 1.0
Vassilis Paleokostas 0.0 1.0
William Lane 0.0 1.0
Yves DenisDenis LefebvreSerge Pomerleau 0.0 1.0
4.0 3.0

Success rate (assumption: only entries with named escapees are considered):

  individual: 13 success, 7 failed
  2 persons: 7 success, 1 failed
  3 persons: 9 success, 0 failed
  4 and above: 3 success, 1 failed

In [227]:
print('1 person rating ' + str(13/20))
print('2 persons rating ' + str(7/8))
print('3 persons rating ' + str(9/9))
print('more than 3 persons rating ' + str(3/4))
1 person rating 0.65
2 persons rating 0.875
3 persons rating 1.0
more than 3 persons rating 0.75

Looking at the success rates above, the highest rate is for a team of 3 people. There does not seem to be a clear correlation between success and team size, but the rates are higher for teams than for solo attempts. With so few attempts in each group, this is only a rough indication.

Q5: Which escapees have done it more than once?

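Referring back to the previous table, Pascal Payet has done it twice and succeeded both times. Michel Vaujour has also done it twice, succeeding once and failing once. The table also lists "Vasilis Paleokostas" (with Alket Rizai) and "Vassilis Paleokostas" as separate entries; if these are the same person under two spellings, he would be a third repeat escapee, with two successes.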
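
Repeat entries can also be found programmatically by counting identical values. This is a minimal sketch; `escapee_entries` is a hypothetical sample of values from the "Escapee(s)" column, not the full column.

```python
from collections import Counter

# Hypothetical sample of "Escapee(s)" values, one per attempt.
escapee_entries = [
    'Pascal Payet', 'Michel Vaujour', 'John Killick',
    'Pascal Payet', 'Rédoine Faïd', 'Michel Vaujour',
]

counts = Counter(escapee_entries)

# Entries that appear in more than one attempt.
repeat_escapees = sorted(name for name, n in counts.items() if n > 1)

print(repeat_escapees)
```

This only matches identical strings, so it would not merge different spellings of a name or catch someone who escaped once alone and once as part of a group.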
