**Helicopter Prison Breaks**

This is a guided project from Dataquest that explores the available data on helicopter prison breaks.

*First conclusion: yes, they do happen.*

This project aims to answer the following questions:

- **Q1:** In which year did the most helicopter prison break attempts occur?
- **Q2:** In which countries do the most attempted helicopter prison breaks occur?
- **Q3:** In which countries do helicopter prison breaks have a higher chance of success?
- **Q4:** How does the number of escapees affect the chance of success?
- **Q5:** Which escapees have done it more than once?

We begin by importing some helper functions:

In [147]:

```
# helper provides data_from_url, fetch_year, barplot, print_pretty_table and df
from helper import *
```

We start by getting the data. For this project, the data is extracted from the Wikipedia article *List of helicopter prison escapes*.

In [148]:

```
url = 'https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes'
data = data_from_url(url)
```
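The helper's `data_from_url()` is not shown in this notebook. To give a rough idea of what such a function might do, the sketch below uses only the standard library's `html.parser` to pull the text of every cell, row by row, out of HTML `<table>` markup. `TableRowParser` and `rows_from_html()` are hypothetical names, the sketch assumes well-formed markup with explicit closing tags, and the real helper may well use `pandas.read_html()` instead.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Hypothetical parser: collects the text of each <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without any cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside table cells
            self._cell.append(data)


def rows_from_html(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Run on a whole downloaded page, this returns the rows of every table on it, header rows included, so the right table and its header row would still have to be picked out.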

Next, we do a little data exploration:

In [149]:

```
# Print the first three rows to see what the data looks like
for row in data[:3]:
    print(row)
```

The printed data shows that the "Description" column takes up a lot of space and is not relevant to the questions in this project, so let's remove it for now.

In [150]:

```
# Keep only the first five columns of each row, dropping "Description"
for index, row in enumerate(data):
    data[index] = row[:5]
print(data[:3])
```
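The same trimming can be written as a list comprehension, which builds a new list instead of overwriting `data` index by index. A minimal sketch; `drop_extra_columns` is an illustrative name, and it assumes, as the cell above does, that the columns to keep are the first five.

```python
def drop_extra_columns(rows, keep=5):
    """Return new rows holding only the first `keep` columns of each row."""
    return [row[:keep] for row in rows]
```

In the notebook this would be used as `data = drop_extra_columns(data)`.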

Looking at the date fields, we notice that they contain full dates. All of our questions are year-based, so we convert each date to just its year.

In [151]:

```
# Replace each full date in column 0 with its year
for row in data:
    row[0] = fetch_year(row[0])
print(data[:5])
```
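`fetch_year()` also comes from the helper module, so its exact behavior is not shown. A plausible stand-in, assuming each date string contains exactly one four-digit year, is a regular-expression search. The function name is hypothetical. It returns an `int` because the next step passes the years to `range()`.

```python
import re


def fetch_year_sketch(date_string):
    """Hypothetical stand-in for helper.fetch_year: return the first 4-digit year as an int."""
    match = re.search(r"\b(\d{4})\b", date_string)
    if match is None:
        raise ValueError(f"no four-digit year found in {date_string!r}")
    return int(match.group(1))
```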

Now that we have wrangled our data into a more usable format, we can begin answering the questions.

The first question we will answer is Q1: *In which year did the most helicopter prison break attempts occur?*

To answer this, we must count the number of occurrences per year.
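One compact way to do this count is with `collections.Counter` from the standard library, which tallies the years in a single pass over `data`. A sketch that assumes, as the cell below does, that column 0 of each row already holds the year as an integer. `attempts_by_year` is an illustrative name, and years with no attempts are filled in with 0.

```python
from collections import Counter


def attempts_by_year(rows):
    """Return [[year, attempts], ...] for every year from the earliest to the latest.

    Assumes the year is stored in column 0 of each row as an int.
    """
    counts = Counter(row[0] for row in rows)  # year -> number of attempts
    first, last = min(counts), max(counts)
    # Counter returns 0 for years that never appear
    return [[year, counts[year]] for year in range(first, last + 1)]
```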

In [152]:

```
# Determine the oldest and the latest year in our records
min_year = min(row[0] for row in data)
max_year = max(row[0] for row in data)

# For each year in that range, count how many attempts were recorded
attemps_per_year = []
for year in range(min_year, max_year + 1):
    attempts = 0  # number of attempts in this year
    for row in data:  # scan every record
        if row[0] == year:
            attempts += 1
    attemps_per_year.append([year, attempts])
print(attemps_per_year)
```

The output above is a list of `[year, count]` pairs, which is hard to read at a glance. To make Q1 easier to answer, we plot it as a bar chart.

In [153]:

```
%matplotlib inline
barplot(attemps_per_year)
```

The bar chart makes it easy to see which years had the greatest number of attempts.

**A1: 1986, 2001, 2007, and 2009**
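Reading the answer off the chart works, but ties are easy to miss by eye. A small sketch (the name `busiest_years` is illustrative) that takes `attemps_per_year`, the list of `[year, count]` pairs built above, and returns every year that reaches the maximum count:

```python
def busiest_years(attempts_per_year):
    """Return every year whose attempt count equals the maximum (ties included)."""
    most = max(count for _, count in attempts_per_year)
    return [year for year, count in attempts_per_year if count == most]
```

Calling `busiest_years(attemps_per_year)` should list the same years as A1 if the chart was read correctly.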

Now we are ready to answer Q2: *In which countries do the most attempted helicopter prison breaks occur?*

For this, we could repeat the counting we did for years, replacing the year with the country.

However, here we introduce a new pandas method, `value_counts()`, which counts the number of occurrences of each unique value in a column.
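For comparison, `value_counts()` behaves much like Python's `collections.Counter`. A minimal pure-Python sketch over the list-of-lists `data`, assuming the country sits in column 2 of each row (as it does after the column trimming above); `country_frequency` is an illustrative name.

```python
from collections import Counter


def country_frequency(rows, country_index=2):
    """Pure-Python analogue of df["Country"].value_counts().

    Returns (country, count) pairs, most common first.
    """
    return Counter(row[country_index] for row in rows).most_common()
```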

The **helper** module already defines *df* for us with the code below, so we don't need to redefine it:

`df = pd.read_html("https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes")[1]`

`df = df[["Date", "Prison name", "Country", "Succeeded", "Escapee(s)"]]`

In [154]:

```
# Count the occurrences of each unique value in the "Country" column
countries_frequency = df["Country"].value_counts()
# Print the result to check it
print(countries_frequency)
```

The result above is enough to answer Q2.

**A2: The country with the highest number of attempts is France**

We don't have a theory yet for why this is the case, but we could look at resource availability or security measures in France for clues about why there are so many attempts there. NOTE: this could become a separate project if we can find the data.

For a cleaner presentation, let's display the counts as a table using `print_pretty_table()`, which is also defined in the *helper* module.

In [155]:

```
print_pretty_table(countries_frequency)
```
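Since `print_pretty_table()` lives in the helper module and is not shown, here is a rough, hypothetical stand-in that renders `(label, value)` pairs as a fixed-width text table and returns it as a string. The name `format_table` and its default headers are assumptions.

```python
def format_table(pairs, headers=("Country", "Attempts")):
    """Hypothetical stand-in for helper.print_pretty_table.

    Renders (label, value) pairs as a left/right-aligned text table.
    """
    rows = [tuple(str(cell) for cell in headers)]
    rows += [(str(label), str(value)) for label, value in pairs]
    left = max(len(row[0]) for row in rows)   # width of the label column
    right = max(len(row[1]) for row in rows)  # width of the value column
    lines = [f"{row[0]:<{left}} | {row[1]:>{right}}" for row in rows]
    lines.insert(1, "-" * left + "-+-" + "-" * right)  # rule under the header
    return "\n".join(lines)
```

With the pandas result above, `print(format_table(countries_frequency.items()))` would print the country counts, since `Series.items()` yields `(index, value)` pairs.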

Now let's try to answer Q3: *In which countries do helicopter prison breaks have a higher chance of success?*

To answer this, we must first define what we mean by "higher chance of success".

Do we mean higher probability or higher odds?

There's a link HERE that explains the difference.

For the purpose of this project, I'm going to define chance of success as *Total Successes / Total Attempts*. There are other ways to interpret this, but for simplicity let's use this definition.
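To make the distinction concrete: with 3 successes and 1 failure, the probability of success is 3/4 = 0.75, while the odds of success are 3:1 = 3. A minimal sketch of both measures (the function name is illustrative); this project uses the first one.

```python
from fractions import Fraction


def probability_and_odds(successes, failures):
    """Return (probability, odds) of success as exact fractions.

    probability = successes / (successes + failures)
    odds        = successes / failures
    Requires at least one attempt; odds are None when there are no failures.
    """
    probability = Fraction(successes, successes + failures)
    odds = Fraction(successes, failures) if failures else None
    return probability, odds
```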

In [156]:

```
# List the unique countries in the dataset
df2 = df['Country'].unique()
df2
```


In [186]:

```
# First, get the list of unique countries
uniquecountries = df['Country'].unique()
success = []    # [country, success rate] pairs, used for the bar plot
summary2 = []   # [country, successes, failures, success rate] rows
# Loop over the unique countries and count
# successful vs unsuccessful attempts for each one
for m in uniquecountries:
    yes = 0
    no = 0
    for k in data:  # loop over the full data
        if k[2] == m and k[3] == 'Yes':    # count successes for this country
            yes += 1
        elif k[2] == m and k[3] == 'No':   # count failures for this country
            no += 1
    successrate = round(yes / (yes + no) * 100)  # success rate as a percentage
    success.append([m, successrate])             # country and its success rate
    summary2.append([m, yes, no, successrate])   # successes, failures and rate
print(summary2)
```

The result above summarizes, for each country, the number of successful attempts, the number of unsuccessful attempts, and the success rate as a percentage of total attempts.
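The nested loop in the cell above re-scans `data` once per country. A single pass that tallies successes and failures in a dictionary builds the same kind of summary. A sketch, assuming the column positions used above (country in column 2, outcome in column 3); `success_summary` is an illustrative name.

```python
from collections import defaultdict


def success_summary(rows, country_index=2, outcome_index=3):
    """Return [[country, yes, no, rate_percent], ...] in first-seen order.

    Only rows whose outcome is exactly 'Yes' or 'No' are counted.
    """
    tally = defaultdict(lambda: [0, 0])  # country -> [successes, failures]
    for row in rows:
        outcome = row[outcome_index]
        if outcome == "Yes":
            tally[row[country_index]][0] += 1
        elif outcome == "No":
            tally[row[country_index]][1] += 1
    return [
        [country, yes, no, round(yes / (yes + no) * 100)]
        for country, (yes, no) in tally.items()
    ]
```

Unlike the loop above, a country with no plain "Yes"/"No" outcome is simply left out here instead of raising `ZeroDivisionError`.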

However, the success rates are easier to compare in a bar plot:

In [188]:

```
barplot(success)
```

From the bar plot above, we can now answer Q3.

**A3: The countries with highest chances of success are Mexico, Ireland, Brazil, Italy, Puerto Rico, Chile, and Russia.**

However, each of these countries has only 1 or 2 attempts, so their high success rates may simply reflect how little data there is; there are too few attempts to draw a conclusion.

France, for instance, has a success rate of 73% with 14 total attempts. Intuitively, that is where I would take my chances.

In practice, though, I would first explore a different measure of "chance" or look for more data before concluding anything.
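One simple way to act on the small-sample concern is to rank only countries with some minimum number of attempts. A sketch that works on the `summary2` rows built above (`[country, successes, failures, rate]`); the function name is illustrative, and the threshold `min_attempts=5` is an arbitrary assumption, not a recommendation.

```python
def rates_with_enough_data(summary, min_attempts=5):
    """Keep countries with at least `min_attempts` recorded attempts, best rate first.

    `summary` rows look like [country, successes, failures, rate_percent].
    """
    kept = [row for row in summary if row[1] + row[2] >= min_attempts]
    return sorted(kept, key=lambda row: row[3], reverse=True)
```

For example, `rates_with_enough_data(summary2)` would drop the countries with only one or two attempts before ranking the rest.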

Now we proceed to Q4: how the number of escapees affects the chance of success.

Ideally, I would count the escapees in each row, then loop from the minimum to the maximum count and calculate the success rate for each group size.

However, I haven't yet found an efficient way to count the individual names in the `df["Escapee(s)"]` column. I will come back to revise this as I progress further in the course.
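A possible starting point for the approach described above is sketched below. It rests on an unverified assumption: that multiple names in one "Escapee(s)" cell are separated by commas and/or the word "and". The actual column may use other separators (for example, line breaks that get dropped when the table is parsed), so `count_escapees` would need adjusting after inspecting the data. Both function names are hypothetical.

```python
import re
from collections import defaultdict


def count_escapees(entry):
    """Count names in an 'Escapee(s)' cell.

    ASSUMPTION: names are separated by commas and/or the word 'and'
    (e.g. 'Name A, Name B and Name C'). The real column may differ.
    """
    names = [name.strip() for name in re.split(r",|\band\b", entry)]
    return len([name for name in names if name])


def success_rate_by_group_size(rows):
    """rows: iterable of (escapees_entry, outcome) pairs, outcome 'Yes' or 'No'.

    Returns {group_size: success_rate}, with rates between 0 and 1.
    """
    tally = defaultdict(lambda: [0, 0])  # group size -> [successes, attempts]
    for entry, outcome in rows:
        if outcome not in ("Yes", "No"):
            continue  # skip rows whose outcome is not a plain Yes/No
        size = count_escapees(entry)
        tally[size][1] += 1
        if outcome == "Yes":
            tally[size][0] += 1
    return {size: yes / total for size, (yes, total) in sorted(tally.items())}
```

With the real data, missing entries would need dropping first, since the sketch expects every entry to be a string, e.g. `clean = df.dropna(subset=["Escapee(s)"])` followed by `success_rate_by_group_size(zip(clean["Escapee(s)"], clean["Succeeded"]))`.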

For now, I will just do a spot check.

For this, I use `groupby()` to see the outcomes for each entry in the "Escapee(s)" column.

In [220]:

```
df.groupby("Escapee(s)")["Succeeded"].value_counts().unstack().fillna(0)
```


Success rate (assumption: only entries with named escapees are considered):

| Escapees per attempt | Succeeded | Failed |
|---|---|---|
| 1 | 13 | 7 |
| 2 | 7 | 1 |
| 3 | 9 | 0 |
| 4 or more | 3 | 1 |

In [227]:

```
# Success rate = successes / (successes + failures), using the counts above
print('1 person success rate:', 13 / 20)
print('2 persons success rate:', 7 / 8)
print('3 persons success rate:', 9 / 9)
print('4 or more persons success rate:', 3 / 4)
```

Based on these rates, the highest success rate is for teams of 3. There does not seem to be a clear correlation between team size and success, but the rates do seem higher for teams than for solo escapees. With so few attempts in each group, these differences should be read with caution.

Finally, for Q5, referring back to the previous table: **Pascal Payet** has done it twice and succeeded both times. **Michel Vaujour** has also done it twice but failed once.
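Q5 can also be answered without reading the table by eye. The sketch below counts how many entries each name appears in and keeps the names seen more than once. It assumes, hypothetically, that names within a cell are separated by commas and/or "and", and that a person's name is spelled identically in every row; it counts attempts, not successes. `repeat_escapees` is an illustrative name.

```python
import re
from collections import Counter


def repeat_escapees(entries):
    """Return {name: number_of_entries} for names appearing in more than one entry.

    ASSUMPTION: names within an entry are separated by commas and/or 'and'.
    """
    counts = Counter()
    for entry in entries:
        # a set, so each name is counted at most once per entry
        names = {name.strip() for name in re.split(r",|\band\b", entry)}
        names.discard("")
        counts.update(names)
    return {name: n for name, n in counts.items() if n > 1}
```

With the real data this could be called as `repeat_escapees(df["Escapee(s)"].dropna())`.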
