#!/usr/bin/env python
# coding: utf-8

# In[1]:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# # Correlations Between The Metrics of The Best Startups

# ### Columns

# - **quantity score** (spelled `quatity score` in the cities file): Takes into account different metrics, such as the number of startups, coworking spaces, and accelerators, to establish the activity level of the startup ecosystem.
# 
# - **quality score:** Studies parameters that indicate qualitative results achieved by the ecosystem. These parameters include analyzing the traction of the ecosystem’s top startups, as well as reviewing “special entities” the ecosystem has produced: Unicorns, Exits, and Pantheons
# 
# - **business score:** A mix of business and economic indicators at the national level, discounted for cities that haven't reached critical mass in either quantity or quality.
# 
# - **total score:** The total score of the rankings is the sum of the quantity, quality, and business environment scores.
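
# If the total score is exactly the sum of the three component scores, this can be checked directly on either table. The sketch below is not part of the original analysis: `check_total_score` is a hypothetical helper, the column names are assumed from the code later in this notebook, and it compares within a tolerance because the published scores may be rounded.

```python
import numpy as np
import pandas as pd


def check_total_score(df: pd.DataFrame, quantity_col: str = 'quantity score',
                      atol: float = 0.05) -> pd.Series:
    """Hypothetical helper: True for each row whose 'total score' equals
    quantity + quality + business score within `atol`."""
    components = df[[quantity_col, 'quality score', 'business score']].sum(axis=1)
    return pd.Series(np.isclose(df['total score'], components, atol=atol),
                     index=df.index)


# Hypothetical toy data, not taken from the real CSV files.
toy = pd.DataFrame({
    'quantity score': [10.0, 5.0],
    'quality score': [20.0, 8.0],
    'business score': [3.0, 2.0],
    'total score': [33.0, 16.0],  # second row deliberately does not add up
})
print(check_total_score(toy).tolist())  # [True, False]
```

# On the real data this would be called as `check_total_score(cities, quantity_col='quatity score')` or `check_total_score(countries)`, and `.all()` on the result tells whether every row matches.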

# ### Goal

# What are the correlations between the metrics used to rank the best startup cities and countries?

# In[2]:


cities = pd.read_csv('Best Cities for Startups.csv')
countries = pd.read_csv('Best Countries for Startups.csv')


# ### Cities

# In[3]:


print(cities.shape)
cities.head()


# In[4]:


cities.tail()


# In[5]:


cities.corr(numeric_only=True)  # skip text columns such as city/country (required in pandas >= 2.0)


# In[6]:


sns.pairplot(cities)


# In[7]:


plt.figure(figsize=(8,5))
ax = sns.heatmap(cities.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)


# There is a strong correlation between the quality and quantity scores, and a weak correlation between the quality score and the business score. However, there is a strong negative correlation between the business score and position: since position 1 is the top of the ranking, cities with higher business scores tend to rank closer to the top. Moreover, even though the city rankings are determined by total score, we observe only a weak negative correlation between position and total score. Let's take the top 100 cities to understand why, and examine the correlations between them.
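
# One plausible explanation: `DataFrame.corr()` computes the Pearson coefficient by default, which measures *linear* association. Position is a rank, so if the total score falls off steeply near the top of the ranking and then flattens, Pearson can look weak even though the ordering is perfect. A rank-based Spearman coefficient (`method='spearman'`) would then be close to -1. The toy ranking below is hypothetical and only illustrates the difference; on the real data the equivalent call would be `cities.corr(method='spearman', numeric_only=True)`.

```python
import numpy as np
import pandas as pd

# Hypothetical ranking (not the real data): 100 positions whose total score
# decays steeply, so a handful of leaders score far above everyone else.
position = np.arange(1, 101)
toy = pd.DataFrame({'position': position,
                    'total score': 1000.0 / position.astype(float) ** 2})

# Pearson (the pandas default) measures linear association;
# Spearman correlates the ranks, so any strictly monotonic relation gives -1 or 1.
pearson = toy.corr().loc['position', 'total score']
spearman = toy.corr(method='spearman').loc['position', 'total score']
print(f'Pearson:  {pearson:.2f}')
print(f'Spearman: {spearman:.2f}')
```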

# ### Cities Top 100

# In[8]:


cities_top_100 = cities[:100]
cities_top_100.corr(numeric_only=True)


# In[9]:


sns.pairplot(cities_top_100)


# In[10]:


plt.figure(figsize=(8,5))
ax = sns.heatmap(cities_top_100.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)


# Within the top 100 cities, the correlation between position and the quantity and quality scores becomes strong, so the best startup cities begin to stand out. Let's check the top 25 cities to verify this.
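
# The notebook repeats the same correlation step for the full table, the top 100, and the top 25. A small helper can put each metric's correlation with position for several cut-offs into one table. This is a sketch: `corr_with_position_by_top_n` is a hypothetical name, and it selects rows with `nsmallest` on `position` rather than relying on the file already being sorted, as slicing with `cities[:100]` does.

```python
import pandas as pd


def corr_with_position_by_top_n(df: pd.DataFrame, ns, method: str = 'pearson') -> pd.DataFrame:
    """Hypothetical helper: for each cut-off n, correlate every numeric column
    with 'position' using only the n best-ranked rows (position 1 is the top)."""
    numeric = df.select_dtypes('number')  # drops text columns such as 'city'
    rows = {}
    for n in ns:
        top = numeric.nsmallest(n, 'position')
        rows[f'top {n}'] = top.corr(method=method)['position'].drop('position')
    return pd.DataFrame(rows).T  # one row per cut-off, one column per metric


# Hypothetical toy ranking, not the real data.
toy = pd.DataFrame({
    'position': [1, 2, 3, 4, 5, 6],
    'city': list('ABCDEF'),
    'quality score': [9.0, 7.0, 6.0, 3.0, 2.0, 1.0],
    'business score': [4.0, 3.5, 3.9, 3.0, 3.8, 2.0],
})
print(corr_with_position_by_top_n(toy, ns=[6, 3]).round(2))
```

# Applied to this notebook, `corr_with_position_by_top_n(cities, ns=[len(cities), 100, 25])` would summarise the full, top 100, and top 25 comparisons side by side.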

# ### Cities Top 25

# In[11]:


cities_top_25 = cities[:25]
cities_top_25.corr(numeric_only=True)


# In[12]:


sns.pairplot(cities_top_25)


# In[13]:


plt.figure(figsize=(8,5))
ax = sns.heatmap(cities_top_25.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)


# The top 25 cities support this idea: the correlation between business score and position weakens, so the quantity and quality scores become the main determinants of rank. Let's examine these cities more closely.

# In[14]:


cities_top_25_copy = cities_top_25.copy()


# In[15]:


cities_top_25_copy['best_ratio'] = cities_top_25_copy['quality score'] / cities_top_25_copy['quatity score']
cities_top_25_copy[['city','best_ratio','business score']]


# In[81]:


plt.figure(figsize=(11,9))
plt.barh(cities_top_25_copy['city'],cities_top_25_copy['best_ratio'])
plt.xlabel('Quality / quantity ratio')
plt.ylabel('Cities')
plt.show()


# When we compare quality and quantity as a ratio, a city with fewer coworking spaces and accelerators (a lower quantity score) may still produce comparatively more successful startups (a higher quality score). The ratio alone cannot prove this, but it raises a question: would Beijing become the best startup city in the world if it had the same quantity score as San Francisco?
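
# One way to explore this question is a simple counterfactual: give one city another city's quantity score, recompute the total as the sum of the three component scores (as the column definitions describe), and re-rank. This is only a rough sketch under strong assumptions: the business score is discounted for cities without critical mass, and the quality score would probably not stay fixed if quantity changed. `what_if_rank` is a hypothetical helper and the toy numbers are invented, not taken from the dataset.

```python
import pandas as pd


def what_if_rank(df: pd.DataFrame, target: str, donor: str,
                 quantity_col: str = 'quantity score') -> pd.Series:
    """Hypothetical helper: copy `donor`'s quantity score to `target`,
    recompute total = quantity + quality + business, and return each city's
    new rank (1 = best). The input DataFrame is not modified."""
    scenario = df.set_index('city').copy()
    scenario.loc[target, quantity_col] = scenario.loc[donor, quantity_col]
    total = scenario[[quantity_col, 'quality score', 'business score']].sum(axis=1)
    return total.rank(ascending=False, method='min').astype(int).sort_values()


# Invented numbers for three fictional cities.
toy = pd.DataFrame({
    'city': ['City A', 'City B', 'City C'],
    'quantity score': [20.0, 8.0, 10.0],
    'quality score': [60.0, 62.0, 30.0],
    'business score': [3.0, 2.0, 2.5],
})
print(what_if_rank(toy, target='City B', donor='City A'))
```

# Assuming the `city` column holds these exact names, the real-data call would look like `what_if_rank(cities, target='Beijing', donor='San Francisco Bay', quantity_col='quatity score')`.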

# In[17]:


cities.iloc[[0, 2]].plot.bar(x='city')  # label the bars with city names instead of row numbers


# In[18]:


cities.iloc[[0, 2]].plot.bar(x='city', y='business score')


# San Francisco Bay's business score is higher than Beijing's. This suggests that even if Beijing had the same quantity score as San Francisco Bay, unicorn and pantheon startups might still be more likely to be built in San Francisco Bay because of its stronger business environment, so San Francisco Bay could remain home to the best startups.

# ### Countries

# In[19]:


countries.head(8)


# In[20]:


countries.tail()


# In[21]:


countries.corr(numeric_only=True)


# In[22]:


sns.pairplot(countries)


# In[23]:


plt.figure(figsize=(8,5))
ax = sns.heatmap(countries.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)


# In[24]:


business_score_top = countries[countries['business score'] >= 3.0]
business_score_top


# In[25]:


business_score_top.plot.barh(x='country', y='business score', legend=False, figsize=(10, 7),
                             title='Countries with the Best Business Scores')


# ### Countries Top 20

# In[26]:


countries_20 = countries[:20]
countries_20


# Let's split this dataset into two parts: the top eight countries and the remaining twelve.
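
# Groups of eight and twelve countries are small, so the correlation coefficients computed on them carry wide uncertainty. A rough way to gauge this, assuming approximately bivariate-normal data, is the Fisher z-transformation confidence interval for Pearson's r. The helper name `pearson_ci` is hypothetical; it uses only the standard library.

```python
import math
from statistics import NormalDist
from typing import Tuple


def pearson_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation `r`
    computed from `n` rows, via the Fisher z-transformation."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError('need n > 3 and -1 < r < 1')
    z = math.atanh(r)                              # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)                    # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# The same r = 0.9 is far less certain with 8 rows than with 100.
for n in (8, 12, 100):
    low, high = pearson_ci(0.9, n)
    print(f'n={n:3d}: 95% CI for r=0.9 is [{low:.2f}, {high:.2f}]')
```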

# In[27]:


countries_20.corr(numeric_only=True)


# In[28]:


sns.pairplot(countries_20)


# In[29]:


plt.figure(figsize=(8,5))
ax = sns.heatmap(countries_20.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)


# In[80]:


plt.figure(figsize=(11,9))
plt.barh(countries_20['country'],countries_20['total score'])
plt.xlabel('total score')
plt.ylabel('Best Countries')
plt.title('The Best Startup Countries')
plt.show()


# In[31]:


countries_20_copy = countries_20.copy()


# In[71]:


countries_20_copy['best_ratio'] = countries_20_copy['quality score'] / countries_20_copy['quantity score']
compare_c_20 = countries_20_copy[['country','best_ratio','business score']]
compare_c_20


# In[79]:


compare_c_20.plot.bar(x='country',figsize=(12,7))


# ### Top 8 from Countries Top 20

# In[34]:


countries_20_top_8 = countries_20[:8]
countries_20_top_8


# In[35]:


countries_20_top_8.corr(numeric_only=True)


# In[36]:


sns.pairplot(countries_20_top_8, hue='country', diag_kind='hist')


# In[37]:


plt.figure(figsize=(8,5))
ax = sns.heatmap(countries_20_top_8.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)


# ### Last 12 from Countries Top 20

# In[38]:


countries_20_last_12 = countries_20[8:]
countries_20_last_12


# In[39]:


countries_20_last_12.corr(numeric_only=True)


# In[40]:


sns.pairplot(countries_20_last_12, hue='country', diag_kind='hist')


# In[41]:


plt.figure(figsize=(8,5))
ax = sns.heatmap(countries_20_last_12.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)


# **Summary**
# 
# - Quantity score, quality score, and business score are significant metrics for determining the best countries and cities.
# - Quantity and quality scores have a strong positive correlation: a high quantity score tends to go with a high quality score.
# - A high business score may be associated with building better startups (such as unicorns and pantheons).
# - Among the best 20 startup countries, developed democratic institutions and the absence of military coups appear to provide a stable environment. This stability may shape investor opinion and give investors confidence.
# - Government policies that encourage investors can play a big role in determining the best startup countries and cities.