In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Correlations Between the Metrics of the Best Startup Cities and Countries

Columns

  • quantity score (spelled "quatity score" in the cities file): Takes into account metrics such as the number of startups, coworking spaces, and accelerators to establish the activity level of the startup ecosystem

  • quality score: Studies parameters that indicate qualitative results achieved by the ecosystem. These parameters include analyzing the traction of the ecosystem’s top startups, as well as reviewing “special entities” the ecosystem has produced: Unicorns, Exits, and Pantheons

  • business score: A mix of business and economic indicators at the national level, discounted for cities that haven't reached a critical mass in either quantity or quality.

  • total score: The sum of the quantity, quality, and business scores; the rankings are ordered by this score.
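As a quick check of the total-score definition, the sketch below rebuilds the first five rows of the cities table from the values printed in Out[3] (typed in by hand, not read from the CSV) and compares the sum of the three component scores with the reported total. The residuals stay below 0.01, which is consistent with rounding in the published component scores.

```python
import pandas as pd

# First five rows of the cities table, copied by hand from Out[3].
head = pd.DataFrame({
    "city": ["San Francisco Bay", "New York", "Beijing", "Los Angeles Area", "London"],
    "total score": [328.966, 110.777, 66.049, 58.441, 56.913],
    "quatity score": [29.14, 11.43, 5.01, 11.23, 15.77],  # column name as spelled in the CSV
    "quality score": [296.02, 95.55, 58.61, 43.41, 37.44],
    "business score": [3.80, 3.80, 2.43, 3.80, 3.70],
})

# Sum the three components row by row and compare with the published total.
components = ["quatity score", "quality score", "business score"]
head["component sum"] = head[components].sum(axis=1)
head["residual"] = head["total score"] - head["component sum"]
print(head[["city", "total score", "component sum", "residual"]])
```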

Goal

What are the correlations between the metrics used to rank the best startup cities and countries?

In [2]:
cities = pd.read_csv('Best Cities for Startups.csv')
countries = pd.read_csv('Best Countries for Startups.csv')

Cities

In [3]:
print(cities.shape)
cities.head()
(1000, 9)
Out[3]:
position change in position from 2020 city country total score quatity score quality score business score sign of change in position
0 1 0 San Francisco Bay United States 328.966 29.14 296.02 3.80 NaN
1 2 0 New York United States 110.777 11.43 95.55 3.80 NaN
2 3 3 Beijing China 66.049 5.01 58.61 2.43 +
3 4 1 Los Angeles Area United States 58.441 11.23 43.41 3.80 +
4 5 2 London United Kingdom 56.913 15.77 37.44 3.70 -
In [4]:
cities.tail()
Out[4]:
position change in position from 2020 city country total score quatity score quality score business score sign of change in position
995 996 26 Ouagadougou Burkina Faso 0.060 0.02 0.02 0.02 -
996 997 new Baghdad Iraq 0.058 0.01 0.02 0.03 NaN
997 998 13 Mbabane Swaziland 0.057 0.01 0.02 0.03 -
998 999 new Conakry Guinea 0.047 0.01 0.02 0.02 NaN
999 1000 new Sanaa Yemen 0.037 0.01 0.02 0.01 NaN
In [5]:
# numeric_only=True skips text columns such as 'city' (required in pandas >= 2.0)
cities.corr(numeric_only=True)
Out[5]:
position total score quatity score quality score business score
position 1.000000 -0.279582 -0.358950 -0.186913 -0.806506
total score -0.279582 1.000000 0.921991 0.991778 0.358643
quatity score -0.358950 0.921991 1.000000 0.881724 0.453313
quality score -0.186913 0.991778 0.881724 1.000000 0.244499
business score -0.806506 0.358643 0.453313 0.244499 1.000000
In [6]:
sns.pairplot(cities)
In [7]:
plt.figure(figsize=(8,5))
ax = sns.heatmap(cities.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)

There is a strong positive correlation between the quality and quantity scores (0.88), and a weak correlation between the quality score and the business score (0.24). There is, however, a strong negative correlation between the business score and position (-0.81). Because a lower position number means a better rank, cities with higher business scores tend to rank higher. Moreover, even though the city rankings are determined by total score, the correlation between position and total score is only weakly negative (-0.28). Let's take the top 100 cities to understand why, and examine the correlations among them.
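One likely reason for the weak position/total-score correlation is that `DataFrame.corr()` computes Pearson correlation by default, which measures linear association, while the total score falls off very steeply with rank (328.966 for San Francisco Bay against 0.037 for Sanaa). The sketch below uses a hypothetical power-law decay (an assumed shape, not fitted to the data) to show that Pearson correlation can be weak even when rank and score are perfectly monotonic, whereas Spearman rank correlation (`corr(method="spearman")`) is exactly -1 in that case.

```python
import numpy as np
import pandas as pd

# Hypothetical ranking: 1000 positions whose score drops off steeply with rank.
# The power-law shape is an assumption chosen to mimic a heavy-tailed ranking.
position = np.arange(1, 1001)
total_score = 330.0 * position ** -1.5

df = pd.DataFrame({"position": position, "total score": total_score})

pearson = df.corr(method="pearson").loc["position", "total score"]    # linear association
spearman = df.corr(method="spearman").loc["position", "total score"]  # rank (monotonic) association

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

On the real data, `cities.corr(method="spearman", numeric_only=True)` should give a position/total-score value close to -1, since the ranking is ordered by total score.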

Cities Top 100

In [8]:
cities_top_100 = cities[:100]
cities_top_100.corr(numeric_only=True)
Out[8]:
position total score quatity score quality score business score
position 1.000000 -0.434891 -0.578036 -0.416559 0.110732
total score -0.434891 1.000000 0.915673 0.998811 0.069569
quatity score -0.578036 0.915673 1.000000 0.896382 0.058255
quality score -0.416559 0.998811 0.896382 1.000000 0.052109
business score 0.110732 0.069569 0.058255 0.052109 1.000000
In [9]:
sns.pairplot(cities_top_100)
In [10]:
plt.figure(figsize=(8,5))
ax = sns.heatmap(cities_top_100.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)

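Among the top 100 cities, the correlation between position and the quantity and quality scores becomes stronger (quantity: from -0.36 to -0.58; quality: from -0.19 to -0.42), while the correlation between position and business score almost disappears (from -0.81 to 0.11). In other words, among the leading cities, quantity and quality start to separate the best startup cities from the rest. Let's check the top 25 cities to verify this.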
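Rather than repeating the slice-and-correlate cell for every cutoff, the comparison can be collected into one table. The helper below is a hypothetical convenience function, not part of the original notebook; it assumes the frame is already sorted by rank, as both CSVs appear to be, and keeps only numeric columns so that text columns such as `city` are ignored.

```python
import pandas as pd

def corr_with_position(df: pd.DataFrame, cutoffs, rank_col: str = "position") -> pd.DataFrame:
    """Correlate every numeric column with `rank_col` within the top-N rows, for each N in `cutoffs`.

    Hypothetical helper; assumes `df` is already sorted by `rank_col`.
    """
    columns = {}
    for n in cutoffs:
        top = df.iloc[:n].select_dtypes("number")  # drop text columns such as 'city'
        columns[f"top {n}"] = top.corr()[rank_col].drop(rank_col)
    return pd.DataFrame(columns)

# Small made-up example (not the real dataset).
toy = pd.DataFrame({
    "position": [1, 2, 3, 4, 5, 6],
    "city": ["A", "B", "C", "D", "E", "F"],
    "quality score": [60.0, 40.0, 30.0, 10.0, 5.0, 1.0],
    "business score": [3.8, 2.4, 3.8, 3.7, 3.8, 3.3],
})
print(corr_with_position(toy, [6, 3]))
```

Applied to the notebook's data, `corr_with_position(cities, [1000, 100, 25])` would place the position correlations for the full list, the top 100, and the top 25 side by side; for the countries file, pass `rank_col="ranking"`.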

Cities Top 25

In [11]:
cities_top_25 = cities[:25]
cities_top_25.corr(numeric_only=True)
Out[11]:
position total score quatity score quality score business score
position 1.000000 -0.575365 -0.659024 -0.562125 -0.053008
total score -0.575365 1.000000 0.913301 0.999004 0.233821
quatity score -0.659024 0.913301 1.000000 0.894715 0.288322
quality score -0.562125 0.999004 0.894715 1.000000 0.215706
business score -0.053008 0.233821 0.288322 0.215706 1.000000
In [12]:
sns.pairplot(cities_top_25)
In [13]:
plt.figure(figsize=(8,5))
ax = sns.heatmap(cities_top_25.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)

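The top 25 cities support this idea: position correlates even more strongly with the quantity (-0.66) and quality (-0.56) scores, while its correlation with the business score is close to zero (-0.05). Business scores among these cities are similar (between 1.99 and 3.80), so the quantity and quality scores become the main determinants of the ranking. Let's examine these cities more closely.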
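One plausible explanation is restriction of range: a variable that barely varies within a subset cannot account for much of the ordering inside it, so its correlation shrinks. Within the top 25 cities the business score only spans 1.99 to 3.80, and nine of them sit at exactly 3.80. The sketch below uses synthetic data (an assumed linear model, not the startup data) to show the effect: the same underlying relationship yields a much weaker Pearson correlation once the data are limited to a narrow band.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data: y depends linearly on x plus noise (an assumed model, not the startup data).
x = rng.uniform(0.0, 4.0, size=5000)
y = x + rng.normal(0.0, 1.0, size=5000)
data = pd.DataFrame({"x": x, "y": y})

r_full = data["x"].corr(data["y"])

# Keep only rows where x falls in a narrow band, mimicking a top-N slice.
narrow = data[data["x"].between(3.3, 3.8)]
r_narrow = narrow["x"].corr(narrow["y"])

print(f"full range:  r = {r_full:.2f}")
print(f"narrow band: r = {r_narrow:.2f}")
```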

In [14]:
cities_top_25_copy = cities_top_25.copy()
In [15]:
# Quality-to-quantity ratio: qualitative results per unit of ecosystem activity
cities_top_25_copy['best_ratio'] = cities_top_25_copy['quality score'] / cities_top_25_copy['quatity score']
cities_top_25_copy[['city','best_ratio','business score']]
Out[15]:
city best_ratio business score
0 San Francisco Bay 10.158545 3.80
1 New York 8.359580 3.80
2 Beijing 11.698603 2.43
3 Los Angeles Area 3.865539 3.80
4 London 2.374128 3.70
5 Boston Area 7.369091 3.80
6 Shanghai 10.131653 2.43
7 Tel Aviv Area 4.930693 3.13
8 Moscow 2.122117 2.39
9 Bangalore 3.561508 2.38
10 Paris 3.153700 3.41
11 Seattle 4.882521 3.80
12 Berlin 4.057072 3.49
13 New Delhi 2.363946 2.61
14 Tokyo 6.400000 3.30
15 Mumbai 4.140673 2.61
16 Chicago 2.505721 3.80
17 Austin 3.625000 3.80
18 Washington DC Area 3.109510 3.80
19 Sao Paulo 4.836502 2.29
20 Shenzhen 7.761364 1.99
21 San Diego 4.102273 3.80
22 Seoul 7.393750 3.24
23 Stockholm 5.535519 3.78
24 Singapore City 2.927215 3.30
In [81]:
plt.figure(figsize=(11,9))
plt.barh(cities_top_25_copy['city'],cities_top_25_copy['best_ratio'])
plt.xlabel('Quality / Quantity Ratio')
plt.ylabel('Cities')
plt.show()

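The ratio compares quality with quantity: a high value means a city achieves strong qualitative results (traction of top startups, unicorns, exits) relative to its activity level (startups, coworking spaces, accelerators). Beijing has the highest ratio among the top 25 (11.70), ahead of San Francisco Bay (10.16), even though its quantity score is far lower (5.01 vs. 29.14). The ratio alone cannot prove anything, but it raises a question: would Beijing become the world's best startup city if its quantity score were the same as San Francisco Bay's?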
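The question can be turned into a rough back-of-the-envelope estimate. The sketch below assumes, purely hypothetically, that Beijing would keep its quality-to-quantity ratio (about 11.70) if its quantity score rose to San Francisco Bay's 29.14, and that its business score would stay at 2.43; the input values are copied from Out[3].

```python
# Component scores copied from the cities table (Out[3]).
sf = {"quantity": 29.14, "quality": 296.02, "business": 3.80}
beijing = {"quantity": 5.01, "quality": 58.61, "business": 2.43}

def total(scores: dict) -> float:
    # Total score = quantity + quality + business, per the column definitions.
    return scores["quantity"] + scores["quality"] + scores["business"]

# Hypothetical assumption: Beijing's quality/quantity ratio stays fixed as its quantity grows.
ratio = beijing["quality"] / beijing["quantity"]
beijing_what_if = {
    "quantity": sf["quantity"],
    "quality": ratio * sf["quantity"],
    "business": beijing["business"],  # business score assumed unchanged
}

print(f"San Francisco Bay: {total(sf):.2f}")
print(f"Beijing (what-if): {total(beijing_what_if):.2f}")
```

Under this strong assumption, Beijing's total would be roughly 372 against San Francisco Bay's 329. Nothing in the data says the ratio would stay constant as an ecosystem grows, so this is an illustration of the question, not an answer to it.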

In [17]:
cities.iloc[[0, 2]].plot.bar(x='city', y=['quatity score', 'quality score', 'business score'])
In [18]:
cities.iloc[[0,2]].plot.bar(x='city', y= 'business score')

San Francisco Bay's business score (3.80) is higher than Beijing's (2.43). So even if Beijing's quantity score matched San Francisco Bay's, high-value startups such as unicorns and Pantheons might still be more likely to be built in San Francisco Bay because of its stronger business environment. If so, San Francisco Bay could remain home to the best startups. The data here can only suggest this, not prove it.

Countries

In [19]:
countries.head(8)
Out[19]:
ranking change in position from 2020 country total score quantity score quality score business score change in position sign
0 1.0 0 United States 124.420 19.45 101.17 3.80 NaN
1 2.0 0 United Kingdom 28.719 8.16 16.86 3.70 NaN
2 3.0 0 Israel 27.741 5.48 19.14 3.13 NaN
3 4.0 0 Canada 19.876 6.58 9.75 3.55 NaN
4 5.0 0 Germany 17.053 3.64 9.93 3.49 NaN
5 6.0 4 Sweden 15.423 2.40 9.24 3.78 +
6 7.0 7 China 15.128 1.33 11.46 2.34 +
7 8.0 0 Switzerland 14.943 3.82 7.58 3.54 NaN
In [20]:
countries.tail()
Out[20]:
ranking change in position from 2020 country total score quantity score quality score business score change in position sign
96 97.0 8 Uganda 0.180 0.07 0.04 0.07 -
97 98.0 2 Nepal 0.172 0.06 0.04 0.08 +
98 99.0 new entry Namibia 0.165 0.04 0.05 0.07 NaN
99 100.0 new entry Ethiopia 0.162 0.07 0.06 0.03 NaN
100 NaN NaN NaN NaN NaN NaN NaN NaN
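The last row of the countries table is entirely NaN, probably from a trailing empty line in the CSV (an assumption; the file itself is not shown). That missing value is also why `ranking` is read as a float (1.0, 2.0, ...). `DataFrame.corr` excludes missing values pairwise, so the correlations below are unaffected, but the row can be dropped and the ranking restored to integers. A sketch on a small stand-in frame:

```python
import numpy as np
import pandas as pd

# Stand-in for the tail of the countries table, including the all-NaN last row.
countries = pd.DataFrame({
    "ranking": [99.0, 100.0, np.nan],
    "country": ["Namibia", "Ethiopia", np.nan],
    "total score": [0.165, 0.162, np.nan],
})

# Drop rows in which every value is missing, then store the ranking as integers again.
countries = countries.dropna(how="all").astype({"ranking": int})
print(countries)
```

On the real data the same line would apply directly after `pd.read_csv`, assuming no other `ranking` values are missing.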
In [21]:
countries.corr(numeric_only=True)
Out[21]:
ranking total score quantity score quality score business score
ranking 1.000000 -0.520022 -0.613922 -0.397828 -0.952646
total score -0.520022 1.000000 0.958664 0.988242 0.480830
quantity score -0.613922 0.958664 1.000000 0.914458 0.576589
quality score -0.397828 0.988242 0.914458 1.000000 0.350522
business score -0.952646 0.480830 0.576589 0.350522 1.000000
In [22]:
sns.pairplot(countries)
In [23]:
plt.figure(figsize=(8,5))
ax = sns.heatmap(countries.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)
In [24]:
business_score_top = countries[countries['business score'] >= 3.0]
business_score_top
Out[24]:
ranking change in position from 2020 country total score quantity score quality score business score change in position sign
0 1.0 0 United States 124.420 19.45 101.17 3.80 NaN
1 2.0 0 United Kingdom 28.719 8.16 16.86 3.70 NaN
2 3.0 0 Israel 27.741 5.48 19.14 3.13 NaN
3 4.0 0 Canada 19.876 6.58 9.75 3.55 NaN
4 5.0 0 Germany 17.053 3.64 9.93 3.49 NaN
5 6.0 4 Sweden 15.423 2.40 9.24 3.78 +
7 8.0 0 Switzerland 14.943 3.82 7.58 3.54 NaN
8 9.0 2 Australia 13.835 4.46 5.88 3.50 -
10 11.0 5 The Netherlands 13.700 3.44 6.96 3.30 -
11 12.0 0 France 13.286 3.03 6.85 3.41 NaN
12 13.0 2 Estonia 12.428 3.19 5.77 3.47 -
13 14.0 1 Finland 11.582 2.68 5.26 3.64 -
14 15.0 6 Spain 11.146 3.48 4.35 3.31 -
15 16.0 1 Lithuania 9.992 3.77 2.98 3.25 -
17 18.0 0 Ireland 9.633 2.51 3.68 3.44 NaN
18 19.0 0 South Korea 8.888 0.68 4.96 3.24 NaN
20 21.0 0 Japan 8.709 0.99 4.42 3.30 NaN
21 22.0 0 Denmark 8.368 2.04 2.68 3.65 NaN
22 23.0 1 Belgium 7.359 2.07 1.98 3.31 +
25 26.0 4 Taiwan 6.946 1.50 2.09 3.36 +
27 28.0 0 Austria 6.936 1.75 1.67 3.52 NaN
28 29.0 4 Italy 6.602 1.68 1.87 3.06 -
29 30.0 3 Poland 6.515 1.40 1.95 3.17 -
30 31.0 2 Norway 6.386 1.15 1.57 3.66 +
31 32.0 6 Czechia 6.226 1.24 1.72 3.26 -
32 33.0 14 New Zealand 5.865 1.05 1.12 3.69 +
In [25]:
business_score_top.plot.barh(x='country', y='business score', legend=False, figsize=(10, 7),
                             title='Countries with the Best Business Scores')

Countries Top 20

In [26]:
countries_20 = countries[:20]
countries_20
Out[26]:
ranking change in position from 2020 country total score quantity score quality score business score change in position sign
0 1.0 0 United States 124.420 19.45 101.17 3.80 NaN
1 2.0 0 United Kingdom 28.719 8.16 16.86 3.70 NaN
2 3.0 0 Israel 27.741 5.48 19.14 3.13 NaN
3 4.0 0 Canada 19.876 6.58 9.75 3.55 NaN
4 5.0 0 Germany 17.053 3.64 9.93 3.49 NaN
5 6.0 4 Sweden 15.423 2.40 9.24 3.78 +
6 7.0 7 China 15.128 1.33 11.46 2.34 +
7 8.0 0 Switzerland 14.943 3.82 7.58 3.54 NaN
8 9.0 2 Australia 13.835 4.46 5.88 3.50 -
9 10.0 6 Singapore 13.745 3.21 7.69 2.84 +
10 11.0 5 The Netherlands 13.700 3.44 6.96 3.30 -
11 12.0 0 France 13.286 3.03 6.85 3.41 NaN
12 13.0 2 Estonia 12.428 3.19 5.77 3.47 -
13 14.0 1 Finland 11.582 2.68 5.26 3.64 -
14 15.0 6 Spain 11.146 3.48 4.35 3.31 -
15 16.0 1 Lithuania 9.992 3.77 2.98 3.25 -
16 17.0 0 Russia 9.813 2.17 5.14 2.51 NaN
17 18.0 0 Ireland 9.633 2.51 3.68 3.44 NaN
18 19.0 0 South Korea 8.888 0.68 4.96 3.24 NaN
19 20.0 3 India 8.833 1.83 4.40 2.61 +

Let's split this subset into two pieces: the top eight countries and the remaining twelve.

In [27]:
countries_20.corr(numeric_only=True)
Out[27]:
ranking total score quantity score quality score business score
ranking 1.000000 -0.546039 -0.616779 -0.522993 -0.398569
total score -0.546039 1.000000 0.955399 0.997841 0.338033
quantity score -0.616779 0.955399 1.000000 0.934549 0.453367
quality score -0.522993 0.997841 0.934549 1.000000 0.295646
business score -0.398569 0.338033 0.453367 0.295646 1.000000
In [28]:
sns.pairplot(countries_20)
In [29]:
plt.figure(figsize=(8,5))
ax = sns.heatmap(countries_20.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)
In [80]:
plt.figure(figsize=(11,9))
plt.barh(countries_20['country'],countries_20['total score'])
plt.xlabel('total score')
plt.ylabel('Best Countries')
plt.title('The Best Startup Countries')
plt.show()
In [31]:
countries_20_copy = countries_20.copy()
In [71]:
countries_20_copy['best_ratio'] = countries_20_copy['quality score'] / countries_20_copy['quantity score']
compare_c_20 = countries_20_copy[['country','best_ratio','business score']]
compare_c_20
Out[71]:
country best_ratio business score
0 United States 5.201542 3.80
1 United Kingdom 2.066176 3.70
2 Israel 3.492701 3.13
3 Canada 1.481763 3.55
4 Germany 2.728022 3.49
5 Sweden 3.850000 3.78
6 China 8.616541 2.34
7 Switzerland 1.984293 3.54
8 Australia 1.318386 3.50
9 Singapore 2.395639 2.84
10 The Netherlands 2.023256 3.30
11 France 2.260726 3.41
12 Estonia 1.808777 3.47
13 Finland 1.962687 3.64
14 Spain 1.250000 3.31
15 Lithuania 0.790451 3.25
16 Russia 2.368664 2.51
17 Ireland 1.466135 3.44
18 South Korea 7.294118 3.24
19 India 2.404372 2.61
In [79]:
compare_c_20.plot.bar(x='country',figsize=(12,7))

Top 8 of the Top 20 Countries

In [34]:
countries_20_top_8 = countries_20[:8]
countries_20_top_8
Out[34]:
ranking change in position from 2020 country total score quantity score quality score business score change in position sign
0 1.0 0 United States 124.420 19.45 101.17 3.80 NaN
1 2.0 0 United Kingdom 28.719 8.16 16.86 3.70 NaN
2 3.0 0 Israel 27.741 5.48 19.14 3.13 NaN
3 4.0 0 Canada 19.876 6.58 9.75 3.55 NaN
4 5.0 0 Germany 17.053 3.64 9.93 3.49 NaN
5 6.0 4 Sweden 15.423 2.40 9.24 3.78 +
6 7.0 7 China 15.128 1.33 11.46 2.34 +
7 8.0 0 Switzerland 14.943 3.82 7.58 3.54 NaN
In [35]:
countries_20_top_8.corr(numeric_only=True)
Out[35]:
ranking total score quantity score quality score business score
ranking 1.000000 -0.681746 -0.791867 -0.652999 -0.405576
total score -0.681746 1.000000 0.959042 0.998174 0.339937
quantity score -0.791867 0.959042 1.000000 0.940510 0.477379
quality score -0.652999 0.998174 0.940510 1.000000 0.298494
business score -0.405576 0.339937 0.477379 0.298494 1.000000
In [36]:
sns.pairplot(countries_20_top_8,hue = 'country', diag_kind="hist")
In [37]:
plt.figure(figsize=(8,5))
ax = sns.heatmap(countries_20_top_8.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)

Last 12 of the Top 20 Countries

In [38]:
countries_20_last_12 = countries_20[8:]
countries_20_last_12
Out[38]:
ranking change in position from 2020 country total score quantity score quality score business score change in position sign
8 9.0 2 Australia 13.835 4.46 5.88 3.50 -
9 10.0 6 Singapore 13.745 3.21 7.69 2.84 +
10 11.0 5 The Netherlands 13.700 3.44 6.96 3.30 -
11 12.0 0 France 13.286 3.03 6.85 3.41 NaN
12 13.0 2 Estonia 12.428 3.19 5.77 3.47 -
13 14.0 1 Finland 11.582 2.68 5.26 3.64 -
14 15.0 6 Spain 11.146 3.48 4.35 3.31 -
15 16.0 1 Lithuania 9.992 3.77 2.98 3.25 -
16 17.0 0 Russia 9.813 2.17 5.14 2.51 NaN
17 18.0 0 Ireland 9.633 2.51 3.68 3.44 NaN
18 19.0 0 South Korea 8.888 0.68 4.96 3.24 NaN
19 20.0 3 India 8.833 1.83 4.40 2.61 +
In [39]:
countries_20_last_12.corr(numeric_only=True)
Out[39]:
ranking total score quantity score quality score business score
ranking 1.000000 -0.983246 -0.762573 -0.739960 -0.373214
total score -0.983246 1.000000 0.707677 0.803915 0.367816
quantity score -0.762573 0.707677 1.000000 0.184262 0.376132
quality score -0.739960 0.803915 0.184262 1.000000 -0.012712
business score -0.373214 0.367816 0.376132 -0.012712 1.000000
In [40]:
sns.pairplot(countries_20_last_12,hue = 'country', diag_kind="hist")
In [41]:
plt.figure(figsize=(8,5))
ax = sns.heatmap(countries_20_last_12.corr(numeric_only=True), vmin=-1, vmax=1, cbar=False,
                 cmap='RdBu', annot=True)

Summary

  • Quantity, quality, and business scores are the three components of the total score, so together they determine which countries and cities rank as the best.
  • Quantity and quality scores are strongly positively correlated (0.88 across all cities and 0.91 across all countries): ecosystems with high activity levels tend to achieve strong qualitative results. The relationship is much weaker (0.18) among the countries ranked 9 to 20.
  • A high business score may be associated with building more successful startups (such as unicorns and Pantheons), although the correlations alone cannot establish this.
  • Among the top 20 startup countries, most are established democracies without recent military coups. Such political stability, which this dataset does not measure, may influence investor sentiment and give investors confidence.
  • Government policies that encourage investment may also play a large role in determining the best startup countries and cities.