ANALYSIS OF THE WORLD'S POPULATION

INTRODUCTION

The term population is often used to refer to the total number of humans currently living on Earth. Several factors affect population growth in different regions of the world, hence the huge differences in population between countries. The primary factors are birth rate, death rate and migration. Together they account for how much a population is increasing or decreasing.

AIM

My aim is to provide the bureau with the required insights.

To achieve this goal we are going to use Python libraries such as pandas, NumPy and matplotlib.

1.1 Importing necessary libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

1.2 Exploring our data

In [2]:
# The file carries an .xls extension but is parsed here as CSV text.
# read_csv already returns a DataFrame, so no extra conversion is needed.
world_pop_data = pd.read_csv("world_population.xls")
world_pop_data.head(10)
Out[2]:
id code name area area_land area_water population population_growth birth_rate death_rate migration_rate
0 1 af Afghanistan 652230.0 652230.0 0.0 32564342.0 2.32 38.57 13.89 1.51
1 2 al Albania 28748.0 27398.0 1350.0 3029278.0 0.30 12.92 6.58 3.30
2 3 ag Algeria 2381741.0 2381741.0 0.0 39542166.0 1.84 23.67 4.31 0.92
3 4 an Andorra 468.0 468.0 0.0 85580.0 0.12 8.13 6.96 0.00
4 5 ao Angola 1246700.0 1246700.0 0.0 19625353.0 2.78 38.78 11.49 0.46
5 6 ac Antigua and Barbuda 442.0 442.0 0.0 92436.0 1.24 15.85 5.69 2.21
6 7 ar Argentina 2780400.0 2736690.0 43710.0 43431886.0 0.93 16.64 7.33 0.00
7 8 am Armenia 29743.0 28203.0 1540.0 3056382.0 0.15 13.61 9.34 5.80
8 9 as Australia 7741220.0 7682300.0 58920.0 22751014.0 1.07 12.15 7.14 5.65
9 10 au Austria 83871.0 82445.0 1426.0 8665550.0 0.55 9.41 9.42 5.56
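
Before cleaning, it can help to count the missing and zero entries in each column. The sketch below is only an illustration: it uses a small hand-made frame with made-up values (not the real dataset) that borrows two of the column names shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the values are made up for illustration only.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "area_land": [100.0, 0.0, np.nan, 50.0],
    "population": [1000.0, 200.0, 0.0, np.nan],
})

numeric = sample[["area_land", "population"]]

# Number of missing (NaN) entries in each numeric column
missing_counts = numeric.isna().sum()

# Number of entries that are exactly zero in each numeric column
# (NaN == 0 evaluates to False, so missing values are not counted here)
zero_counts = (numeric == 0).sum()

print(missing_counts)
print(zero_counts)
```

Running the same two counts on `world_pop_data` would show which columns need attention before any rows are dropped.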

2. Data cleaning

Dropping a few rows and columns that are not helpful in my analysis.

In [3]:
# Finding the original number of rows
print("Original number of rows",world_pop_data.shape[0])

# finding the original number of columns
print("Original number of columns:",world_pop_data.shape[1])
Original number of rows 261
Original number of columns: 11
In [4]:
# dropping the id column
del world_pop_data["id"]

# confirming deletion of the id column
print("New number of columns:",world_pop_data.shape[1])
New number of columns: 10
In [5]:
# keeping only rows with a positive land area and a positive population
world_pop_data = world_pop_data[(world_pop_data["area_land"] > 0.0) & (world_pop_data["population"] > 0.0)]
In [6]:
# dropping rows where the population column has a null value
world_pop_data = world_pop_data[world_pop_data["population"].notnull()]
world_pop_data
Out[6]:
code name area area_land area_water population population_growth birth_rate death_rate migration_rate
0 af Afghanistan 652230.0 652230.0 0.0 32564342.0 2.32 38.57 13.89 1.51
1 al Albania 28748.0 27398.0 1350.0 3029278.0 0.30 12.92 6.58 3.30
2 ag Algeria 2381741.0 2381741.0 0.0 39542166.0 1.84 23.67 4.31 0.92
3 an Andorra 468.0 468.0 0.0 85580.0 0.12 8.13 6.96 0.00
4 ao Angola 1246700.0 1246700.0 0.0 19625353.0 2.78 38.78 11.49 0.46
... ... ... ... ... ... ... ... ... ... ...
245 rq Puerto Rico 13791.0 8870.0 4921.0 3598357.0 0.60 10.86 8.67 8.15
246 vq Virgin Islands 1910.0 346.0 1564.0 103574.0 0.59 10.31 8.54 7.67
250 gz Gaza Strip 360.0 360.0 0.0 1869055.0 2.81 31.11 3.04 0.00
253 we West Bank 5860.0 5640.0 220.0 2785366.0 1.95 22.99 3.50 0.00
254 wi Western Sahara 266000.0 266000.0 0.0 570866.0 2.82 30.24 8.34 NaN

232 rows × 10 columns

In [7]:
# dropping rows with more than two billion people
# (no single country is that large, so only aggregate entries, if any, would be removed)
world_pop_data = world_pop_data[world_pop_data["population"] < 2000000000]

#confirming the new number of rows
world_pop_data.shape[0]
Out[7]:
232

Comparing the row counts above, the cleaning steps removed 29 of the original 261 rows (261 - 232 = 29). These are rows with a zero or missing land area or population. Such entries may be the result of errors during data entry.
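
To see why rows were removed, each cleaning rule can be evaluated separately before filtering. The following is a minimal sketch on made-up rows (not the real dataset); the rule names `positive_land_area` and `positive_population` are illustrative labels, not columns of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the real dataset is not reproduced here.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "area_land": [100.0, 0.0, np.nan, 50.0, 10.0],
    "population": [1000.0, 200.0, 300.0, 0.0, np.nan],
})

# One boolean column per cleaning rule; True means the row passes that rule.
# Comparisons with NaN evaluate to False, so missing values fail the rule.
checks = pd.DataFrame({
    "positive_land_area": sample["area_land"] > 0,
    "positive_population": sample["population"] > 0,
})

# Rows failing each rule (a single row can fail more than one rule)
failed_per_rule = (~checks).sum()

# Keep only the rows that pass every rule
cleaned = sample[checks.all(axis=1)]

print(failed_per_rule)
print("rows removed:", len(sample) - len(cleaned))
```

Applied to the real data, the per-rule counts would show how many of the removed rows were due to land area and how many to population.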

3. Data analysis

In [8]:
# finding the countries with the highest population
world_pop_data.sort_values("population",ascending = False,inplace = True)
country_pop = world_pop_data[["name","population"]].head(10)
country_pop
Out[8]:
name population
36 China 1.367485e+09
76 India 1.251696e+09
185 United States 3.213689e+08
77 Indonesia 2.559937e+08
23 Brazil 2.042598e+08
131 Pakistan 1.990858e+08
128 Nigeria 1.815621e+08
13 Bangladesh 1.689577e+08
142 Russia 1.424238e+08
84 Japan 1.269197e+08
In [9]:
# visualize the highest population
plt.bar(country_pop["name"],country_pop["population"])
plt.xticks(rotation=90)
plt.xlabel("countries")
plt.ylabel("population")
plt.title("bar graph of countries with the highest population")
plt.show()

From the data above, China is the most populous country in the world. China's population has continued to grow, which has been attributed to its large territory and to continued modernisation that has raised living standards and immigration and lowered infant mortality rates.

The Pitcairn Islands have the lowest population in the dataset, with 48 people. The Pitcairn Islands group is a British Overseas Territory comprising the islands of Pitcairn, Henderson, Ducie and Oeno. Pitcairn, the only inhabited island, is a small volcanic outcrop in the South Pacific at latitude 25.04 south and longitude 130.06 west. Few people live in this area because of its unfavourable climate and living conditions. The island attracts many tourists because of its uniqueness and special features, but not many would consider residing there.
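
As an alternative to sorting the whole frame in place, the largest and smallest populations can be selected directly. The sketch below uses a handful of rows copied from the outputs in this notebook (large values are rounded as displayed); `nlargest` and `nsmallest` are standard pandas methods.

```python
import pandas as pd

# A few rows taken from the notebook outputs; large values are rounded as displayed.
sample = pd.DataFrame({
    "name": ["China", "India", "United States", "Pitcairn Islands", "Svalbard"],
    "population": [1.367485e9, 1.251696e9, 3.213689e8, 48.0, 1872.0],
})

# Two most populated entries, largest first
most_populated = sample.nlargest(2, "population")

# Two least populated entries, smallest first
least_populated = sample.nsmallest(2, "population")

print(most_populated)
print(least_populated)
```

Neither call modifies `sample`, so the original row order is preserved for later steps.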

3.2 Population density

In [10]:
# Calculating population density (people per square kilometre of total area)
world_pop_data["population_density"] = world_pop_data["population"] / world_pop_data["area"]

# Countries with a population of more than 45 million
pop_over_45 = world_pop_data[world_pop_data["population"] > 45000000]

# Sorting into a new DataFrame, rather than in place on a slice,
# avoids the SettingWithCopyWarning
pop_over_45 = pop_over_45.sort_values("population_density")
pop_over_45[["name","population","population_density","area_land"]]
Out[10]:
name population population_density area_land
142 Russia 1.424238e+08 8.329732 16377742.0
23 Brazil 2.042598e+08 23.986065 8358140.0
185 United States 3.213689e+08 32.703724 9161966.0
39 Congo, Democratic Republic of the 7.937514e+07 33.850722 2267048.0
37 Colombia 4.673673e+07 41.036366 1038700.0
160 South Africa 5.367556e+07 44.029205 1214470.0
78 Iran 8.182427e+07 49.644775 1531595.0
171 Tanzania 5.104588e+07 53.885656 885800.0
113 Mexico 1.217368e+08 61.972286 1943945.0
87 Kenya 4.592530e+07 79.131482 569140.0
27 Burma 5.632021e+07 83.242739 653508.0
52 Egypt 8.848740e+07 88.359275 995450.0
162 Spain 4.814613e+07 95.269078 498980.0
178 Turkey 7.941427e+07 101.350332 769632.0
60 France 6.655377e+07 103.376301 640427.0
172 Thailand 6.797640e+07 132.476623 510890.0
77 Indonesia 2.559937e+08 134.410291 1811569.0
36 China 1.367485e+09 142.491517 9326410.0
128 Nigeria 1.815621e+08 196.545081 910768.0
82 Italy 6.185512e+07 205.266875 294140.0
64 Germany 8.085441e+07 226.468980 348672.0
131 Pakistan 1.990858e+08 250.078002 770875.0
184 United Kingdom 6.408822e+07 263.077140 241930.0
191 Vietnam 9.434884e+07 284.861070 310070.0
84 Japan 1.269197e+08 335.841814 364485.0
137 Philippines 1.009984e+08 336.661253 298170.0
76 India 1.251696e+09 380.771354 2973193.0
90 Korea, South 4.911520e+07 492.531047 96920.0
13 Bangladesh 1.689577e+08 1138.069143 130170.0

Among countries with more than 45 million people, Bangladesh has the highest population density, at about 1,138 people per square kilometre of total area. Bangladesh lies in the lower part of the Indo-Gangetic belt. One of the main reasons for its high population is that it is a very fertile region; in addition, it has one of the highest population growth rates. South Korea is the second most densely populated country in this group, followed by India and the Philippines.

map of Bangladesh

In [11]:
#finding the most densely populated countries
world_pop_data.sort_values("population_density",ascending = False,inplace = True)
country_pop = world_pop_data[["name","population","population_density","area_land"]].dropna()

country_pop.head()
Out[11]:
name population population_density area_land
204 Macau 592731.0 21168.964286 28.0
116 Monaco 30535.0 15267.500000 2.0
155 Singapore 5674472.0 8141.279770 687.0
203 Hong Kong 7141106.0 6445.041516 1073.0
250 Gaza Strip 1869055.0 5191.819444 360.0

Macau has the highest population density in the world, with about 21,169 people per square kilometre, followed at a wide gap by Monaco, which in turn is followed at a wide gap by Singapore. In general, as we can see, these high values are mostly found in small countries and islands whose land area is below the average (553,017.2 km²). Their populations are also below the average (about 30.64 million people).
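
The density column in this notebook divides population by the total area, not by the land area alone, and the two definitions can differ noticeably for small territories. The sketch below compares them for Singapore and Hong Kong using the population, land area and water area values shown in the notebook outputs. It assumes that total area equals land area plus water area; the column names `density_total_area` and `density_land_only` are illustrative.

```python
import pandas as pd

# Values copied from the notebook outputs for Singapore and Hong Kong
sample = pd.DataFrame({
    "name": ["Singapore", "Hong Kong"],
    "population": [5674472.0, 7141106.0],
    "area_land": [687.0, 1073.0],
    "area_water": [10.0, 35.0],
})

# Assumption: total area is the sum of land area and water area
sample["area"] = sample["area_land"] + sample["area_water"]

# People per square kilometre of total area (the definition used in the notebook)
sample["density_total_area"] = sample["population"] / sample["area"]

# People per square kilometre of land only
sample["density_land_only"] = sample["population"] / sample["area_land"]

print(sample[["name", "density_total_area", "density_land_only"]])
```

The total-area figures reproduce the densities listed in the notebook output, while the land-only figures are somewhat higher because the water area is excluded from the denominator.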

In [12]:
grace=world_pop_data.sort_values('population_density',ascending=False)
most_dense_coun=world_pop_data.sort_values('population_density',ascending=False)

grace.describe()
Out[12]:
area area_land area_water population population_growth birth_rate death_rate migration_rate population_density
count 2.300000e+02 2.320000e+02 229.000000 2.320000e+02 230.000000 223.000000 223.000000 219.000000 230.000000
mean 5.661912e+05 5.530172e+05 19540.545852 3.064171e+07 1.189000 19.169238 7.808161 3.412283 428.456288
std 1.782348e+06 1.698552e+06 91960.199798 1.265954e+08 0.880425 9.377402 2.906321 4.407241 1889.850615
min 2.000000e+00 2.000000e+00 0.000000 4.800000e+01 0.000000 6.650000 1.530000 0.000000 0.026653
25% 2.322750e+03 2.498250e+03 0.000000 3.435062e+05 0.430000 11.575000 5.875000 0.355000 32.990473
50% 6.998650e+04 7.066000e+04 620.000000 5.219556e+06 1.040000 16.460000 7.420000 1.880000 83.892339
75% 3.532665e+05 3.700755e+05 7200.000000 1.807358e+07 1.862500 24.260000 9.440000 4.945000 208.664451
max 1.709824e+07 1.637774e+07 891163.000000 1.367485e+09 3.320000 45.450000 14.890000 22.390000 21168.964286
In [13]:
#finding the least densely populated countries
country_pop.tail()
Out[13]:
name population population_density area_land
117 Mongolia 2992908.0 1.913482 1553556.0
237 Pitcairn Islands 48.0 1.021277 47.0
231 Falkland Islands (Islas Malvinas) 3361.0 0.276103 12173.0
223 Svalbard 1872.0 0.030172 62045.0
206 Greenland 57733.0 0.026653 2166086.0
In [14]:
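# ratio of land area to water area (rows with zero water area give inf)
grace['land_water'] = grace['area_land']/grace['area_water']

# rows where the land area is larger than the water area (despite the variable name)
more_water = grace[grace['land_water'] > 1]
more_water[['land_water','population','name','area_water','area_land']]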
Out[14]:
land_water population name area_water area_land
204 inf 592731.0 Macau 0.0 28.0
116 inf 30535.0 Monaco 0.0 2.0
155 68.700000 5674472.0 Singapore 10.0 687.0
203 30.657143 7141106.0 Hong Kong 35.0 1073.0
250 inf 1869055.0 Gaza Strip 0.0 360.0
... ... ... ... ... ...
237 inf 48.0 Pitcairn Islands 0.0 47.0
231 inf 3361.0 Falkland Islands (Islas Malvinas) 0.0 12173.0
223 inf 1872.0 Svalbard 0.0 62045.0
127 4222.333333 18045729.0 Niger 300.0 1266700.0
34 50.774194 11631456.0 Chad 24800.0 1259200.0

228 rows × 5 columns
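
Many of the ratios above are `inf` because the water area is zero. One possible way to keep such rows from dominating comparisons is to turn the infinite values into missing values, as in the sketch below. It uses three rows copied from the output above and is only an illustration, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Rows copied from the output above
sample = pd.DataFrame({
    "name": ["Macau", "Singapore", "Chad"],
    "area_land": [28.0, 687.0, 1259200.0],
    "area_water": [0.0, 10.0, 24800.0],
})

# Dividing by a zero water area yields inf rather than raising an error
ratio = sample["area_land"] / sample["area_water"]

# Replace infinite ratios with NaN so they can be dropped or ignored later
sample["land_water"] = ratio.replace([np.inf, -np.inf], np.nan)

# Keep only rows with a finite land-to-water ratio
finite = sample.dropna(subset=["land_water"])

print(finite[["name", "land_water"]])
```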

3.3 Birth rates

The birth rate is one of the primary factors affecting the population of countries across the world.

In [24]:
#sorting the birth rate of the world's population
world_pop_data.sort_values("birth_rate",inplace=True)
birth_df=world_pop_data[["name","birth_rate"]].head(10)
birth_df
Out[24]:
name birth_rate
116 Monaco 6.65
213 Saint Pierre and Miquelon 7.42
84 Japan 7.93
3 Andorra 8.13
90 Korea, South 8.19
155 Singapore 8.27
157 Slovenia 8.42
195 Taiwan 8.47
64 Germany 8.47
148 San Marino 8.63
In [25]:
# visualize countries birth rate
plt.bar(birth_df["name"],birth_df["birth_rate"])
plt.xticks(rotation=90)
plt.xlabel("countries")
plt.ylabel("birth_rate")
plt.title("bar graph of countries with the lowest birth rate")
plt.show()

Monaco has the lowest birth rate in the world, at 6.65 births per 1,000 persons. One likely reason is that the large majority of Monaco's population is urban, and primary health care and education are available to children. Low birth rates are also observed in Asian countries such as Japan, South Korea, Singapore and Taiwan, at about 8 births per 1,000 persons.

In [27]:
#determining the country with the highest birth rate
world_pop_data.sort_values("birth_rate",ascending=False,inplace=True)
birth_top=world_pop_data[["name","population","birth_rate"]].head(10)
birth_top
Out[27]:
name population birth_rate
127 Niger 18045729.0 45.45
108 Mali 16955536.0 44.99
181 Uganda 37101745.0 43.79
193 Zambia 15066266.0 42.13
26 Burkina Faso 18931686.0 42.03
28 Burundi 10742276.0 42.01
105 Malawi 17964697.0 41.56
159 Somalia 10616380.0 40.45
4 Angola 19625353.0 38.78
120 Mozambique 25303113.0 38.58

In [28]:
# visualize countries birth rate
plt.bar(birth_top["name"],birth_top["birth_rate"])
plt.xticks(rotation=90)
plt.xlabel("countries")
plt.ylabel("birth_rate")
plt.title("bar graph of countries with the highest birth rate")
plt.show()

Niger has the highest birth rate, at about 45 births per 1,000 persons. Most interestingly, 19 of the 20 countries with the highest birth rates are in Africa. The main cause of high birth rates in African countries is high fertility, which is driven by multiple factors such as high desired family size, low use of modern contraceptives and high levels of adolescent childbearing. Outside Africa, a high birth rate is also seen in Afghanistan, at about 39 births per 1,000 persons, which is linked to complicated pregnancies, poor access to primary health care services, an insufficient number of health workers, early marriages, insecurity, poverty and unemployment. The high birth rate in Afghanistan is counted as a major factor in the country's child and maternal mortality rates.

3.4 Death rates

In [29]:
#determining the countries with the highest death rates
world_pop_data.sort_values("death_rate",ascending=False,inplace=True)

death_df=world_pop_data[["name","population","death_rate"]].head(10)
death_df
Out[29]:
name population death_rate
97 Lesotho 1947701.0 14.89
182 Ukraine 44429471.0 14.46
25 Bulgaria 7186893.0 14.44
70 Guinea-Bissau 1726170.0 14.33
95 Latvia 1986705.0 14.31
34 Chad 11631456.0 14.28
101 Lithuania 2884433.0 14.27
121 Namibia 2212307.0 13.91
0 Afghanistan 32564342.0 13.89
33 Central African Republic 5391539.0 13.80
In [30]:
# visualize countries death rate
plt.bar(death_df["name"],death_df["death_rate"])
plt.xticks(rotation=90)
plt.xlabel("countries")
plt.ylabel("death_rate")
plt.title("bar graph of countries with the highest death rate")
plt.show()

Lesotho has the highest death rate, at about 15 deaths per 1,000 persons every year. The high rate in Lesotho may result from several factors, including the effect of the AIDS epidemic and the tendency of parents to underreport child deaths. A high death rate is also reported in Ukraine, where a contributing factor is high mortality among working-age males from preventable causes such as alcohol poisoning and smoking.

picture of Lesotho
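
The bar charts in this notebook repeat the same plotting calls with different columns, which makes it easy to mislabel an axis. A small helper, sketched below, takes the y-axis label from the column being plotted. The function name `plot_bar` is hypothetical, and the three rows are copied from the death-rate output above.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_bar(df, value_col, title):
    """Draw a bar chart of `value_col` by country name and return the Axes."""
    fig, ax = plt.subplots()
    ax.bar(df["name"], df[value_col])
    # Rotate the country names so long labels do not overlap
    ax.tick_params(axis="x", labelrotation=90)
    ax.set_xlabel("countries")
    # The y-axis label always matches the plotted column
    ax.set_ylabel(value_col)
    ax.set_title(title)
    fig.tight_layout()
    return ax


# Rows copied from the death-rate output above
sample = pd.DataFrame({
    "name": ["Lesotho", "Ukraine", "Bulgaria"],
    "death_rate": [14.89, 14.46, 14.44],
})

ax = plot_bar(sample, "death_rate", "bar graph of countries with the highest death rate")
```

In a notebook the figure is displayed when the cell finishes; in a script, `plt.show()` would be called after `plot_bar`.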

In [31]:
#country with the lowest death rate
world_pop_data.sort_values("death_rate",inplace=True)
death_low=world_pop_data[["name","population","death_rate"]].head(20)
death_low
Out[31]:
name population death_rate
140 Qatar 2194817.0 1.53
183 United Arab Emirates 5779760.0 1.97
92 Kuwait 2788534.0 2.18
12 Bahrain 1346613.0 2.69
250 Gaza Strip 1869055.0 3.04
240 Turks and Caicos Islands 50280.0 3.10
150 Saudi Arabia 27752316.0 3.33
130 Oman 3286936.0 3.36
155 Singapore 5674472.0 3.43
253 West Bank 2785366.0 3.50
24 Brunei 429646.0 3.52
99 Libya 6411776.0 3.58
244 Northern Mariana Islands 52344.0 3.71
79 Iraq 37056169.0 3.77
85 Jordan 8117564.0 3.79
158 Solomon Islands 622469.0 3.85
107 Maldives 393253.0 3.89
169 Syria 17064854.0 4.00
188 Vanuatu 272264.0 4.09
110 Marshall Islands 72191.0 4.21
In [32]:
# visualize countries death rate
plt.bar(death_low["name"],death_low["death_rate"])
plt.xticks(rotation=90)
plt.xlabel("countries")
plt.ylabel("death_rate")
plt.title("bar graph of countries with the lowest death rate")
plt.show()

Qatar has the lowest death rate in the world, at about 1.5 deaths per 1,000 persons every year. The lowest death rates are mostly found in Middle Eastern countries, which are historically characterized by a very high standard of living.
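
The birth and death rates discussed above can be combined into the natural increase, that is, the birth rate minus the death rate. The sketch below computes it for three countries whose rates appear in the first output of this notebook, assuming both rates are expressed per 1,000 persons per year. The column name `natural_increase` is illustrative and is not part of the original data.

```python
import pandas as pd

# Birth and death rates copied from the first output of this notebook
sample = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Austria"],
    "birth_rate": [38.57, 12.92, 9.41],
    "death_rate": [13.89, 6.58, 9.42],
})

# Natural increase per 1,000 persons: births minus deaths, migration not included
sample["natural_increase"] = sample["birth_rate"] - sample["death_rate"]

# Countries where deaths outnumber births
negative = sample[sample["natural_increase"] < 0]

print(sample[["name", "natural_increase"]])
print(negative["name"].tolist())
```

In this small sample only Austria has a negative natural increase, since its death rate (9.42) slightly exceeds its birth rate (9.41).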

Conclusion

In this project we have analysed various demographic and geographic statistics for all countries in the world. Below are our findings:

  • 1. The countries with the largest populations are China, India, the United States and Indonesia.
  • 2. The least populated countries are some islands (Pitcairn, Niue, Tokelau, Cocos Islands) and the Vatican.
  • 3. The highest population density is observed in Macau, Monaco, Singapore and Hong Kong.
  • 4. The lowest population density is observed in Greenland, Svalbard, the Falkland Islands, the Pitcairn Islands and Mongolia.

Population dynamics

  • African countries show the highest birth rates and the highest death rates. However, the birth rates are much higher, which results in the highest values of natural increase and population growth in the world.
  • In Western Europe, the following countries demonstrate low birth rates: Monaco, Andorra, Germany, Greece and Italy. This leads to a negative natural increase and, as a result, to low population growth. However, in Germany and Italy the negative natural increase seems to be compensated by immigration, so their population growth is not among the lowest.
  • Japan, Monaco and South Korea have some of the lowest birth rates in the world, leading to a negative natural increase.
  • Lesotho has one of the highest death rates, which contributes to its low population growth.
  • Area: the biggest countries are Russia, Canada, the United States, China and Brazil. The smallest are the Vatican and Monaco.
  • Water area: the countries with the highest water-to-land ratio, considering only enclosed fresh waters, are Malawi, the Netherlands and Uganda.