#!/usr/bin/env python # coding: utf-8
# Run this if SQL is not instatlled: !conda install -yc conda-forge ipython-sql # # World Facts by the CIA # [![image.png](attachment:image.png)](https://www.cia.gov/the-world-factbook/) # - [1 Introduction](#1) # - [2 Being Human...](#2) # - [3 Have you all got a place?](#3) # - [4 Born one day, Dead on another...](#4) # - [5 Water, Water everywhere....](#5) # - [6 Conclusion](#6) # # 1 # ## Introduction # The [CIA World Factbook](https://www.cia.gov/the-world-factbook/) is a collection of statistics about all of the countries on Earth. The Factbook contains demographic information like the following: # # - _population_ - Population in various countries around the world. # - _population_growth_ - The annual growth rate in the different countries around the workd # - _area_ - The land and water area around the world. # The data has been captured in the _factbook.db_ which will be opened and analyzed using SQLite. # In[44]: get_ipython().run_cell_magic('capture', '', '%load_ext sql\n%sql sqlite:///factbook.db\n') # In[45]: get_ipython().run_cell_magic('sql', '', "SELECT *\n FROM sqlite_master\n WHERE type='table';\n") # In[46]: get_ipython().run_cell_magic('sql', '', '\nSELECT *\nFROM facts\n') # [Index](#Index) # # 2 # ## Being Human... # **Analysis on human population and its spread.** # Population would be an interesting column to analyze. Given the considerable increase in world population and how it has affected the environment. It would be interesting to know which country is the most populated and which country has the most population growth rate. # To begin with we need to know what the maximum population and population growth look like. # In[47]: get_ipython().run_cell_magic('sql', '', "\nSELECT MIN(population) AS 'Least Population', \n MAX(population) AS 'Most Population', \n MIN(population_growth) AS 'Least Population Growth',\n MAX(population_growth) AS 'Most Population Growth'\nFROM facts\n") # The values of Least and Most Population seem to be an oddity. It would be interesting to find the countries that these values relate to. # In[48]: get_ipython().run_cell_magic('sql', '', "\nSELECT name AS 'Least Populous Country'\nFROM facts\nWHERE population = 0\n") # In[49]: get_ipython().run_cell_magic('sql', '', "\nSELECT name AS 'Most Populous Country'\nFROM facts\nWHERE population = 7256490011\n") # The oddity has been explained. The database contains a record for the population of the entire world. This record will be skipped as the analysis continues. # Let us try the same statistics after avoiding the _World_ record. # In[50]: get_ipython().run_cell_magic('sql', '', "\nSELECT MIN(population) AS 'Least Population', \n MAX(population) AS 'Most Population', \n MIN(population_growth) AS 'Least Population Growth',\n MAX(population_growth) AS 'Most Population Growth'\nFROM facts\nWHERE name<>'World'\n") # In[51]: get_ipython().run_cell_magic('sql', '', "\nSELECT name AS 'Most Populous Country', \n population AS 'Population', \n population_growth AS 'Population Growth Rate'\nFROM facts\nWHERE population = (SELECT MAX(population)\n FROM facts\n WHERE name <> 'World')\n") # While China is the most populous country in the world. Its growth rate seems quite low. It would be interesting to find out the population of the country with the highest growth rate. # In[52]: get_ipython().run_cell_magic('sql', '', "\nSELECT name AS 'Country with highest population growth rate', \n population AS 'Population', \n population_growth AS 'Population Growth Rate'\nFROM facts\nWHERE population_growth = (SELECT MAX(population_growth)\n FROM facts\n WHERE name <> 'World')\n") # While **China has a population that is 100 times more than South Sudan**, its population **growth rate is only a tenth of that of South Sudan**. # This raises the question of whether South Sudan might be an outlier with regards to population growth rate. An average of the population growth rate of the world should reveal a better understanding.bm # In[53]: get_ipython().run_cell_magic('sql', '', " \nSELECT ROUND(AVG(population_growth),3) AS 'Average Population Growth'\nFROM facts\nWHERE name<>'World'\n") # **South Sudan's growth rate** seems to be **higher than average**. It would be interesting to find out how many countries have above average growth rates. # In[54]: get_ipython().run_cell_magic('sql', '', "\nSELECT COUNT(*) AS 'Number of countries with more than average population growth'\nFROM facts\nWHERE population_growth > (SELECT AVG(population_growth)\n FROM facts\n WHERE name<>'World')\n") # Since there are 101 countries that have more than the average growth rate, further analysis could help to identify those countries. # In[55]: get_ipython().run_cell_magic('sql', '', "\nSELECT name AS 'Country Name', population_growth AS 'Population Growth Rate'\nFROM facts\nWHERE population_growth>(SELECT AVG(population_growth)\n FROM facts)\nORDER BY population_growth DESC\nLIMIT 30\n") # More than a fifth of the countries in the above list are in Africa. # # Further research on the reason behind this phenomenon reveals that a significant number of countries in Africa have high growth rates because of high fertility rates. Apparently, fertility rates take a dive when there is economic growth because of many reasons including couples choosing to have smaller families. [Read more](https://www.usnews.com/news/best-countries/articles/2019-07-10/africa-to-lead-world-in-population-growth) # [Index](#Index) # # 3 # ## Have you all got a place? # **Analysis of Area vs. Population** # Earlier the CIA World data revealed a record containing a population value greater than 7 billion. This was found to be the population of the world. The question to then analyze would be is whether the countries around the world have the area to sustain the population. # # We could begin by finding out the average area and population around the world. # In[56]: get_ipython().run_cell_magic('sql', '', "\nSELECT ROUND(AVG(population),3) AS 'Average Population', \n ROUND(AVG(area),3) AS 'Average Area'\nFROM FACTS\nWHERE name!='World'\n") # Next, we can identify those countries wherein the population is greater than the average but the area held by them are less than average i.e. Densely populated countries. # In[57]: get_ipython().run_cell_magic('sql', '', "--#Densely Populated countries in descending order of population to area ratio\nSELECT name AS Country, ROUND((CAST(area AS FLOAT)/population),5) AS 'Area/Population_Ratio', population AS Population, area AS Area\nFROM facts\nWHERE (population > (SELECT AVG(population)\n FROM facts\n WHERE name!= 'World')\n AND\n area < (SELECT AVG(area)\n FROM facts\n WHERE name!='World')\n )\nORDER BY population/area DESC\n") # **Bangladesh** has been identified as the **most densely populated country** in the world. This statistic leads to another question which is, what is the average population to area ratio and how many countries have a ratio that is higher than the average? # In[58]: get_ipython().run_cell_magic('sql', '', "\nSELECT ROUND(AVG(population/area),3) AS 'Average Population/Area ratio'\nFROM FACTS\n") # In[59]: get_ipython().run_cell_magic('sql', '', "SELECT name AS Country, Population/Area AS 'Population/Area Ratio', population AS Population, \n area AS Area, \n ROUND((CAST(area AS FLOAT)/population),5) AS 'Area/Population Ratio'\nFROM facts\nWHERE population/area > (SELECT AVG(population/area)\n FROM facts)\nORDER BY population/area DESC\n") # From the earlier statistics we may have naturally assumed that Bangladesh would have been at the top of this list. # # However, there are many countries around the world that have a population to area ratio that is higher than Bangldesh. The reason why these countries did not come up is because they have a population thats much less than average world population. # [Index](#Index) # # 4 # ## Born one day, Dead on another... # **Exploring birth and death rates** # Birth rates are good indicators for population growth rates. More births mean that the population is growing. Earlier it was found that many African countries had higher birth rates. We could confirm the finding by verifying the countries that have higher birth rates than death rates. # In[60]: get_ipython().run_cell_magic('sql', '', "SELECT ROUND(AVG(birth_rate), 3) AS 'Average Birth Rate'\nFROM facts\n") # In[61]: get_ipython().run_cell_magic('sql', '', "SELECT name AS Country, \n birth_rate AS 'Birth Rate', \n death_rate AS 'Death Rate',\n ROUND(birth_rate - death_rate, 3) AS 'Birth Rate - Death Rate'\nFROM facts\nWHERE birth_rate > death_rate\nORDER BY birth_rate - death_rate DESC\nLIMIT 30\n") # The above confirms our understanding on growth rate. Many of the countries displayed are from Africa and have a birth rate that is higher than the global average. # # This leads to a need to find countries that are at the other end of the spectrum. Countries that have a death rate that is greater than their birth rate. # In[62]: get_ipython().run_cell_magic('sql', '', "SELECT name AS Country, \n birth_rate AS 'Birth Rate', \n death_rate AS 'Death Rate',\n ROUND(death_rate - birth_rate, 2) AS 'Death Rate - Birth Rate'\nFROM facts\nWHERE death_rate > birth_rate\nORDER BY death_rate-birth_rate DESC\n") # What immediately sticks out is that many of the countries displayed were part of the now defunct USSR or Soviet Union. A [study](https://www.thelancet.com/journals/lanpub/article/PIIS2468-26671730072-5/fulltext) related to this points out that one of the reasons for a high death rate could have been the rapid transition to privatisation of many industries of the former Soviet Union. # # Having been under communism, many of these countries had large workforces and nearly everyone was engaged. However with the rise in privatisation, many were left unemployed causing them to turn to alcohol and smoking. This in addition to poor health systems may have lead to higher mortality rates. # [Read more](https://www.rferl.org/a/life-expectancy-cis-report/24946030.html) # # It must be noted however, that not all countries in the former Soviet Union share the same fate. Countries like Poland and Czech Republic were differentiators. Poland had social systems in place to support the population and immigrant Poles to support financially. Whereas in the Czech Republic mass privatisation was done one enterprise at a time. [Read more](https://www.nytimes.com/2009/01/16/world/europe/16europe.html) # [Index](#Index) # # 5 # ## Water, Water everywhere.... # **Countries with more water than land** # There are countries around the world where water covers more area than the land. # In[63]: get_ipython().run_cell_magic('sql', '', '\nSELECT name, area_water, area_land\nFROM facts\nWHERE area_water > area_land\n') # Clearly the British Indian Ocean Territory has significant amount of area held by water. It would be interesting to know the top 10 countries where the ration of water to land is high. # In[64]: get_ipython().run_cell_magic('sql', '', "\nSELECT name AS 'Top 10', ROUND(CAST(area_water AS FLOAT)/area_land,3) AS 'Water Area/Land Area'\nFROM facts\nORDER BY CAST(area_water AS FLOAT)/area_land DESC\nLIMIT 10;\n") # [Index](#Index) # # 6 # ## Conclusion # The CIA World Factbook is a collection of demographic, geographical and other related facts of countries around the world. Based on the provided information we have undertaken analysis on a small set of data. The data was read and we've come up with many insights. Population growth is distributed variedly around the world. Population, however is not a good indicator of population growth. Birth rates, on the other hand are a good indicator of population growth. # # Many African countries are projected to have a larger population over the coming years. On the other side of the spectrum, many European countries especially those reeling from the Post-Soviet era are suffering from a high death rate possibly due to lack of employment, bad health services and heavy consumption of alcohol. # [Index](#Index) # # 7 # ## Acknowledgement # This project is a guided project provided by Dataquest to understand and practice fundamentals in SQL. The project was done in SQLite # [Index](#Index)