Analyzing CIA Factbook Data Using SQL

The Data

For this project, we'll analyze data from CIA World Factbook, a website that provides information on the history, people and society, government, economy, energy, geography, communications, transportation, military, and transnational issues.

Connecting the notebook to the database file:

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db
Out[1]:
'Connected: [email protected]'

To return information about the tables in the database:

In [2]:
%%sql
SELECT *
    FROM sqlite_master
   WHERE type='table';
Done.
Out[2]:
type name tbl_name rootpage sql
table sqlite_sequence sqlite_sequence 3 CREATE TABLE sqlite_sequence(name,seq)
table facts facts 47 CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)

We'll analyze the facts table, which has some of the following columns:

  • name: the name of the country.
  • area: the country's total area (both land and water).
  • area_land: the country's land area in square kilometers.
  • area_water: the country's waterarea in square kilometers.
  • population: the country's population.
  • population_growth: the country's population growth as a percentage.
  • birth_rate: the country's birth rate, or the number of births per year per 1,000 people.
  • death_rate: the country's death rate, or the number of death per year per 1,000 people.

The first 5 rows from the facts table:

In [3]:
%%sql
SELECT *
    FROM facts
    LIMIT 5;
Done.
Out[3]:
id code name area area_land area_water population population_growth birth_rate death_rate migration_rate
1 af Afghanistan 652230 652230 0 32564342 2.32 38.57 13.89 1.51
2 al Albania 28748 27398 1350 3029278 0.3 12.92 6.58 3.3
3 ag Algeria 2381741 2381741 0 39542166 1.84 23.67 4.31 0.92
4 an Andorra 468 468 0 85580 0.12 8.13 6.96 0.0
5 ao Angola 1246700 1246700 0 19625353 2.78 38.78 11.49 0.46

The first few rows of the facts table give some geographical information of countries; like area_land, population, population growth,migration rate, etc.

Summary Statistics

Here, we'll draw some simple descriptions from the data, like:

  • maximum and minimum population and population growth
  • countries with the highest birth, death and migration rates
  • country with the largest water area in square kilometers
In [4]:
%%sql
SELECT MIN(population) AS  
    FROM facts;
(sqlite3.OperationalError) near "FROM": syntax error
[SQL: SELECT MIN(population) AS  
    FROM facts;]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

To establish the countries with the least and highest population and population growths:

In [5]:
%%sql
SELECT name AS Country_with_smallest_population,
        MIN(population) AS population
    FROM facts;
Done.
Out[5]:
Country_with_smallest_population population
Antarctica 0
In [6]:
%%sql
SELECT name AS Country_with_largest_population,
         MAX(population) AS population
    FROM facts;
Done.
Out[6]:
Country_with_largest_population population
World 7256490011

To view the records of 'World' from the table:

In [7]:
%%sql
SELECT *
    FROM facts
    WHERE name = "World";
Done.
Out[7]:
id code name area area_land area_water population population_growth birth_rate death_rate migration_rate
261 xx World None None None 7256490011 1.08 18.6 7.8 None

This record gives the information of the whole world, and since 'World' in itself is not a country, we'll exclude the record from the preceeding analyses

Finishing off by running a single query that returns the minimum and maximum population and population growth(excluding world):

In [8]:
%%sql
SELECT MIN(population) AS Smallest_Population, 
       MAX(population) AS Largest_Population, 
       MIN(population_growth) AS Least_Population_growth,
        MAX(population_growth) AS Highest_Population_growth
    FROM facts
    WHERE name NOT IN ("World");
Done.
Out[8]:
Smallest_Population Largest_Population Least_Population_growth Highest_Population_growth
0 1367485388 0.0 4.02

Country with the largest population

To view the country with the largest population from the table:

In [9]:
%%sql
SELECT name AS Country, 
        MAX(population) AS Population,
        population_growth
    FROM facts
    WHERE name NOT IN ('World');
Done.
Out[9]:
Country Population population_growth
China 1367485388 0.45

China has the largest population but also has a shockingly low population growth.

China's high population started after the World War II; fmilies were encouraged to have as many children as possible in hopes of bringing more money to the country, building a better army, and producing more food. However, the one-child policy was later instituted to minimise the population.

Country with highest population growth

In [10]:
%%sql
SELECT name AS Country,
        population,
        MAX(population_growth) AS Population_growth
    FROM facts;
    
Done.
Out[10]:
Country population Population_growth
South Sudan 12042910 4.02

Countries with the smallest population growth

In [11]:
%%sql
SELECT name AS Country, 
        population, population_growth
    FROM facts
    WHERE population_growth =  0.0
    ORDER BY population DESC;
Done.
Out[11]:
Country population population_growth
Greenland 57733 0.0
Holy See (Vatican City) 842 0.0
Cocos (Keeling) Islands 596 0.0
Pitcairn Islands 48 0.0

Analyzing birth, death and migration rates

Birth Rate

We'll first get the average birth rate of all the countries, rounded off to 4 decimal places.

In [12]:
%%sql
SELECT ROUND(AVG(birth_rate), 4) AS Average_Birth_Rate
    FROM facts;
Done.
Out[12]:
Average_Birth_Rate
19.3286

Country with the highest birth rate

In [13]:
%%sql
SELECT name AS Country, MAX(birth_rate) AS Birth_rate
    FROM facts;
    
Done.
Out[13]:
Country Birth_rate
Niger 45.45

Niger has a very high birth rate, which is even twice higher than the average birth rate of all the countries.

This high birth rate is attributed to by the people's desire to have large families, and partly by gender inequality which includes lack of educational opportunities for women, leading to early marriages and childbirth.

This information can be found here

Country with the lowest birth rate

In [14]:
%%sql
SELECT name AS Country, MIN(birth_rate) AS Birth_Rate
    FROM facts;
Done.
Out[14]:
Country Birth_Rate
Monaco 6.65

Death Rate

The minimum , maximum and average death rates of the countries rounded off to 4 decimal places:

In [15]:
%%sql
SELECT MIN(death_rate) AS Minimum_death_rate,
       MAX(death_rate) AS Maximum_death_rate,
       ROUND(AVG(death_rate), 4) AS Average_death_rate
    FROM facts;
        
Done.
Out[15]:
Minimum_death_rate Maximum_death_rate Average_death_rate
1.53 14.89 7.8213

Migration Rate

Here, we'll find the country with the highest migration rate

In [16]:
%%sql
SELECT name AS Country, MAX(migration_rate) AS Migration_rate
    FROM facts;
Done.
Out[16]:
Country Migration_rate
Qatar 22.39

Most of Qatar's immigrants are people in search of employment.

Over 80% of Qatar's workforce is made up of immigrants.

Country with the most area under water

In [17]:
%%sql
SELECT name Country, MAX(area_water) AS area_water
    FROM facts;
Done.
Out[17]:
Country area_water
Canada 891163

Average value for population and area

In [18]:
%%sql
SELECT AVG(population), AVG(area)
    FROM facts;
Done.
Out[18]:
AVG(population) AVG(area)
62094928.32231405 555093.546184739

Population Density

To identify countries that are densely populated, we'll check the countries that have population above the average and area below the average:

In [19]:
%%sql
SELECT name AS Country
    FROM facts
    WHERE population > 62094928.32231405 AND area < 555093.546184739;
Done.
Out[19]:
Country
Bangladesh
Germany
Japan
Philippines
Thailand
United Kingdom
Vietnam

Country with population density

To obtain the population density of a country, we divide the population by area.

We'll therefore check the top 10 countries with the highest population density.

In [20]:
%%sql
SELECT name AS country, population/area AS population_density
    FROM facts
    WHERE name NOT IN ('World')
    ORDER BY population/area DESC
    LIMIT 10;
Done.
Out[20]:
country population_density
Macau 21168
Monaco 15267
Singapore 8141
Hong Kong 6445
Gaza Strip 5191
Gibraltar 4876
Bahrain 1771
Maldives 1319
Malta 1310
Bermuda 1299

Conclusion

SQL makes it easy to interact with databases.

We've been able to draw some useful insights from the facts table, like the maximum and minimum population, population growth and birth rate, among others.

We've also managed to perform some simple calculations like getting the population densities of the countries and several averages with ease.