The World Factbook provides basic intelligence on the history, people, government, economy, energy, geography, communications, transportation, military, terrorism, and transnational issues for 267 world entities.
Tasks we will complete in this excercise:
Make sure that factbook.db is in the same folder as jupyter notebook
%%capture
%load_ext sql
%sql sqlite:///factbook.db
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';
* sqlite:///factbook.db Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
There are two different tables in the factbook.db:
Countries related data is stored in "facts" table, so we will explore that table further.
%%sql
SELECT *
FROM facts
LIMIT 5;
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
Here are the descriptions for some of the columns:
Min,Max of "population" and "population growth"
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts;
* sqlite:///factbook.db Done.
MIN(population) | MAX(population) | MIN(population_growth) | MAX(population_growth) |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
From this search we can see that minimum population of a country is 0 and maximum is 7.2 billion. Both of these populations seem unrealistic. Let's find out to which country they refer to.
Minimum Population:
%%sql
SELECT *
FROM facts
WHERE population == (SELECT MIN(population) FROM facts)
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
250 | ay | Antarctica | None | 280000 | None | 0 | None | None | None | None |
Maximum Population:
%%sql
SELECT *
FROM facts
WHERE population == (SELECT MAX(population) FROM facts)
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
261 | xx | World | None | None | None | 7256490011 | 1.08 | 18.6 | 7.8 | None |
So, minimum population of 0 belongs to Antarctica and 7.2 billion is the world population. Going forward, we will exclude world population from our calculation for accurate result.
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts
WHERE population <> (SELECT MAX(population) FROM facts)
* sqlite:///factbook.db Done.
MIN(population) | MAX(population) | MIN(population_growth) | MAX(population_growth) |
---|---|---|---|
0 | 1367485388 | 0.0 | 4.02 |
%%sql
SELECT *
FROM facts
WHERE population > (SELECT AVG(population) FROM facts WHERE population <> (SELECT MAX(population) FROM facts))
AND area < (SELECT AVG(area) FROM facts WHERE population <> (SELECT MAX(population) FROM facts))
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
14 | bg | Bangladesh | 148460 | 130170 | 18290 | 168957745 | 1.6 | 21.14 | 5.61 | 0.46 |
65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
80 | iz | Iraq | 438317 | 437367 | 950 | 37056169 | 2.93 | 31.45 | 3.77 | 1.62 |
83 | it | Italy | 301340 | 294140 | 7200 | 61855120 | 0.27 | 8.74 | 10.19 | 4.1 |
85 | ja | Japan | 377915 | 364485 | 13430 | 126919659 | 0.16 | 7.93 | 9.51 | 0.0 |
88 | ke | Kenya | 580367 | 569140 | 11227 | 45925301 | 1.93 | 26.4 | 6.89 | 0.22 |
91 | ks | Korea, South | 99720 | 96920 | 2800 | 49115196 | 0.14 | 8.19 | 6.75 | 0.0 |
120 | mo | Morocco | 446550 | 446300 | 250 | 33322699 | 1.0 | 18.2 | 4.81 | 3.36 |
138 | rp | Philippines | 300000 | 298170 | 1830 | 100998376 | 1.61 | 24.27 | 6.11 | 2.09 |
139 | pl | Poland | 312685 | 304255 | 8430 | 38562189 | 0.09 | 9.74 | 10.19 | 0.46 |
163 | sp | Spain | 505370 | 498980 | 6390 | 48146134 | 0.89 | 9.64 | 9.04 | 8.31 |
173 | th | Thailand | 513120 | 510890 | 2230 | 67976405 | 0.34 | 11.19 | 7.8 | 0.0 |
182 | ug | Uganda | 241038 | 197100 | 43938 | 37101745 | 3.24 | 43.79 | 10.69 | 0.74 |
185 | uk | United Kingdom | 243610 | 241930 | 1680 | 64088222 | 0.54 | 12.17 | 9.35 | 2.54 |
192 | vm | Vietnam | 331210 | 310070 | 21140 | 94348835 | 0.97 | 15.96 | 5.93 | 0.3 |
%%sql
SELECT name, MAX(population)
FROM facts
WHERE population <> (SELECT MAX(population) FROM facts)
* sqlite:///factbook.db Done.
name | MAX(population) |
---|---|
China | 1367485388 |
%%sql
SELECT name, MAX(population_growth)
FROM facts
WHERE population <> (SELECT MAX(population) FROM facts)
* sqlite:///factbook.db Done.
name | MAX(population_growth) |
---|---|
South Sudan | 4.02 |
%%sql
SELECT name, MAX(area_water/area_land)
FROM facts
WHERE population <> (SELECT MAX(population) FROM facts)
* sqlite:///factbook.db Done.
name | MAX(area_water/area_land) |
---|---|
Virgin Islands | 4 |
%%sql
SELECT name, (area_water-area_land) AS more_water
FROM facts
WHERE more_water > 0
* sqlite:///factbook.db Done.
name | more_water |
---|---|
British Indian Ocean Territory | 54280 |
Virgin Islands | 1218 |
%%sql
SELECT name, (population * population_growth) AS new_people
FROM facts
WHERE population <> (SELECT MAX(population) FROM facts)
ORDER BY new_people DESC
LIMIT 5;
* sqlite:///factbook.db Done.
name | new_people |
---|---|
India | 1527068612.48 |
China | 615368424.6 |
Nigeria | 444827037.20000005 |
Pakistan | 290665336.62 |
Ethiopia | 287456216.91 |
%%sql
SELECT name, ROUND(birth_rate-death_rate,1) AS more_birth
FROM facts
WHERE more_birth > 0
ORDER BY more_birth DESC
LIMIT 5;
* sqlite:///factbook.db Done.
name | more_birth |
---|---|
Malawi | 33.2 |
Uganda | 33.1 |
Niger | 33.0 |
Burundi | 32.7 |
Mali | 32.1 |
%%sql
SELECT name, (CAST(population AS Float)/CAST(area AS Float)) AS pop_area_ratio
FROM facts
ORDER BY pop_area_ratio
LIMIT 5;
* sqlite:///factbook.db Done.
name | pop_area_ratio |
---|---|
Chad | None |
Niger | None |
Holy See (Vatican City) | None |
Ashmore and Cartier Islands | None |
Coral Sea Islands | None |
This wasn't the result we were expecting here. So let's explore the dataset for these countries and see what information is present in the population and area columns for these countries
%%sql
SELECT name, population, area
FROM facts
WHERE name IN ('Chad', 'Niger', 'Holy See (Vatican City)', 'Ashmore and Cartier Islands', 'Coral Sea Islands')
* sqlite:///factbook.db Done.
name | population | area |
---|---|---|
Chad | 11631456 | None |
Niger | 18045729 | None |
Holy See (Vatican City) | 842 | 0 |
Ashmore and Cartier Islands | None | 5 |
Coral Sea Islands | None | 3 |
We can see that population and area columns have "None" or "0" values for these countries. So, when calculating the population/area ratio we will have to ignore these countries.
%%sql
SELECT name, ROUND(CAST(population AS Float)/CAST(area AS Float),2) AS pop_area_ratio
FROM facts
WHERE population NOT IN ('None', '0') AND area NOT IN ('None', '0')
ORDER BY pop_area_ratio DESC
LIMIT 5;
* sqlite:///factbook.db Done.
name | pop_area_ratio |
---|---|
Macau | 21168.96 |
Monaco | 15267.5 |
Singapore | 8141.28 |
Hong Kong | 6445.04 |
Gaza Strip | 5191.82 |
Let's calculate the pop_area_ratio for the countries, which have population above average and area below average.
%%sql
SELECT name, ROUND(CAST(population AS Float)/CAST(area AS Float),2) AS pop_area_ratio
FROM facts
WHERE population NOT IN ('None', '0') AND area NOT IN ('None', '0') AND population > (SELECT AVG(population) FROM facts WHERE population <> (SELECT MAX(population) FROM facts))
AND area < (SELECT AVG(area) FROM facts WHERE population <> (SELECT MAX(population) FROM facts))
ORDER BY pop_area_ratio DESC;
* sqlite:///factbook.db Done.
name | pop_area_ratio |
---|---|
Bangladesh | 1138.07 |
Korea, South | 492.53 |
Philippines | 336.66 |
Japan | 335.84 |
Vietnam | 284.86 |
United Kingdom | 263.08 |
Germany | 226.47 |
Italy | 205.27 |
Uganda | 153.92 |
Thailand | 132.48 |
Poland | 123.33 |
Spain | 95.27 |
Iraq | 84.54 |
Kenya | 79.13 |
Morocco | 74.62 |