The Data
For this project, we'll analyze data from CIA World Factbook, a website that provides information on the history, people and society, government, economy, energy, geography, communications, transportation, military, and transnational issues.
Connecting the notebook to the database file:
%%capture
%load_ext sql
%sql sqlite:///factbook.db
'Connected: None@factbook.db'
To return information about the tables in the database:
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';
Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
We'll analyze the facts table, which has some of the following columns:
The first 5 rows from the facts table:
%%sql
SELECT *
FROM facts
LIMIT 5;
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
The first few rows of the facts table give some geographical information of countries; like area_land, population, population growth,migration rate, etc.
Summary Statistics
Here, we'll draw some simple descriptions from the data, like:
%%sql
SELECT MIN(population) AS
FROM facts;
(sqlite3.OperationalError) near "FROM": syntax error [SQL: SELECT MIN(population) AS FROM facts;] (Background on this error at: http://sqlalche.me/e/13/e3q8)
To establish the countries with the least and highest population and population growths:
%%sql
SELECT name AS Country_with_smallest_population,
MIN(population) AS population
FROM facts;
Done.
Country_with_smallest_population | population |
---|---|
Antarctica | 0 |
%%sql
SELECT name AS Country_with_largest_population,
MAX(population) AS population
FROM facts;
Done.
Country_with_largest_population | population |
---|---|
World | 7256490011 |
To view the records of 'World' from the table:
%%sql
SELECT *
FROM facts
WHERE name = "World";
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
261 | xx | World | None | None | None | 7256490011 | 1.08 | 18.6 | 7.8 | None |
This record gives the information of the whole world, and since 'World' in itself is not a country, we'll exclude the record from the preceeding analyses
Finishing off by running a single query that returns the minimum and maximum population and population growth(excluding world):
%%sql
SELECT MIN(population) AS Smallest_Population,
MAX(population) AS Largest_Population,
MIN(population_growth) AS Least_Population_growth,
MAX(population_growth) AS Highest_Population_growth
FROM facts
WHERE name NOT IN ("World");
Done.
Smallest_Population | Largest_Population | Least_Population_growth | Highest_Population_growth |
---|---|---|---|
0 | 1367485388 | 0.0 | 4.02 |
*Country with the largest population*
To view the country with the largest population from the table:
%%sql
SELECT name AS Country,
MAX(population) AS Population,
population_growth
FROM facts
WHERE name NOT IN ('World');
Done.
Country | Population | population_growth |
---|---|---|
China | 1367485388 | 0.45 |
China has the largest population but also has a shockingly low population growth.
China's high population started after the World War II; fmilies were encouraged to have as many children as possible in hopes of bringing more money to the country, building a better army, and producing more food. However, the one-child policy was later instituted to minimise the population.
*Country with highest population growth*
%%sql
SELECT name AS Country,
population,
MAX(population_growth) AS Population_growth
FROM facts;
Done.
Country | population | Population_growth |
---|---|---|
South Sudan | 12042910 | 4.02 |
*Countries with the smallest population growth*
%%sql
SELECT name AS Country,
population, population_growth
FROM facts
WHERE population_growth = 0.0
ORDER BY population DESC;
Done.
Country | population | population_growth |
---|---|---|
Greenland | 57733 | 0.0 |
Holy See (Vatican City) | 842 | 0.0 |
Cocos (Keeling) Islands | 596 | 0.0 |
Pitcairn Islands | 48 | 0.0 |
Birth Rate
We'll first get the average birth rate of all the countries, rounded off to 4 decimal places.
%%sql
SELECT ROUND(AVG(birth_rate), 4) AS Average_Birth_Rate
FROM facts;
Done.
Average_Birth_Rate |
---|
19.3286 |
*Country with the highest birth rate*
%%sql
SELECT name AS Country, MAX(birth_rate) AS Birth_rate
FROM facts;
Done.
Country | Birth_rate |
---|---|
Niger | 45.45 |
Niger has a very high birth rate, which is even twice higher than the average birth rate of all the countries.
This high birth rate is attributed to by the people's desire to have large families, and partly by gender inequality which includes lack of educational opportunities for women, leading to early marriages and childbirth.
This information can be found here
*Country with the lowest birth rate*
%%sql
SELECT name AS Country, MIN(birth_rate) AS Birth_Rate
FROM facts;
Done.
Country | Birth_Rate |
---|---|
Monaco | 6.65 |
Death Rate
The minimum , maximum and average death rates of the countries rounded off to 4 decimal places:
%%sql
SELECT MIN(death_rate) AS Minimum_death_rate,
MAX(death_rate) AS Maximum_death_rate,
ROUND(AVG(death_rate), 4) AS Average_death_rate
FROM facts;
Done.
Minimum_death_rate | Maximum_death_rate | Average_death_rate |
---|---|---|
1.53 | 14.89 | 7.8213 |
Migration Rate
Here, we'll find the country with the highest migration rate
%%sql
SELECT name AS Country, MAX(migration_rate) AS Migration_rate
FROM facts;
Done.
Country | Migration_rate |
---|---|
Qatar | 22.39 |
Most of Qatar's immigrants are people in search of employment.
Over 80% of Qatar's workforce is made up of immigrants.
*Country with the most area under water*
%%sql
SELECT name Country, MAX(area_water) AS area_water
FROM facts;
Done.
Country | area_water |
---|---|
Canada | 891163 |
*Average value for population and area*
%%sql
SELECT AVG(population), AVG(area)
FROM facts;
Done.
AVG(population) | AVG(area) |
---|---|
62094928.32231405 | 555093.546184739 |
Population Density
To identify countries that are densely populated, we'll check the countries that have population above the average and area below the average:
%%sql
SELECT name AS Country
FROM facts
WHERE population > 62094928.32231405 AND area < 555093.546184739;
Done.
Country |
---|
Bangladesh |
Germany |
Japan |
Philippines |
Thailand |
United Kingdom |
Vietnam |
*Country with population density*
To obtain the population density of a country, we divide the population by area.
We'll therefore check the top 10 countries with the highest population density.
%%sql
SELECT name AS country, population/area AS population_density
FROM facts
WHERE name NOT IN ('World')
ORDER BY population/area DESC
LIMIT 10;
Done.
country | population_density |
---|---|
Macau | 21168 |
Monaco | 15267 |
Singapore | 8141 |
Hong Kong | 6445 |
Gaza Strip | 5191 |
Gibraltar | 4876 |
Bahrain | 1771 |
Maldives | 1319 |
Malta | 1310 |
Bermuda | 1299 |
SQL makes it easy to interact with databases.
We've been able to draw some useful insights from the facts table, like the maximum and minimum population, population growth and birth rate, among others.
We've also managed to perform some simple calculations like getting the population densities of the countries and several averages with ease.