Analyzing CIA Factbook Data Using SQL

Introduction

In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like:

  • population - The population as of 2015.
  • population_growth - The annual population growth rate, as a percentage.
  • area - The total land and water area.

In this project, we'll use SQL in Jupyter Notebook to explore and analyze data from this database. The SQLite factbook.db database can be downloaded from the repository.

We'll use the following code to connect our Jupyter Notebook to our database file:

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db
Out[1]:
'Connected: [email protected]'

To run SQL queries in this project we add %%sql on its own line to the start of our query.

Query to return information on the tables in the database

In [2]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';
Done.
Out[2]:
type name tbl_name rootpage sql
table sqlite_sequence sqlite_sequence 3 CREATE TABLE sqlite_sequence(name,seq)
table facts facts 47 CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)

Executing the first 5 rows of the facts table in the database

In [4]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;
Done.
Out[4]:
id code name area area_land area_water population population_growth birth_rate death_rate migration_rate
1 af Afghanistan 652230 652230 0 32564342 2.32 38.57 13.89 1.51
2 al Albania 28748 27398 1350 3029278 0.3 12.92 6.58 3.3
3 ag Algeria 2381741 2381741 0 39542166 1.84 23.67 4.31 0.92
4 an Andorra 468 468 0 85580 0.12 8.13 6.96 0.0
5 ao Angola 1246700 1246700 0 19625353 2.78 38.78 11.49 0.46

Here are the descriptions for some of the columns:

  • name - The name of the country.
  • area- The country's total area (both land and water).
  • area_land - The country's land area in square kilometers.
  • area_water - The country's waterarea in square kilometers.
  • population - The country's population.
  • population_growth- The country's population growth as a percentage.
  • birth_rate - The country's birth rate, or the number of births a year per 1,000 people.
  • death_rate - The country's death rate, or the number of death a year per 1,000 people.

Let's start by calculating some summary statistics and look for any outlier countries.

Summary statistics to identify outlier countries

In [5]:
%%sql
SELECT MIN(population) min_pop, MAX(population) max_pop, 
       MIN(population_growth) min_pop_growth,
       MAX(population_growth) max_pop_growth
    FROM facts;
Done.
Out[5]:
min_pop max_pop min_pop_growth max_pop_growth
0 7256490011 0.0 4.02

A few things stick out from the summary statistics:

  • There's a country with a population of 0
  • There's a country with a population of 7256490011 (or more than 7.2 billion people)

Using subqueries to zoom in on just these countries without using the specific values

Executing a query that returns the countrie(s) with the minimum population.

In [6]:
%%sql
SELECT *
  FROM facts
 WHERE population IN (SELECT MIN(population)
                        FROM facts
                     );
Done.
Out[6]:
id code name area area_land area_water population population_growth birth_rate death_rate migration_rate
250 ay Antarctica None 280000 None 0 None None None None

Executing a query that returns the countrie(s) with the maximum population.

In [7]:
%%sql
SELECT *
  FROM facts
 WHERE population IN (SELECT MAX(population)
                        FROM facts
                     );
Done.
Out[7]:
id code name area area_land area_water population population_growth birth_rate death_rate migration_rate
261 xx World None None None 7256490011 1.08 18.6 7.8 None

It seems like the table contains a row for the whole world, which explains the population of over 7.2 billion. It also seems like the table contains a row for Antarctica, which explains the population of 0. This seems to match the CIA Factbook page for Antarctica:

Now that we know this, we should recalculate the summary statistics we calculated earlier, while excluding the row for the whole world.

Recomputing the summary statistics we found earlier while excluding the row for the whole world

In [8]:
%%sql
SELECT MIN(population) min_pop, MAX(population) max_pop, 
       MIN(population_growth) min_pop_growth,
       MAX(population_growth) max_pop_growth
    FROM facts
 WHERE name <> 'World';
Done.
Out[8]:
min_pop max_pop min_pop_growth max_pop_growth
0 1367485388 0.0 4.02

From the result summary, we found out that a country maximum popuplation approaches 1.4 billion.

Exploring the average values for population and area

In [9]:
%%sql
SELECT AVG(population) pop_avg, AVG(area) area_avg
  FROM facts
WHERE name <> 'World';    
Done.
Out[9]:
pop_avg area_avg
32242666.56846473 555093.546184739

From the table above, we could see that the average population is approximately 32 million and average area is also 555 thousand square kilometers.

Identifying countries that are densely populated

We'll identify countries that have:

  • Above average values for population.
  • Below average values for area.
In [12]:
%%sql
SELECT *
   FROM facts
WHERE population > (SELECT (AVG(population))
                        FROM facts
                     WHERE name <> 'World')
AND

    area < (SELECT (AVG(area))
                        FROM facts
                     WHERE name <> 'World');
Done.
Out[12]:
id code name area area_land area_water population population_growth birth_rate death_rate migration_rate
14 bg Bangladesh 148460 130170 18290 168957745 1.6 21.14 5.61 0.46
65 gm Germany 357022 348672 8350 80854408 0.17 8.47 11.42 1.24
80 iz Iraq 438317 437367 950 37056169 2.93 31.45 3.77 1.62
83 it Italy 301340 294140 7200 61855120 0.27 8.74 10.19 4.1
85 ja Japan 377915 364485 13430 126919659 0.16 7.93 9.51 0.0
91 ks Korea, South 99720 96920 2800 49115196 0.14 8.19 6.75 0.0
120 mo Morocco 446550 446300 250 33322699 1.0 18.2 4.81 3.36
138 rp Philippines 300000 298170 1830 100998376 1.61 24.27 6.11 2.09
139 pl Poland 312685 304255 8430 38562189 0.09 9.74 10.19 0.46
163 sp Spain 505370 498980 6390 48146134 0.89 9.64 9.04 8.31
173 th Thailand 513120 510890 2230 67976405 0.34 11.19 7.8 0.0
182 ug Uganda 241038 197100 43938 37101745 3.24 43.79 10.69 0.74
185 uk United Kingdom 243610 241930 1680 64088222 0.54 12.17 9.35 2.54
192 vm Vietnam 331210 310070 21140 94348835 0.97 15.96 5.93 0.3

The table above depicts all the countries that met both conditions for population above average and area below average.

Exploring the country with most people and highest growth rate

In [13]:
%%sql
SELECT name, 
        MAX(population),
        MAX(population_growth)
   FROM facts;
Done.
Out[13]:
name MAX(population) MAX(population_growth)
South Sudan 7256490011 4.02

The result shows Sudan to be the country with most people as well as highest growth rate.

Exploring countries with the highest ratios of water to land

In [25]:
%%sql
SELECT name,
CAST(area_water AS Float)/CAST(area_land AS Float) 'water:land'
   FROM facts;
Done.
Out[25]:
name water:land
Afghanistan 0.0
Albania 0.04927366961092051
Algeria 0.0
Andorra 0.0
Angola 0.0
Antigua and Barbuda 0.0
Argentina 0.015971849204696183
Armenia 0.054604120129064286
Australia 0.007669578121135597
Austria 0.017296379404451452
Azerbaijan 0.048058187803313605
Bahamas, The 0.3866133866133866
Bahrain 0.0
Bangladesh 0.1405085657217485
Barbados 0.0
Belarus 0.023164120256283883
Belgium 0.008256820133430213
Belize 0.00701569762343243
Benin 0.018079586339064563
Bhutan 0.0
Bolivia 0.014105036365700761
Bosnia and Herzegovina 0.0001953621036591322
Botswana 0.026467630088401885
Brazil 0.018859459161966657
Brunei 0.0949667616334283
Bulgaria 0.022029883213966393
Burkina Faso 0.0014609203798392988
Burma 0.03530178666519767
Burundi 0.0837227414330218
Cambodia 0.025606888932951873
Cameroon 0.005775211017325633
Canada 0.09799992456155804
Cabo Verde 0.0
Central African Republic 0.0
Chad 0.019695044472681066
Chile 0.01652299236903949
China 0.029009018475490568
Colombia 0.09647636468662751
Comoros 0.0
Congo, Democratic Republic of the 0.03432216697661452
Congo, Republic of the 0.0014641288433382138
Costa Rica 0.0007833920877399138
Cote d'Ivoire 0.014025024921148543
Croatia 0.011076571265230286
Cuba 0.00947004188672373
Cyprus 0.0010821339681852614
Czech Republic 0.02097168822090178
Denmark 0.015553565537069332
Djibouti 0.0008628127696289905
Dominica 0.0
Dominican Republic 0.007243377483443708
Ecuador 0.02427386116940771
Egypt 0.006027424782761565
El Salvador 0.015443270112446311
Equatorial Guinea 0.0
Eritrea 0.16435643564356436
Estonia 0.06700009436633009
Ethiopia None
Fiji 0.0
Finland 0.11299639583299047
France 0.005268360016051791
Gabon 0.03880978161735884
Gambia, The 0.116600790513834
Georgia 0.0
Germany 0.023948008443465493
Ghana 0.048344635723169826
Greece 0.010027019372813766
Grenada 0.0
Guatemala 0.016144234268703515
Guinea 0.0005697611479873187
Guinea-Bissau 0.2846728307254623
Guyana 0.09205025171578214
Haiti 0.006894049346879536
Honduras 0.00178746983644651
Hungary 0.038166235157575217
Iceland 0.02743142144638404
India 0.10563390940312317
Indonesia 0.0513367141963679
Iran 0.07612978626856316
Iraq 0.0021720888864500522
Ireland 0.02017914434621024
Israel 0.02164289227742253
Italy 0.024478139661385734
Jamaica 0.01477241251961961
Japan 0.036846509458551106
Jordan 0.006080944122880115
Kazakhstan 0.009334370485609512
Kenya 0.019726253645851635
Kiribati 0.0
Korea, North 0.0010796624808982792
Korea, South 0.028889806025588115
Kosovo 0.0
Kuwait 0.0
Kyrgyzstan 0.04249195781043894
Laos 0.025996533795493933
Latvia 0.03759096531671192
Lebanon 0.016617790811339198
Lesotho 0.0
Liberia 0.15623961794019933
Libya 0.0
Liechtenstein 0.0
Lithuania 0.041799617102744095
Luxembourg 0.0
Macedonia 0.011009318601816538
Madagascar 0.009459366509612408
Malawi 0.25939625850340137
Malaysia 0.0036207961491768015
Maldives 0.0
Mali 0.016392529032363812
Malta 0.0
Marshall Islands 0.0
Mauritania 0.0
Mauritius 0.0049261083743842365
Mexico 0.010509556597537481
Micronesia, Federated States of 0.0
Moldova 0.029187315679061143
Monaco 0.0
Mongolia 0.006797308883619258
Montenegro 0.026761819803746655
Morocco 0.000560161326462021
Mozambique 0.016531447900506115
Namibia 0.001217068104799038
Nauru 0.0
Nepal 0.02671763712844696
Netherlands 0.22571032366565366
New Zealand None
Nicaragua 0.08650720893407784
Niger 0.00023683587274019106
Nigeria 0.014273667937389104
Norway 0.06415101780585115
Oman 0.0
Pakistan 0.03271606940165397
Palau 0.0
Panama 0.014527845036319613
Papua New Guinea 0.022037715850373184
Paraguay 0.023785432743857318
Peru 0.004078137744180451
Philippines 0.006137438374081899
Poland 0.027707022070302872
Portugal 0.006778178637804745
Qatar 0.0
Romania 0.03697404422095689
Russia 0.04399263341674328
Rwanda 0.06769904329495703
Saint Kitts and Nevis 0.0
Saint Lucia 0.0165016501650165
Saint Vincent and the Grenadines 0.0
Samoa 0.003544842254519674
San Marino 0.0
Sao Tome and Principe 0.0
Saudi Arabia 0.0
Senegal 0.02177323014595128
Serbia 0.0
Seychelles 0.0
Sierra Leone 0.0016755096341803965
Singapore 0.01455604075691412
Slovakia 0.01933270969753664
Slovenia 0.006054290109671977
Solomon Islands 0.03251625812906453
Somalia 0.016450488334021428
South Africa 0.0038041285499024265
South Sudan None
Spain 0.012806124493967693
Sri Lanka 0.01516323688689463
Sudan None
Suriname 0.05012820512820513
Swaziland 0.009300162752848175
Sweden 0.09738384490720996
Switzerland 0.032002400180013504
Syria 0.008440886565376027
Tajikistan 0.018302593456292843
Tanzania 0.06942876495822985
Thailand 0.004364931785707295
Timor-Leste 0.0
Togo 0.044129815206398824
Tonga 0.04184100418410042
Trinidad and Tobago 0.0
Tunisia 0.05310247167868177
Turkey 0.018099559269884827
Turkmenistan 0.038665333134722195
Tuvalu 0.0
Uganda 0.22292237442922375
Ukraine 0.04180691488443547
United Arab Emirates 0.0
United Kingdom 0.006944157400901087
United States 0.07255091319919764
Uruguay 0.006856555152415508
Uzbekistan 0.051716031969910675
Vanuatu 0.0
Holy See (Vatican City) None
Venezuela 0.034011677342554274
Vietnam 0.06817815332021801
Yemen 0.0
Zambia 0.012402508481324943
Zimbabwe 0.010107355104214328
Taiwan 0.1153130812151271
European Union None
Ashmore and Cartier Islands 0.0
Christmas Island 0.0
Cocos (Keeling) Islands 0.0
Coral Sea Islands 0.0
Heard Island and McDonald Islands 0.0
Norfolk Island 0.0
Hong Kong 0.032618825722273995
Macau 0.0
Faroe Islands 0.0
Greenland None
Clipperton Island 0.0
French Polynesia 0.08884243532793311
French Southern and Antarctic Lands None
New Caledonia 0.016415868673050615
Saint Barthelemy None
Saint Martin None
Saint Pierre and Miquelon 0.0
Wallis and Futuna 0.0
Aruba 0.0
Curacao 0.0
Sint Maarten 0.0
Cook Islands 0.0
Niue 0.0
Tokelau 0.0
Bouvet Island 0.0
Jan Mayen 0.0
Svalbard 0.0
Akrotiri None
Anguilla 0.0
Bermuda 0.0
British Indian Ocean Territory 905.6666666666666
British Virgin Islands 0.0
Cayman Islands 0.0
Dhekelia None
Falkland Islands (Islas Malvinas) 0.0
Gibraltar 0.0
Guernsey 0.0
Jersey 0.0
Isle of Man 0.0
Montserrat 0.0
Pitcairn Islands 0.0
Saint Helena, Ascension, and Tristan da Cunha 0.0
South Georgia and South Sandwich Islands 0.0
Turks and Caicos Islands 0.0
American Samoa 0.0
Guam 0.0
Navassa Island 0.0
Northern Mariana Islands 0.0
Puerto Rico 0.5547914317925592
Virgin Islands 4.520231213872832
Wake Island 0.0
United States Pacific Island Wildlife Refuges None
Antarctica None
Gaza Strip 0.0
Paracel Islands 0.0
Spratly Islands 0.0
West Bank 0.03900709219858156
Western Sahara 0.0
Arctic Ocean None
Atlantic Ocean None
Indian Ocean None
Pacific Ocean None
Southern Ocean None
World None

Exploring countries that have more water than land

In [14]:
%%sql
SELECT name
 FROM facts
WHERE (area_water > area_land);    
Done.
Out[14]:
name
British Indian Ocean Territory
Virgin Islands

British Indian Ocean Territory and Virgin Islands are the countries with more water than land respectively.

Exploring countries with increase population growth next year

In [15]:
%%sql
SELECT name, population_growth
 FROM facts
ORDER BY population_growth DESC
LIMIT 10;    
Done.
Out[15]:
name population_growth
South Sudan 4.02
Malawi 3.32
Burundi 3.28
Niger 3.25
Uganda 3.24
Qatar 3.07
Burkina Faso 3.03
Mali 2.98
Cook Islands 2.95
Iraq 2.93

The above countries will add the most people to their population next year.

Exploring countries that have a higher death rate than birth rate

In [16]:
%%sql
SELECT name
 FROM facts
WHERE (death_rate > birth_rate);    
Done.
Out[16]:
name
Austria
Belarus
Bosnia and Herzegovina
Bulgaria
Croatia
Czech Republic
Estonia
Germany
Greece
Hungary
Italy
Japan
Latvia
Lithuania
Moldova
Monaco
Poland
Portugal
Romania
Russia
Serbia
Slovenia
Ukraine
Saint Pierre and Miquelon

Exploring countries that have the highest population/area ratio

In [17]:
%%sql
SELECT name,
CAST((population) AS Float)/CAST((area) AS Float) highest_population_area
 FROM facts
LIMIT 5;    
Done.
Out[17]:
name highest_population_area
Afghanistan 49.92769728470018
Albania 105.37352163628775
Algeria 16.602210735760103
Andorra 182.86324786324786
Angola 15.741840859870058

Conclusion

In conclusion, in this project we were able to answer most of the questions posed in the problem statement of our work with the use of SQL in Jupyter notebook.

In [ ]: