This project entails working with CIA World Factbook data statistics which can be gotten from this link https://www.cia.gov/library/publications/the-world-factbook/ and it contains the statistics of all the countries on Earth
The Factbook contains demographic information like:
population - The population as of 2015.
population_growth - The annual population growth rate, as a percentage.
area - The total land and water area.
In this project, i will use the basics of SQL to explore and analyze the data. link to get the database https://dsserver-prod-resources-1.s3.amazonaws.com/257/factbook.db
%%capture
%load_ext sql
%sql sqlite:///factbook.db
#connecting the jupyter notebook to the database file
%%sql
SELECT * FROM sqlite_master WHERE type='table';
We have seen the contents of our file and the data types above.
%%sql
SELECT * FROM facts limit 5;
what i did above was to easily have acess the content of my database.
We are getting close to how to approach our data
lets start calculating some summary statistics of our file
%%sql
select
min(population) min_population,
max(population) max_population,
min(population_growth) min_population_growth,
max(population_growth) max_population_growth
from facts
our maximum population has 7256490011 value which is approximately 7.2 billion people.
while our minimum population recorded 0 value, could it be the antarctica region or zone with no one living there??
we have minimum population growth as 0, while maximum population growth was 4.02
Now we are interested in gettin the countries with the minimum and maximum population. let us see further
%%sql
select *
from facts
where population == (select min(population) from facts)
Yea, our guesses where right and correct. The region with the least population is Antarctica. Now let us move on to the countries with the highest population. (i used nested subqueries in this context in nesting the calculation of the minimum population inside the major query
%%sql
select *
from facts
where population == (select max(population) from facts)
Now to our surprise, we are seeing the name of the country as world right?? yes, the only way to explain this is that the dataset had world as the row measuring the total population of the world which is about 7.2 billion people
%%sql
select avg(population), avg(area)
from facts
We can now see that the average population of the world is 62094928.32231405 which is about 62 million people, and the average area is 555093.546184739 which is about 555 thousand km^2 sq/area
%%sql
select *
from facts
where
population > (select avg(population) from facts) and
area < (select avg(area) from facts)
Good, seems little, it is a way of going through the basics of sql, which covers some syntax(not all) mostly used in its operation such as select, min, max, from, where etc. we are done with the basics.