If you have completed M269 (a prerequisite course for TM351), you should already be familiar with the idea of SQL, the widely used query language for querying relational databases.
You will have an opportunity to learn about SQL in far more depth throughout TM351, but for now, let's quickly recap on the exercises from M269.
To run this notebooks, download and unzip the files contained in the gist repository psychemedia/14ac489ebb6bec3ee84f, then open the .ipynb
file within a Jupyter notebook server.
from m269_db import *
!unzip -o m269_database.db.zip
m269_database="./m269_database.db"
One of the tables in the M269 database was called student
.
#We can examine the contents of the student table
#by selecting all the data contained in that table in the database
q='''SELECT * FROM student;'''
show(q)
student_id | student_name | location | module ------------|--------------|------------|-------- X008 | Nils | London | T353 X025 | Olivia | York | T238 X048 | Mike | Bath | T238 X020 | Rory | Manchester | T304 X002 | Mike | Manchester | M218 X019 | Sara | Bristol | T304 X043 | Willow | London | T304 X023 | Petra | Bristol | T353 X042 | Yves | Manchester | M381 X018 | Ulla | London | M381
#We can use the column names to restrict which columns of the table we want to look at.
q='''
SELECT student_id, location
FROM student;
'''
show(q)
student_id | location ------------|------------ X008 | London X025 | York X048 | Bath X020 | Manchester X002 | Manchester X019 | Bristol X043 | London X023 | Bristol X042 | Manchester X018 | London
#Write an SQL query which will return the columns student_name and module from the student table.
q='''
'''
show(q)
#Choose particular rows of a table using conditions.
#The returned table should contain all the columns in the student table if we use SELECT *.
q='''
SELECT * FROM student WHERE location='Manchester';
'''
show(q)
student_id | student_name | location | module ------------|--------------|------------|-------- X020 | Rory | Manchester | T304 X002 | Mike | Manchester | M218 X042 | Yves | Manchester | M381
#Conditions in the WHERE clause can also be combined with the logical operators AND, OR and NOT.
#So if we wanted to list those rows which contained those students who live
# in Manchester or who are studying module T304, we could use the query:
q='''
SELECT *
FROM student
WHERE location='Manchester' OR module='T304';
'''
show(q)
student_id | student_name | location | module ------------|--------------|------------|-------- X020 | Rory | Manchester | T304 X002 | Mike | Manchester | M218 X019 | Sara | Bristol | T304 X043 | Willow | London | T304 X042 | Yves | Manchester | M381
#Restrict the returned columns by making the required columns explicit in the SELECT clause
q='''
SELECT student_id, student_name
FROM student
WHERE location='Manchester' OR module='T304';
'''
show(q)
student_id | student_name ------------|-------------- X020 | Rory X002 | Mike X019 | Sara X043 | Willow X042 | Yves
#What are the student identification codes and the names of those students who live in Manchester?
q='''
'''
show(q)
#What are the student identifiers of all the students named Mike?
q='''
'''
show(q)
#In which towns are there students studying the module T353 or module T238?
q='''
'''
show(q)
The M269 database, m269_db, contains two further tables - country
and language
- which contain much more information than the student table we have looked at so far.
#Examine the contents of the country table
q='''
'''
show(q)
sadas
#Examine the contents of the language table
q='''
'''
show(q)
When we define SQL queries, we typically do so in order to answer a particular question. In many cases, the question to be asked will initially be formulated in natural language terms before being refined and developed as an SQL query. We can also go the other way, expressing SQL queries in more naturalistic terms.
Give an equivalent English question that each of the following queries might have been used to answer:
SELECT * FROM country WHERE capital_city='Helsinki';
YOUR ANSWER HERE
SELECT country_iso FROM language WHERE language='Portuguese';
YOUR ANSWER HERE
SELECT country_name FROM country WHERE has_capital=0 AND continent='Asia';
YOUR ANSWER HERE
Now try answering a few more questions by making your own queries on to the database:
#What is the capital of Chad?
#List every official language used in the country with the ISO code BOL (Bolivia).
#List the name of every country which is in Europe or North America.
The m269_db database contains another dataset made up from the tables man
, woman
, married
and parent
.
#Explore each of the tables to familiarise yourself with the columns and typical row values
q='''
SELECT *
FROM person;
'''
show(q)
person_name ------------- Freddie Adam Claire Indira Gail Dmitri Brian Jenny Hassan Eve
q='''
SELECT *
FROM town;
'''
show(q)
town_name ------------ Bristol York Bath London Norwich Newcastle Manchester Sunderland
q='''
SELECT *
FROM residence;
'''
show(q)
person_name | lives_in -------------|------------ Gail | Manchester Freddie | Bath Indira | London Jenny | Sunderland Hassan | York Eve | London Claire | Bristol Brian | Manchester Adam | London Dmitri | Bristol
Samples of two of the tables have been gathered into separate tables:
q='''
SELECT *
FROM small_student;
'''
show(q)
student_id | student_name ------------|-------------- X008 | Mike X018 | Rachel X043 | Simon
q='''
SELECT *
FROM small_parent;
'''
show(q)
mother | father | child --------|--------|-------- Olivia | Mike | Victor Rachel | John | Willow
Do you remember how we can join tables?
q='''
SELECT *
FROM small_student CROSS JOIN small_parent;
'''
show(q)
student_id | student_name | mother | father | child ------------|--------------|--------|--------|-------- X008 | Mike | Olivia | Mike | Victor X008 | Mike | Rachel | John | Willow X018 | Rachel | Olivia | Mike | Victor X018 | Rachel | Rachel | John | Willow X043 | Simon | Olivia | Mike | Victor X043 | Simon | Rachel | John | Willow
#Which students are also fathers?
#We are only interested in those rows where the value in the student_name column is equal to
# the value in the father column:
q='''
SELECT *
FROM small_student CROSS JOIN small_parent
WHERE student_name=father;
'''
show(q)
student_id | student_name | mother | father | child ------------|--------------|--------|--------|-------- X008 | Mike | Olivia | Mike | Victor
#to answer the question that was actually asked, we might prefer to return only
# the student’s personal identifier and name
q='''
SELECT student_id, student_name
FROM small_student CROSS JOIN small_parent
WHERE student_name=father;
'''
show(q)
student_id | student_name ------------|-------------- X008 | Mike
#Examine the tables residence and administration
#Examine the effect of CROSS JOINing them under the condition that lives_in=town
#List the names of those people who live in the county of Somerset.
q='''
SELECT person_name
FROM residence CROSS JOIN administration
WHERE lives_in=town AND county='Somerset';
'''
show(q)
person_name ------------- Freddie Claire Dmitri
#What column or columns are defined over the same objects in the two tables country and language?
If you executed the query:
SELECT * FROM country CROSS JOIN language WHERE iso3166=country_iso;
what would the names of the columns in the returned table be?
YOUR ANSWER HERE
What would each row tell you?
YOUR ANSWER HERE
This completes our quick recap of the SQL commands you should recall from M269. SQL is actually a far richer language than we have seen and you will have an opportunity to learn more about it throught the course.
You will also see how the sorts of operation that can be applied using SQL can also be applied using other data manipulation languages. Developing a good understanding of how data is represented, and the operations that can then be applied to it, in abstract terms is essential if you are to be able to work with data effectively.