At the end of this section, you will be able to:
Lists resemble strings: both are sequences of values. But whereas a string is a sequence of characters, a list can contain values of any type. We call these values elements or items.
this_is_a_string = 'Hello Newman'
this_is_a_list = ['Hello','Jerry',42,3.1415]
Consider the first sentence (represented as a string) from Franz Kafka's book 'The Trial'. Imagine for a moment that we had assigned the whole book to the variable trial.
trial = "Someone must have slandered Josef K., for one morning, without having done anything truly wrong, he was arrested. "
A string is a sequence of characters.
How can we select specific words from this book? For the sentence above, it might seem more natural for humans to describe it as a series of words, rather than as a series of characters. Say, we want to access the first word in our sentence. If we enter:
first_word = trial[0]
print(first_word)
split()
converts this string to a list of words.
Python only prints the first character of our sentence. (Think about this if you do not understand why.) We can, however, transform our sentence into a list of words (represented by strings) using the split() function as follows:
words = trial.split()
print(words)
The variable words now holds the first line of Kafka's Trial as a list. Each element in this list is now (approximately) a word. Run the code below to see the difference.
first_word = words[0]
print(first_word)
We apply the split()
function to the variable trial
and we assign the result of the function (we call this the 'return value' of the function) to the new variable words
.
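The difference between indexing a string and indexing the list returned by split() can be sketched as follows. The sentence and the variable names are made up for illustration; they are not part of the Kafka example.

```python
# Hypothetical example sentence (an assumption, not taken from the text above)
sentence = "Lists are sequences of values"

# Indexing the string selects a single character
first_character = sentence[0]

# split() returns a new list of words; the string itself is left unchanged
tokens = sentence.split()
first_token = tokens[0]

print(first_character)              # L
print(first_token)                  # Lists
print(len(sentence), len(tokens))   # 29 5
```

Note that len() counts characters for the string but words for the list.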
# Exercise: split the first line of Genesis and assign the list to the variable `bible`
# In the beginning God created the heaven and the earth.
split()
takes other delimiters as arguments.
By default, the split() function in Python will split strings on the whitespace (spaces, tabs, newlines) between consecutive words and it will return a list of words. However, we can pass an argument to split() that specifies explicitly the string we would like to split on.
This is often useful for parsing information from a CSV file (or other structured data). For example, the string below has the structure of the Google Ngram data. The Ngram Viewer allows researchers to explore long-term cultural trends. The source data of this corpus comprises the yearly word and document frequencies for approximately 5 million books printed between 1500 and 2008. The lines are separated by hard returns ("\n"). Each line holds four elements: word, year, word frequency, document frequency.
Using the split() function, we can easily parse this data, i.e. recognize and read its content. First, we split the string on "\n", and then we split each line on "\t", which stands for a tab.
In the code block below, we will split a string on tabs, instead of spaces. Do you get the syntax?
# note the `\n` symbol
google_ngram = "queen\t1900\t20394\t3435\nqueen\t1901\t23340\t2935\nqueen\t1902\t23120\t3035"
print(google_ngram)
google_ngram = google_ngram.split("\n")
print(google_ngram)
first_line = google_ngram[0]
print(first_line.split('\t'))
queen 1900 20394 3435
queen 1901 23340 2935
queen 1902 23120 3035
['queen\t1900\t20394\t3435', 'queen\t1901\t23340\t2935', 'queen\t1902\t23120\t3035']
['queen', '1900', '20394', '3435']
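One detail worth keeping in mind when parsing such data: split() always returns strings, even when a cell looks like a number. The sketch below uses a shortened copy of the sample string; the variable names ngram_text, fields, year and word_frequency are chosen here for illustration.

```python
# Same structure as the sample above: word, year, word frequency, document frequency
ngram_text = "queen\t1900\t20394\t3435\nqueen\t1901\t23340\t2935"

lines = ngram_text.split("\n")   # one string per line
fields = lines[1].split("\t")    # the cells of the second line

# The cells are strings, so numbers must be converted explicitly
year = int(fields[1])
word_frequency = int(fields[2])

print(fields)     # ['queen', '1901', '23340', '2935']
print(year + 1)   # 1902
```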
join()
is the reverse of split()
The reverse of the split() function can be accomplished with join(). It turns a list of strings into a single string. You call join() on the 'delimiter', i.e. the string you want to place between the items.
observation = ['queen', '1900', '20394', '3435']
delimiter = ', '
csv_string = delimiter.join(observation)
print(csv_string)
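As a rough check that join() and split() mirror each other, the sketch below joins a list on tabs and splits it again. The second half shows an assumption-free but easily forgotten rule: join() only accepts strings, so numbers have to be converted first. The variable names are illustrative.

```python
record = ['queen', '1900', '20394', '3435']

tab_line = '\t'.join(record)      # list -> string
restored = tab_line.split('\t')   # string -> list

print(restored == record)         # True

# join() expects strings; convert numbers with str() before joining
years = [1900, 1901, 1902]
year_line = ', '.join([str(year) for year in years])
print(year_line)                  # 1900, 1901, 1902
```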
In the previous chapter, we argued that variables operate as "boxes": you put a value in there to save it for later. Until now the box could only contain one item, a string or a number. Lists expand the possibilities because they serve as "containers". With lists, you can stuff your box with as many different elements as you'd like. Let's have a look at how this works.
To store an empty list in variable x, simply assign [] (square brackets) to x.
# create an empty list
x = []
We can also create lists with some content: enclose the individual items within square brackets, separated by commas.
my_grades = [8,9,6,7]
print(my_grades)
my_garbage = ['Potatoe',[1,2,3],9.03434,'frogs']
print(my_garbage)
Python allows at least the following very useful list operations: the arithmetic operators + and *, but also comparison and membership operators (more about this in the next Notebook)!
Similar to strings, Python comes with specific operations (*
and +
) that you can apply to a list.
The +
operator concatenates lists:
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b
print(c)
Similarly, the *
operator repeats a list for a given number of times:
# First example of the * operator
print([0]*4)
# Second example of the * operator
a = ['spam','Spam','SPAMMM']
b = a * 5
print(b)
The first example repeats the single-item list four times. The second repeats the list with typographic variations on the word 'spam' five times.
You can use lists in membership boolean expressions (see Notebook 2.2). The in operator checks whether the item 'God' appears in the variable bible.
bible = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth.']
print('God' in bible)
True
This is often useful for checking if a text contains a specific word. Indeed, the line below returns the same result.
'God' in 'In the beginning God created the heaven and the earth.'
But applied to a string, the in operator matches any substring, not a whole "word"!
print('God' in 'My Godess created the heaven and the earth.')
print('God' in ['My', 'Godess', 'created', 'the', 'heaven', 'and', 'the', 'earth.'])
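The same contrast with a different, made-up sentence: on a string, in looks for a substring; on a list, it looks for an element that is equal to the value. The sketch also shows that punctuation stays attached to the words after split().

```python
# Hypothetical sentence, chosen so that 'cat' hides inside another word
text = 'We concatenate two lists.'
words = text.split()

print('cat' in text)       # True: 'cat' is a substring of 'concatenate'
print('cat' in words)      # False: no element equals 'cat'
print('lists' in words)    # False: the element is 'lists.' with the full stop
print('lists.' in words)   # True
```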
Indexing and slicing work the same way as with strings. Every item in the list has its own index number. We start counting at 0! The indices for our list ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn'] are as follows:
J.S. Bach | W.A. Mozart | F. Mendelssohn |
---|---|---|
0 | 1 | 2 |
-3 | -2 | -1 |
We can hence use this index number to extract items from a list (just as with strings):
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
print(composer_list[0])
print(composer_list[1])
print(composer_list[2])
Obviously, we can also use negative indices:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
print(composer_list[-1])
print(composer_list[-2])
print(composer_list[-3])
And we can extract one part of a list using slicing:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
list_with_less_composers = composer_list[:2]
print(list_with_less_composers)
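A few more slice forms, as a sketch on a slightly longer list (the fourth composer and the variable names are added here for illustration). A slice always returns a new list, so changing the copy leaves the original untouched.

```python
composers = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn', 'B. Bartok']

print(composers[1:3])    # ['W.A. Mozart', 'F. Mendelssohn']
print(composers[2:])     # ['F. Mendelssohn', 'B. Bartok']
print(composers[-2:])    # the last two items, same result as above

# [:] copies the whole list
copy_of_composers = composers[:]
copy_of_composers[0] = 'Elvis P.'
print(composers[0])      # J.S. Bach
```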
A common error is to retrieve elements by indices greater than the length of the list minus 1.
print(composer_list[5])
The IndexError tells you that Python could not find an item at position five, as the positions only range from 0 to 2.
Index notation can be used to replace elements in the list. Let's say we want to get rid of Mozart (at position 1) and replace him with Elvis. As lists are mutable, you can replace an item by assigning a new value to its index.
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
composer_list[1] = 'Elvis P.'
print(composer_list)
Similarly, a slice operator on the left side of an assignment can update multiple elements:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
composer_list[1:] = ['L. van Beethoven','A. Webern']
print(composer_list)
The fact that you were able to replace an element by index (as in the above cell) relates to the mutability of lists. By contrast, performing a similar manipulation on a string object will cause a TypeError.
misspelled = 'Pythvn'
misspelled[4] = 'o'
If we convert the string to a list we can get rid of this naughty typo.
misspelled = 'Pythvn'
print(misspelled)
misspelled = list(misspelled)
print(misspelled)
misspelled[4] = 'o'
print(misspelled)
misspelled = ''.join(misspelled)
print(misspelled)
In short: lists are mutable--you can manipulate the content of list variables--whereas strings are not. Question: can you predict whether the following code raises an error?
The more general slicing notation has the form list[start:stop:step]
till_twenty = list(range(0,21))
print(till_twenty)
# Exercise: how can we print all even numbers?
evens = till_twenty[:]  # adjust this slice so that only the even numbers remain
print(evens)
# Exercise: how can we print all odd numbers?
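To get a feeling for the step part of the notation without giving the exercise away, here is a sketch on a list of letters (an example made up for this purpose).

```python
letters = list('abcdefghij')

print(letters[::3])     # every third item: ['a', 'd', 'g', 'j']
print(letters[1:8:3])   # start at index 1, stop before 8: ['b', 'e', 'h']
print(letters[::-1])    # a negative step walks backwards through the list
```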
As lists are mutable, they provide a much more flexible data type. Lists come with specific methods, a set of powerful tools that Python has already pre-cooked for you. These tools help you build and manipulate lists.
Most of the crucial list functionality is provided by the built-in list methods: functions attached to the list object. For an overview of the available methods, run the code below (scroll down; for this course you can ignore the methods starting and ending with double underscores).
writers_list = []
print(type(writers_list))
We learn, unsurprisingly, that the variable writers_list is of type list. Let's inspect the functionality Python provides for working with lists.
help(list)
append()
and extend()
extend the list with other values
The first method we encounter is append. To see what this method does, use the same help function as before:
help(list.append)
append
is a method that adds new items to the end of a list. It has one positional argument and returns None
(we come back to this a few blocks below).
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
print(composer_list)
composer_list.append('L. van Beethoven')
print(composer_list)
# Exercise: add some other composers to the composer_list here
The Python help function helps you explore the methods attached to an object.
Exercise: find out what the method extend
does, and how to apply it to the writers_list.
help(list.extend)
Help on method_descriptor: extend(...) L.extend(iterable) -> None -- extend list by appending elements from the iterable
An example of extend()
is given below:
composer_list = ['J.S. Bach', 'W.A. Mozart','B Bartok']
print(composer_list)
composer_list.extend(['L. van Beethoven','F. Mendelssohn'])
print(composer_list)
['J.S. Bach', 'W.A. Mozart', 'B Bartok']
['J.S. Bach', 'W.A. Mozart', 'B Bartok', 'L. van Beethoven', 'F. Mendelssohn']
To understand the difference between extend()
and append()
, compare the output above with the result of the print of this cell:
composer_list = ['J.S. Bach', 'W.A. Mozart','B Bartok']
print(composer_list)
composer_list.append(['L. van Beethoven','F. Mendelssohn'])
print(composer_list)
['J.S. Bach', 'W.A. Mozart', 'B Bartok']
['J.S. Bach', 'W.A. Mozart', 'B Bartok', ['L. van Beethoven', 'F. Mendelssohn']]
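The difference also shows up in the length of the resulting list. The sketch below uses illustrative variable names; its last part shows a common surprise: because extend() takes any iterable, passing a string adds its characters one by one.

```python
with_append = ['J.S. Bach']
with_extend = ['J.S. Bach']
new_composers = ['W.A. Mozart', 'B. Bartok']

with_append.append(new_composers)   # adds the whole list as ONE nested element
with_extend.extend(new_composers)   # adds each element separately

print(len(with_append))   # 2
print(len(with_extend))   # 3

# extend() iterates over its argument, so a string is added per character
letters = []
letters.extend('Bach')
print(letters)            # ['B', 'a', 'c', 'h']
```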
dot notation in Python
Do you get the syntax that goes with the append()
function? The list we wish to append the item to goes first and we join the append()
function to this list using a dot (.
). In between the round brackets that go with the function name, we place the actual string that we wish to add to the list.
We call such an input value an 'argument' that we 'pass' to the append() function.
Please reread the previous sentence, to get used to the terminology.
Make sure that you are familiar with this terminology because you will often come across such terms when you look for help online!
fruitful and void functions
Functions in Python are generally divided into fruitful and void functions. append is a void function: similar to print, it performs an operation (adds one element to the list) but returns nothing (more precisely, it returns None). Understanding this distinction may help you trace bugs in future code.
a = composer_list.append('J. des Prez')
print(composer_list)
print(a)
It might be a bit confusing at first that a list method returns None. Please carefully look at the difference between the two following examples. To repeat: Please predict what will be printed in each code snippet below:
a_list = [1, 3, 4]
a_list.append(5)
print(a_list)
a_list = [1, 3, 4]
a_list = a_list.append(5)
print(a_list)
It is important to distinguish between operations that modify lists and operations that create
new lists. For example, the append()
method modifies a list, but the +
operator creates a
new list!
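A small sketch of this distinction, using made-up variable names. The name alias refers to the same list object as original, so it "sees" the change made by append(), whereas the list created with + is a separate object.

```python
original = [1, 2, 3]
alias = original              # a second name for the SAME list

combined = original + [4]     # + creates a new list
print(original)               # [1, 2, 3]
print(combined)               # [1, 2, 3, 4]

original.append(4)            # append() modifies the list in place
print(alias)                  # [1, 2, 3, 4]
print(combined is original)   # False: two different list objects
```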
The append()
method is especially powerful in a for
loop.
We will take a closer look at loops later, but the code below shows a context in which the append() method is often applied. For example, we have a .tsv table which lists composers by their country of origin. Imagine we want to study composers by nationality. The code below demonstrates how to extract the relevant information from this table.
data = 'Justus Johann Friedrich Dotzauer\tGermany\nSaid Rustamov\tAzerbaijan\nFlor Alpaerts\tBelgium\nPetko Staynov\tBulgaria\nTheodor Ludwig Wiesengrund Adorno\tGermany\nAnna Amalia, Duchess of Brunswick-Wolfenbüttel\tGermany'
print(data)
As previously shown, we parse this table with the split() function. First we identify the rows (separated by hard returns or "\n") and later the cells within each row (separated by tabs or "\t").
rows = data.split('\n')
print(rows)
rows = data.split('\n')
# The rows are of type list
print(type(rows))
# But the first element in this list is still a string
print(type(rows[0]))
To process the separate cells, we first create an empty list called table. Then we iterate over each row created by split('\n') and split each row on the tab symbol. This last step converts each row (which is still a string) to a list.
The code below makes clear what happens at every iteration in the for
loop:
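The loop described above can be sketched as follows. This is a minimal sketch rather than the original cell: it uses a shortened copy of the composer data, and the variable name cells is an assumption.

```python
# Shortened copy of the composer table (tab-separated cells, one row per line)
data = 'Said Rustamov\tAzerbaijan\nFlor Alpaerts\tBelgium\nPetko Staynov\tBulgaria'

table = []                       # start with an empty list
for row in data.split('\n'):     # each row is still a string
    cells = row.split('\t')      # convert the row to a list of cells
    print(cells)                 # show what happens at every iteration
    table.append(cells)          # add the row to the table

print(table)
```

The result is a nested list: table[0] is the first row, and table[0][1] is the country in that row.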
Once we have collected the information, we can start counting: how many of these composers come from Germany?
help(list.count)
The count() method has one positional argument value and returns an integer. As the name suggests, the method returns an integer that represents how often the value occurs in the list.
countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']
print(countries.count('Germany'))
print(countries.count('Belgium'))
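To connect count() with the parsed table, the sketch below first collects the second cell of every row and then counts. The small table is a hand-written excerpt of the composer data, and the loop is one possible way to build the countries list, not necessarily the one used elsewhere in this notebook.

```python
# Hand-written excerpt of the parsed composer table
table = [['Justus Johann Friedrich Dotzauer', 'Germany'],
         ['Said Rustamov', 'Azerbaijan'],
         ['Theodor Ludwig Wiesengrund Adorno', 'Germany']]

countries = []
for row in table:
    countries.append(row[1])          # the country is the second cell

print(countries.count('Germany'))     # 2
print(countries.count('France'))      # 0: count() returns 0 for absent values
```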
Now, try to print the songs which mention a search term twice or more:
sort()
and sorted()
The sort() method is a void function that sorts strings in alphabetical order and numbers in ascending order.
help(list.sort)
countries.sort()
print(countries)
my_grades = [9,8,10,7,9,9]
my_grades.sort()
print(my_grades)
The reverse
argument allows you to sort in ascending (reverse=False) or descending (reverse=True) order.
my_grades.sort(reverse=True)
print(my_grades)
Before evaluating the cell block below, can you guess what the resulting order will look like?
my_grades = [9,8,10,7,9,9]
countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']
grades_and_countries = my_grades + countries
print(grades_and_countries)
grades_and_countries.sort(reverse=False)
print(grades_and_countries)
Unfortunately, the standard Python sort() method cannot deal with a list that mixes types: in Python 3, integers and strings cannot be compared with each other, so the call raises a TypeError. As an aside: a convenient solution would be to cast the integers as strings.
my_grades = ['9','8','10','7','9','9']
countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']
grades_and_countries = my_grades + countries
grades_and_countries.sort(reverse=False)
print(grades_and_countries)
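One caveat of casting numbers to strings: strings are compared character by character, so '10' ends up before '7'. The sketch below shows this, and how passing int as the key argument restores numeric order while keeping the items as strings. The variable name is illustrative.

```python
grades_as_strings = ['9', '8', '10', '7']

grades_as_strings.sort()
print(grades_as_strings)    # ['10', '7', '8', '9']

# Compare the items by their integer value instead
grades_as_strings.sort(key=int)
print(grades_as_strings)    # ['7', '8', '9', '10']
```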
The key
argument allows you to further refine your sorting. As we have not yet covered enough Python concepts for you to properly understand how this works, we leave it for the moment at two examples.
Basically, you pass a function as the key argument, for example len, which counts how many items a value contains (the length of a value). If you pass the function len as an argument, Python counts how many characters each string contains and orders the list by the length of each item.
# Sorting by the length of string
countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']
countries.sort(key=len,reverse=True)
print(countries)
# get the longest song title about love
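Another function that is often passed as key is str.lower, which makes the sort ignore the difference between upper and lower case. The list of names below is made up for this sketch.

```python
names = ['bartok', 'Webern', 'adorno', 'Bach']

# By default, uppercase letters sort before lowercase letters
names.sort()
print(names)    # ['Bach', 'Webern', 'adorno', 'bartok']

# Compare the lowercased version of each name instead
names.sort(key=str.lower)
print(names)    # ['adorno', 'Bach', 'bartok', 'Webern']
```

The key function only decides the order; the items themselves are not changed.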
Extra: Or sort the items by their frequency of occurrence.
# Sorting by frequency of occurrence
countries_ref = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']
# Actually this is more elegant:
# from collections import Counter
# countries.sort(key=Counter(countries).get,reverse=True)
countries.sort(key=countries_ref.count,reverse=True)
print(countries)
sorted()
sorted() is a fruitful function that returns a new sorted list:
countries_ref = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']
countries_sorted = sorted(countries_ref)
print(countries_sorted)
['Azerbaijan', 'Belgium', 'Bulgaria', 'Germany', 'Germany', 'Germany']
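The contrast between the fruitful sorted() and the void sort() can be sketched as follows, with illustrative variable names. sorted() leaves its input untouched and also accepts other sequences, such as strings.

```python
grades = [9, 8, 10, 7]

ranked = sorted(grades, reverse=True)   # returns a new list
print(ranked)                           # [10, 9, 8, 7]
print(grades)                           # [9, 8, 10, 7]: unchanged

result = grades.sort()                  # sorts in place and returns None
print(result)                           # None
print(grades)                           # [7, 8, 9, 10]

print(sorted('trial'))                  # ['a', 'i', 'l', 'r', 't']
```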
Let's assume a collection has grown a lot and we would like to remove some of the items from the list. Python provides the method remove(), which you call on a list and which takes as its argument the item we would like to remove.
good_reads = ["The Hunger games", "A Clockwork Orange",
"Pride and Prejudice", "Water for Elephants", "Illias"]
print(good_reads)
good_reads.remove("Water for Elephants")
print(good_reads)
If we try to remove a book that is not in our collection, Python raises an error to signal that something is wrong.
good_reads.remove("White Oleander")
Note, however, that remove()
will only delete the first item in the list that is identical to the argument which you passed to the function. Execute the code in the block below and you will see that only the first instance of "Pride and Prejudice" gets deleted.
good_reads = ["The Hunger games", "A Clockwork Orange",
"Pride and Prejudice", "Water for Elephants", "Pride and Prejudice"]
good_reads.remove("Pride and Prejudice")
print(good_reads)
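If every occurrence has to go, or if you are not sure the item is present, remove() can be combined with a membership test. This is one possible sketch using a shortened, made-up book list; it relies on while and if, which are covered in more detail later.

```python
books = ["Pride and Prejudice", "A Clockwork Orange", "Pride and Prejudice"]
title = "Pride and Prejudice"

# Keep removing as long as the title is still in the list
while title in books:
    books.remove(title)
print(books)    # ['A Clockwork Orange']

# Check membership first to avoid a ValueError for absent items
missing = "White Oleander"
if missing in books:
    books.remove(missing)
else:
    print(missing, "is not in the list")
```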
A nested list can also represent a matrix, a notion we will often encounter further in this course.
From Wikipedia: In mathematics, a matrix (plural: matrices) is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. For example, the dimensions of the matrix below are 2 × 3 (read "two by three"), because there are two rows and three columns:
We can represent this matrix as a nested list and assign it to the variable nested_list.
nested_list = [[1,9,-13],[20,5,-6]]
print(nested_list)
[[1, 9, -13], [20, 5, -6]]
To retrieve elements of the matrix, we use indexing and slicing techniques from the previous course.
print(nested_list[0])
print(nested_list[0][0])
print(nested_list[1][2])
print(nested_list[0][:-2])
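A short sketch of a few more things you can do with a matrix stored as a nested list: reading off its dimensions with len(), collecting a column with a loop, and replacing a single cell. The variable names are chosen for illustration.

```python
matrix = [[1, 9, -13], [20, 5, -6]]

n_rows = len(matrix)         # number of inner lists
n_columns = len(matrix[0])   # length of the first row
print(n_rows, n_columns)     # 2 3

# Collect the second column: take the item at index 1 from every row
second_column = []
for row in matrix:
    second_column.append(row[1])
print(second_column)         # [9, 5]

# Nested lists are mutable too: first index selects the row, second the column
matrix[1][2] = 0
print(matrix)                # [[1, 9, -13], [20, 5, 0]]
```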
To finish this section, here is an overview of the new concepts and functions you have learnt. Go through them and make sure you understand them all.
.split()
.append()
.count()
.remove()
.sort()
sorted()