In this section, we will introduce some basic programming concepts in Python.
So far, we have learned a bit about variables, their values, and data types. We will now continue with a new data type called a {term}`list`. Using a list, we can store many related values together in a single variable. In Python, there are several different types of data that can be used to store values together in a {term}`collection`, and a list is the simplest type.
To explore lists, we will be using data related to Finnish Meteorological Institute (FMI) observation stations [^FMI_stations]. For each station, a number of pieces of information are given, including the name of the station, an FMI station ID number (FMISID), its latitude, its longitude, and the station type.
Let’s first create a list of some station names and print it out.
station_names = [
    "Helsinki Harmaja",
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
]
print(station_names)
We can also check the type of the station_names list using the type() function.
Here we have 4 station names stored in a list called station_names. As you can see, the type() function recognizes this as a list. Lists are created using square brackets [ and ], with commas separating the values in the list.
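As a minimal sketch of the type check described above, the following cell recreates the list and passes it to type(). The output shown in the comment is what a standard Python 3 interpreter is expected to print.

```python
# Recreate the list of station names used in this section
station_names = [
    "Helsinki Harmaja",
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
]

# Check the data type of the variable
print(type(station_names))  # <class 'list'>
```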
To access an individual value in a list we need to use an {term}`index` value. An index value is a number that refers to a given position in the list. Let's check out the first value in our list as an example by printing out station_names[1]:
Wait, what? This is the second value in the list we’ve created. What is wrong? As it turns out, Python (like many other programming languages) starts numbering the values stored in collections from the index value 0. Thus, to get the value for the first item in a list, we must use index 0. Let's print out the value at index 0 of station_names.
OK, that makes sense, but it may take some getting used to...
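The two lookups discussed above can be sketched as follows. The list is redefined here only so that the example runs on its own.

```python
station_names = [
    "Helsinki Harmaja",
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
]

# Index 1 refers to the SECOND item, because counting starts from 0
print(station_names[1])  # Helsinki Kaisaniemi

# Index 0 refers to the first item
print(station_names[0])  # Helsinki Harmaja
```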
As it turns out, index values are extremely useful and common in many programming languages, yet they are often a point of confusion for new programmers. Thus, we need a trick for remembering what an index value is and how it is used. For this, we need to be introduced to Bill (Figure 2.1).
Figure 2.1. Bill, the vending machine.
As you can see, Bill is a vending machine that contains 6 items. Like Python lists, the list of items available from Bill starts at 0 and increases in increments of 1.
The way Bill works is that you insert your money, then select the location of the item you wish to receive. In an analogy to Python, we could say Bill is simply a list of food items and the buttons you push to get them are the index values. For example, if you would like to buy a taco from Bill, you would push button 3. If we had a Python list called Bill, an equivalent operation would simply be
print(Bill[3])
Taco
We can find the length of a list using the len() function.
Just as expected, there are 4 values in our list and len(station_names) returns a value of 4.
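A short sketch of the length check is given below. The variable name station_count is only an illustrative choice and does not appear elsewhere in this section.

```python
station_names = [
    "Helsinki Harmaja",
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
]

# len() returns the number of items in the list
station_count = len(station_names)  # illustrative variable name
print(station_count)  # 4
```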
If we know the length of a list, we can now use it to find the value of the last item in the list, right? What happens if you print the value from the station_names list at index 4, the value of the length of the list?
An IndexError? That’s right, since our list starts with index 0 and has 4 values, the index of the last item in the list is len(station_names) - 1. That isn’t ideal, but fortunately there’s a nice trick in Python to find the last item in a list. Let's first print the station_names list to remind us of the values that are in it.
To find the value at the end of a list, we can print the value at index -1. To go further up a list in reverse, we can simply use negative numbers of larger magnitude, such as index -4.
Yes, in Python you can go backwards through lists by using negative index values. Index -1 gives the last value in the list and index -len(station_names) would give the first. Of course, you still need to keep the index values within their ranges. What happens if you check the value at index -5?
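The indexing rules above can be tried out with a small sketch like the one below. The try/except construct is not covered in this section; it is used here only so that the example keeps running after the error occurs, and the names last_index and bad_index are illustrative.

```python
station_names = [
    "Helsinki Harmaja",
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
]

# The last valid positive index is the length of the list minus one
last_index = len(station_names) - 1
print(station_names[last_index])  # Helsinki Kumpula

# Negative indices count backwards from the end of the list
print(station_names[-1])  # Helsinki Kumpula
print(station_names[-4])  # Helsinki Harmaja

# Indices outside the valid range (here 4 and -5) raise an IndexError
for bad_index in (4, -5):
    try:
        print(station_names[bad_index])
    except IndexError as error:
        print("Index", bad_index, "raised an IndexError:", error)
```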
Which animal is at index -2 in the Python list below?
cute_animals = ["bunny", "chick", "duckling", "kitten", "puppy"]
# Use this cell to enter your solution.
Another nice feature of lists is that they are {term}`mutable`, meaning that the values in a list that has been defined can be modified. The {term}`immutable` equivalent of a list in Python is called a {term}`tuple`, which we will use later in this part of the book. Consider a list of the observation station types corresponding to the station names in the station_names list.
station_types = [
"Weather stations",
"Weather stations",
"Weather stations",
"Weather stations",
]
station_types
Let's change the value for station_types[2] to be 'Mareographs' and print out the station_types list again.
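A minimal sketch of this in-place change is shown below, with the list redefined so that the example is self-contained.

```python
station_types = [
    "Weather stations",
    "Weather stations",
    "Weather stations",
    "Weather stations",
]

# Lists are mutable, so a value can be replaced by assigning to its index
station_types[2] = "Mareographs"
print(station_types)
# ['Weather stations', 'Weather stations', 'Mareographs', 'Weather stations']
```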
One of the benefits of a list is that it can be used to store more than one type of data. Let’s consider that instead of having a separate list for each attribute (station name, FMISID, latitude, etc.), we would like to have a list of all of the values for a single station. In this case we will create a new list for the Helsinki Kaivopuisto station.
station_name = "Helsinki Kaivopuisto"
station_id = 132310
station_lat = 60.15
station_lon = 24.96
station_type = "Mareographs"
Now that we have defined five variables related to the Helsinki Kaivopuisto station, we can combine them in a list similar to how we have done previously.
station_hel_kaivo = [station_name, station_id, station_lat, station_lon, station_type]
station_hel_kaivo
Here we have one list with 3 different types of data in it. We can confirm this using the type() function. Let's check the type of station_hel_kaivo and the types of the values at indices 0-2.
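The type checks mentioned above could look like the following sketch, which rebuilds the list from the same five values.

```python
station_name = "Helsinki Kaivopuisto"
station_id = 132310
station_lat = 60.15
station_lon = 24.96
station_type = "Mareographs"

station_hel_kaivo = [station_name, station_id, station_lat, station_lon, station_type]

# The container itself is a list...
print(type(station_hel_kaivo))  # <class 'list'>

# ...but the items inside it keep their own data types
print(type(station_hel_kaivo[0]))  # <class 'str'>
print(type(station_hel_kaivo[1]))  # <class 'int'>
print(type(station_hel_kaivo[2]))  # <class 'float'>
```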
Note that although it is possible to have different types of data in a Python list, you are generally encouraged to create lists containing the same data types. Data science workflows are often built around handling collections of data of the same type and having multiple data types in a list may cause problems for software you are trying to use.
Finally, we can add and remove values from lists to change their lengths. Let’s consider that we no longer want to include the first value in the station_names list. Since we have not seen that list in a bit, let's first print it out.
The del statement allows values in lists to be removed. It can also be used to delete values from memory in Python. To remove the first value from the station_names list, we can simply type del station_names[0]. If you then print out the station_names list, you should see the first value has been removed.
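As a brief sketch of the deletion step described above:

```python
station_names = [
    "Helsinki Harmaja",
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
]

# Remove the first item; the remaining items shift one position to the left
del station_names[0]
print(station_names)
# ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']
```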
In addition to the del statement, there are two other common approaches for removing items from lists in Python. Let's consider both with an example list called demo_list.
- demo_list.remove(value): Will iterate over the list demo_list and remove the first item with a value equal to value
- demo_list.pop(index): Will remove the item at index index from the list demo_list
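To make the difference between the two approaches concrete, here is a small sketch. The contents of demo_list are made-up example values. Note that .pop() also returns the item it removes, which is why its result can be stored in a variable (here the illustrative name removed_item).

```python
# Made-up example values; "b" appears twice on purpose
demo_list = ["a", "b", "c", "b", "d"]

# .remove() deletes only the FIRST item equal to the given value
demo_list.remove("b")
print(demo_list)  # ['a', 'c', 'b', 'd']

# .pop() deletes the item at the given index and returns it
removed_item = demo_list.pop(0)
print(removed_item)  # a
print(demo_list)  # ['c', 'b', 'd']
```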
If we would instead like to add a few more stations to the station_names list, we can type station_names.append('List item to add'), where 'List item to add' would be the text that would be added as a new item in the list in this example. Let's add two values to our list: 'Helsinki lighthouse' and 'Helsinki Malmi airfield' and check the list contents after this.
As you can see, we add values one at a time using station_names.append(). list.append() is called a {term}`method` in Python, which is a function that works for a given data type (a list in this case).
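The two additions could be written as in the sketch below. The starting list is assumed to be the three station names that remain after the first value was deleted.

```python
# The list as it is assumed to stand after removing the first station
station_names = [
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
]

# .append() adds one item at a time to the end of the list
station_names.append("Helsinki lighthouse")
station_names.append("Helsinki Malmi airfield")
print(station_names)
```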
Let’s consider our station_names list. As we know, we already have data in the list station_names and we can modify that data using built-in methods such as station_names.append(). In this case, the method .append() is something that exists for the list data type, but not for other data types. It is intuitive that you might like to add (or append) things to a list, but perhaps it does not make sense to append to other data types. Let's create a variable station_name_length that we can use to store the length of the list station_names. We can then print the value of station_name_length to confirm the length is correct.
If we check the data type of station_name_length, we can see it is an integer value as expected.
Let's see what happens if we try to append the value 1 to station_name_length.
Here we get an AttributeError because the int data type has no .append() method. While .append() makes sense for list data, it is not sensible for int data, which is why no such method exists.
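The steps above can be sketched as follows. The list contents are assumed to be the five station names present at this point, and the try/except construct (not covered in this section) is used only so that the example does not stop at the error.

```python
# The list as it is assumed to stand at this point in the section
station_names = [
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
    "Helsinki lighthouse",
    "Helsinki Malmi airfield",
]

station_name_length = len(station_names)
print(station_name_length)  # 5
print(type(station_name_length))  # <class 'int'>

# Integers have no .append() method, so this raises an AttributeError
try:
    station_name_length.append(1)
except AttributeError as error:
    print("AttributeError:", error)
```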
With lists we can do a number of useful things, such as count the number of times a value occurs in a list or find where it occurs. The .count() method can be used to find the number of instances of an item in a list. For instance, we can check to see how many times 'Helsinki Kumpula' occurs in our list station_names by typing station_names.count('Helsinki Kumpula').
Similarly, we can use the .index() method to find the index value of a given item in a list. Let's find the index of 'Helsinki Kumpula' in the station_names list.
The good news here is that our selected station name is only in the list once. Should we need to modify it for some reason, we also now know where it is in the list (index 2).
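A short sketch of both methods is given below, again assuming the five-item version of the list. The variable names kumpula_count and kumpula_index are illustrative.

```python
station_names = [
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
    "Helsinki lighthouse",
    "Helsinki Malmi airfield",
]

# How many times does the value occur in the list?
kumpula_count = station_names.count("Helsinki Kumpula")
print(kumpula_count)  # 1

# At which index does the value first occur?
kumpula_index = station_names.index("Helsinki Kumpula")
print(kumpula_index)  # 2
```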
There are two other common methods for lists that are quite useful.
The .reverse() method can be used to reverse the order of items in a list. Let's reverse our station_names list and then print the results.
Yay, it works! A common mistake when reversing lists is to do something like station_names = station_names.reverse(). Do not do this! When reversing lists with .reverse() the None value is returned (this is why there is no screen output when running station_names.reverse()). If you then assign the output of station_names.reverse() to station_names you will reverse the list but then overwrite its contents with the returned value None. This means you’ve deleted the contents of your list!
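The behaviour described above can be checked with a sketch like the following. The return value is stored in a separate, illustrative variable called result so that the list itself is not overwritten.

```python
station_names = [
    "Helsinki Kaisaniemi",
    "Helsinki Kaivopuisto",
    "Helsinki Kumpula",
    "Helsinki lighthouse",
    "Helsinki Malmi airfield",
]

# .reverse() changes the list in place and returns None
result = station_names.reverse()
print(result)  # None

# The original variable now holds the reversed list
print(station_names)
```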
The .sort() method works the same way as .reverse(): it modifies the list in place and returns None. Let's sort our station_names list and print its contents.
As you can see, the list has been sorted alphabetically using the .sort() method, but there is no screen output when this occurs. Again, if you were to assign that output to station_names the list would get sorted but the contents would then be replaced by None. And as you may have noticed, Helsinki Malmi airfield comes before Helsinki lighthouse in the sorted list. This is because alphabetical sorting in Python places capital letters before lowercase letters.
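As a sketch of the sorting step, starting from the reversed order of the list:

```python
station_names = [
    "Helsinki Malmi airfield",
    "Helsinki lighthouse",
    "Helsinki Kumpula",
    "Helsinki Kaivopuisto",
    "Helsinki Kaisaniemi",
]

# .sort() sorts the list in place; capital letters sort before lowercase ones
station_names.sort()
print(station_names)
# ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula',
#  'Helsinki Malmi airfield', 'Helsinki lighthouse']
```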
Earlier in this section we defined five variables related to the Helsinki Kaivopuisto observation station:
station_name
station_id
station_lat
station_lon
station_type
which refer to the name of the station, an FMI station ID number (FMISID), its latitude, its longitude, and the station type. Each variable has a unique name and they store different types of data.
As you likely recall, we can explore the different types of data stored in variables using the type() function.
Let's now check the data types of the variables station_name, station_id, and station_lat.
As expected, we see that the station_name is a character string (type str), the station_id is an integer (type int), and the station_lat is a floating point number (type float). Being aware of the data type of variables is important because some are not compatible with one another. Let's see what happens if we try to sum the variables station_name and station_id.
Here we get a TypeError because Python does not know how to sum a string of characters (station_name) and an integer value (station_id).
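The failing operation can be reproduced with the sketch below. The try/except construct (not covered in this section) and the variable name combined are used only for illustration, so that the example continues after the error.

```python
station_name = "Helsinki Kaivopuisto"
station_id = 132310

# Adding a str and an int raises a TypeError
try:
    combined = station_name + station_id
except TypeError as error:
    combined = None  # nothing was produced because the addition failed
    print("TypeError:", error)
```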
It is not the case that things like the station_name and station_id cannot be combined at all, but in order to combine a character string with a number we need to perform a {term}`type conversion` to make them compatible. Let's convert station_id to a character string using the str() function. We can store the converted variable as station_id_str.
We can confirm the type has changed by checking the type of station_id_str or by checking the output of a code cell with the variable.
As you can see, str() converts a numerical value into a character string with the same numbers as before. Similar to using str() to convert numbers to character strings, int() can be used to convert strings or floating point numbers to integers, and float() can be used to convert strings or integers to floating point numbers.
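The conversions described above can be sketched as follows. One detail worth knowing is that int() drops the decimal part of a floating point number rather than rounding it.

```python
station_id = 132310
station_lat = 60.15

# Number to character string
station_id_str = str(station_id)
print(type(station_id_str))  # <class 'str'>
print(station_id_str)  # 132310

# String or float to integer (the decimal part is dropped, not rounded)
print(int("132310"))  # 132310
print(int(station_lat))  # 60

# String or integer to float
print(float("24.96"))  # 24.96
print(float(station_id))  # 132310.0
```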
What output would you expect to see when you execute print(station_id + station_lon)?
# Use this cell to enter your solution.
What output would you expect to see when you execute print(station_name + station_id_str)?
# Use this cell to enter your solution.
Although most mathematical operations are applied to numerical values, a common way to combine character strings is using the addition operator +. Let's create a text string in the variable station_name_and_id that is the combination of the station_name and station_id variables. Once we define station_name_and_id, we can print it to the screen to see the result.
Note that here we are converting station_id to a character string using the str() function within the assignment to the variable station_name_and_id. Alternatively, we could have simply added station_name and station_id_str.
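One possible version of this step is sketched below. The ": " separator placed between the two parts is an assumption made for readability.

```python
station_name = "Helsinki Kaivopuisto"
station_id = 132310

# Convert the integer to a string inside the expression, then concatenate
# (the ": " separator is an illustrative choice)
station_name_and_id = station_name + ": " + str(station_id)
print(station_name_and_id)  # Helsinki Kaivopuisto: 132310
```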
The previous case showed a simple example of how it is possible to combine character strings and numbers together using the +
operator between the different text components. Although this approach works, it can become quite laborious and error prone when you have a more complicated set of textual and/or numerical components that you work with. Hence, we next show a few useful techniques that make manipulating strings easier and more efficient.
There are three approaches that can be used to format strings in Python: (1) f-strings, (2) the .format() method, and (3) the % operator. We recommend using the f-string approach, but we also provide examples of the two other approaches because there are plenty of examples and code snippets on the web where these string formatting approaches are still used. Hence, it is good to be aware of them all. In addition, we show a few useful methods that make working with text in different ways possible.
In the following, we show how we can combine the station_name text, the station_id integer number, and the temp floating point number together using Python's f-string formatting approach. In addition, we will round a decimal number (temp) to two decimal places on the fly.
# An example temperature with many decimals
temp = 18.56789876
# 1. The f-string approach (recommended)
Figure 2.2. F-string formatting explained.
As you can see, using string formatting it is possible to easily modify a body of text "interactively" based on values stored in given variables. Figure 2.2 breaks down the different parts of the string. The text that you want to create and/or modify is enclosed within the quotes preceded by the letter f. You can pass any existing variable inside the text template by placing the name of the variable within the curly braces {}. Using string formatting, it is also possible to insert numbers (such as station_id and temp) within the body of text without needing first to convert the data types to strings. This is because the f-string functionality kindly does the data type conversion for us in the background without us needing to worry about it (handy!).
It is also possible to round numbers on the fly to a specific precision, such as the two decimal places in our example, by adding the format specifier (:.2f) after the variable that we want to round. The format specifier works by first adding a colon (:) after the variable name, and then specifying with a dot (.) followed by a number that we want to round our value to 2 decimal places (this can be any number of digits). The final character f in the format specifier defines the type of formatting that will be done: the character f will display the value as a decimal number, the character e would make the number appear in scientific notation, while the character % would convert the value to a percentage representation.
As we have hopefully demonstrated, f-string formatting is easy to use, yet powerful with its capability to do data conversions on the fly, for example. Hence, it is the recommended approach for doing string manipulation presently in Python. Just remember to add the letter f before your string template!
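As a minimal sketch of the f-string approach, the example below builds a sentence from the three variables. The exact wording of the sentence and the variable name text1 are assumptions chosen for illustration. The last lines show how the f, e, and % format types change the way a number is displayed.

```python
station_name = "Helsinki Kaivopuisto"
station_id = 132310
temp = 18.56789876

# Variables are inserted inside curly braces; :.2f rounds temp to 2 decimals
text1 = f"The temperature at {station_name} (ID: {station_id}) is {temp:.2f}."
print(text1)
# The temperature at Helsinki Kaivopuisto (ID: 132310) is 18.57.

# Other format types applied with the same colon syntax
print(f"{temp:.2f}")  # 18.57 (decimal number)
print(f"{temp:.3e}")  # 1.857e+01 (scientific notation)
print(f"{0.256:.1%}")  # 25.6% (percentage)
```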
As mentioned previously, there are also a couple of other approaches that can be used to achieve the same result as above. These older approaches preceded the f-string, which was introduced in Python version 3.6. The first one is the .format() method, which is placed after the string in quotes, like this:
# 2. .format() approach (no longer recommended)
text2 = (
    "The temperature at {my_text_variable} (ID: {station_id}) is {temp:.2f}.".format(
        my_text_variable=station_name, station_id=station_id, temp=temp
    )
)
print(text2)
As you can see, here we get the same result as we did with an f-string, but we used the .format() method placed after the quotes. The variables were inserted within the text template using curly braces and giving them a name (placeholder) which is expected to have a matching counterpart within the .format() parentheses that links to the variable value that will be inserted in the body of text. As you see, the placeholder does not necessarily need to have the same name as the actual variable that contains the inserted value, but it can be anything, like the name my_text_variable as in the example above.
The last (historical) string formatting approach is to use the % operator. In this approach, placeholders such as %s are added within the quotes, and the variables that are inserted into the body of text are placed inside parentheses after the % operator, like this:
# 3. The % operator approach (no longer recommended)
text3 = "The temperature at %s (ID: %s) is %.2f" % (station_name, station_id, temp)
print(text3)
The order of the variables within the parentheses specifies which placeholder will receive which value. If the placeholders are moved, the order of the variables inside the parentheses must always be updated to match, and there must be exactly as many variables within the parentheses as there are placeholders (such as %s or %.2f) within the text template. Hence, this approach is prone to errors and confusion, which is why we do not recommend using it.
To conclude, using the f-string approach is the easiest and most intuitive way to construct and format text. Hence, we highly recommend learning that approach and sticking with it.
Here we demonstrate some of the most useful string manipulation techniques, such as splitting strings based on a given character, replacing characters with new ones, slicing strings, etc.
The aim is to produce a list of weather station locations in Helsinki that are represented in uppercase text (i.e., KUMPULA, KAISANIEMI, HARMAJA). The text that we will begin working with is below:
text = "Stations: Helsinki Kumpula, Helsinki Kaisaniemi, Helsinki Harmaja"
Let's start by demonstrating how we can split a string into different parts based on specific character(s). We can split the given text using the colon character (:) by passing the character into a method called .split().
As a result, the body of text was split into two parts in a list, where the first item (at index 0) now has the text Stations (i.e., the text preceding the colon) and the second item (at index 1) contains the body of text listing the stations that are separated by commas.
Now we can continue working towards our goal by selecting the stations text from the splitted list at index 1.
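The splitting and selection steps above can be sketched as follows. The variable name stations_text used for the selected part is an assumption made for illustration.

```python
text = "Stations: Helsinki Kumpula, Helsinki Kaisaniemi, Helsinki Harmaja"

# Split the text into a list of parts at the colon character
splitted = text.split(":")
print(splitted)
# ['Stations', ' Helsinki Kumpula, Helsinki Kaisaniemi, Helsinki Harmaja']

# Select the part that lists the stations (note the leading space)
stations_text = splitted[1]
print(stations_text)
```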
As can be seen, the first character in our string is actually an empty space (' ') before the word Helsinki. We can remove that character easily by slicing the text. Each character in a character string can be accessed based on its position (index) in the same way as with the Python lists that were introduced earlier in this chapter. We can slice our word by specifying that we want to keep all characters after the first position (i.e., removing the empty space). We can do this by adding the position inside square brackets ([]) where we want to start accessing the text, and by adding a colon (:) after this number, we can specify that we want to keep all the rest of the characters in our text (i.e., we take a slice of it).
Now we have accessed and stored all the characters starting from position index 1, and hence dropped the first empty space. An alternative approach for achieving this would be to use a method called .strip(). You could also specify a specific range of characters that you want to slice from the word by adding the index position after the colon (e.g. [1:9] would have separated the word Helsinki from the text).
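A sketch of the slicing options discussed above is given below. The names raw_text and stations_text are illustrative, with raw_text standing for the text that still has its leading space.

```python
# The stations text with its leading space still in place
raw_text = " Helsinki Kumpula, Helsinki Kaisaniemi, Helsinki Harmaja"

# Keep everything from index 1 onwards, which drops the leading space
stations_text = raw_text[1:]
print(stations_text)  # Helsinki Kumpula, Helsinki Kaisaniemi, Helsinki Harmaja

# .strip() removes whitespace from both ends and gives the same result here
print(raw_text.strip() == stations_text)  # True

# A start and an end index select a specific range of characters
print(raw_text[1:9])  # Helsinki
```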
Currently in the processed text, the word Helsinki is repeated multiple times before the station names. We can easily remove this word by replacing the text "Helsinki " (the word followed by a space) with an empty string (""), which will basically delete this word from the text. We can accomplish this by using a method called .replace() which takes the original text as the first argument and the replacement text as the second argument.
Now we have replaced the text "Helsinki " with nothing (an empty string), and as a result we have text where only the station names are listed.
Finally, we can easily change the text to uppercase using a method called .upper(). Similarly, we could make the text all lowercase or capitalize only the first character using .lower() or .capitalize(), respectively.
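The three case-changing methods can be compared with the short sketch below. The variable names are illustrative. The final step, which splits the uppercase text into a Python list, is an optional extra for cases where separate items are needed.

```python
stations_text = "Kumpula, Kaisaniemi, Harmaja"

# Convert all characters to uppercase
stations_upper = stations_text.upper()
print(stations_upper)  # KUMPULA, KAISANIEMI, HARMAJA

# Other case conversions
print(stations_text.lower())  # kumpula, kaisaniemi, harmaja
print(stations_text.capitalize())  # Kumpula, kaisaniemi, harmaja

# Optional: split the text at ", " to get the names as a Python list
station_list = stations_upper.split(", ")
print(station_list)  # ['KUMPULA', 'KAISANIEMI', 'HARMAJA']
```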