This tutorial aims to teach the basics of Python programming, with a focus on its application in computational biochemistry. Assuming no previous knowledge of Python, we start by explaining the basic building blocks (e.g. variables, conditionals, and lists) and build upon these to show how Python can be used to solve problems in computational biochemistry (e.g. analysis of molecular dynamics simulations).
The tutorial does not aim to cover every aspect of Python programming, which unfortunately would not be feasible in a half-day session. Instead, we encourage attendees to use it as a jumping-off point for learning more about Python and its many applications.
We recommend the following useful resources for Python:
The tutorial is taught across a series of Jupyter notebooks:
These notebooks are contained within their own named directories. Within these directories you will sometimes also find datafiles sub-directories that contain all the data needed to run the notebook, in addition to copies of the notebooks with the exercise solutions (whose names end in _solutions.ipynb).
We recommend that the sections be followed in order; however, those with prior experience of Python may wish to jump to the later sections.
Please see the setup.md "Starting the tutorial" instructions.
Briefly, using a terminal (Anaconda prompt in Windows) do the following:
conda activate OxPython
jupyter notebook
Then open the .ipynb notebook you wish to work through.
The exercise solutions are in <notebook_name>_solutions.ipynb; we recommend you only open these once you have finished going through a notebook.
Jupyter notebooks are a useful way to demonstrate Python code in a clear and organised manner. Each notebook is composed of a series of cells, which can contain either text (such as this one) or code (which will usually have the words In []: next to them). You can edit a code cell by clicking on it and typing whatever changes you want to make.
Code cells can be used to run any valid Python code. To do this, simply click on the cell and press the Run button in the toolbar above, or press Shift+Enter.
Important note: All executed code is retained in the memory of the notebook. This means that if you import a module or declare a variable, it remains visible to, and can interact with, code executed in cells further down the notebook. There are many places in this tutorial where modules are imported early in a notebook and used in subsequent code cells. We therefore recommend that you do not jump to later parts of a notebook without running the earlier ones.
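As a minimal sketch of this behaviour, the two commented parts below stand in for two separate code cells run one after the other. The names radius and area are illustrative only and do not come from the tutorial notebooks.

```python
# --- First cell: import a module and declare a variable ---
import math
radius = 2.0  # illustrative value

# --- A later cell: math and radius are still held in the notebook's memory ---
area = math.pi * radius ** 2
print(area)
```

If the second part were run in a fresh notebook without the first, Python would report that math and radius are not defined.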
Notebooks can be returned to their original state by navigating to Kernel > Restart & Clear Output. A further prompt will ask you to confirm that you wish to clear all the outputs; accepting this returns the notebook to a state where no code cells have been run.
Let's start by executing our first Python command. In the cell below we will use print to write out "Hello world". Try it out by clicking on the cell and pressing Run in the toolbar above.
print('Hello world')
The print function takes an input value (some text, in quotation marks) and prints the text as output (without the quotation marks). Note that Python is case-sensitive, i.e. we cannot use Print or PRINT.
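To see case sensitivity in action, the sketch below calls print correctly and then tries Print. The try/except construct is used here only so that the cell keeps running after the error; it is not something you need to understand at this stage.

```python
print('Hello world')  # works: the built-in function is spelled in lower case

try:
    Print('Hello world')  # fails: Python knows no function called Print
except NameError as error:
    # Store the error text so that we can display it
    error_message = str(error)
    print('Python raised a NameError:', error_message)
```

Without the try/except wrapper, the same mistake would stop the cell and display the NameError directly.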
Whilst print is a very simple function, it is one of the most useful. It allows us to write out any information relevant to our Python code, from usage instructions to the output of a calculation.
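As a small illustration of both uses, the sketch below prints a usage message and then the result of a calculation. The script name analyse.py and the numbers are hypothetical, chosen only for this example.

```python
# Hypothetical values, chosen only for illustration
n_frames = 500       # number of frames in an imagined trajectory
timestep_ps = 2      # time between frames, in picoseconds

total_time_ps = n_frames * timestep_ps

# A usage message for an imagined script called analyse.py
print('Usage: python analyse.py <trajectory>')

# print accepts several inputs separated by commas and writes them on one line
print('Simulated time (ps):', total_time_ps)
```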
Note: As of Python 3, the inputs to the print function must be placed within parentheses. If you use code developed for Python 2.7 or lower, you may notice that this requirement is not followed, which will cause errors if the code is run using Python 3+. There are other key differences between Python 3 and previous versions, some of which will be covered in this tutorial. For a more detailed look at them, the following resource may be of use: Porting Python 2 Code to Python 3. As Python 2.7 will reach end of life in January 2020, we strongly recommend that no new code be written in Python 2.7.
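The difference can be checked without breaking the notebook. In the sketch below, the Python 2 form is kept inside a string and passed to the built-in compile function, which only checks whether the code is valid for the interpreter you are running. The variable names are illustrative.

```python
# Python 2 statement form, kept in a string so that this cell still runs
python2_code = 'print "Hello world"'

try:
    compile(python2_code, '<python2 example>', 'exec')
    accepted = True
except SyntaxError:
    accepted = False

print('Python 2 form accepted by this interpreter:', accepted)

# Python 3 function form, with the input inside parentheses
print("Hello world")
```

Under Python 3, the first form is rejected with a SyntaxError, so accepted ends up as False.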
In the cell below, use print to output the phrase "This is the second command".
# Exercise 0.1.1
print('This is the second command')
You will have noticed that the "Exercise 0.1.1" in the above cell does not affect the notebook output. In Python, in-line code comments are indicated using the # symbol. Any text that comes after it on the same line is considered to be a comment and is ignored by the Python interpreter. It is considered good practice to comment your code in order to make it more readable for yourself and future users. Other means of commenting Python code, such as docstrings, also exist and will be briefly covered in section 8 of this tutorial.
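The short sketch below shows the places a # can appear. The names n_residues and label are illustrative only.

```python
# A comment on its own line is ignored by the interpreter
n_residues = 129  # a comment can also follow code on the same line

# A '#' inside quotation marks is part of the text, not a comment
label = 'chain #1'

print(n_residues, label)
```

Only the text after a # that sits outside quotation marks is treated as a comment.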
Whilst Jupyter notebooks can be useful for demonstrating Python code and prototyping new ideas, the main way to run Python code is to execute a script. To do this, you write the Python instructions in a file and execute it with the Python interpreter.
For example, if you were to write the following in a file (let's assume we call it hello.py):
print("Hello world")
And then type python hello.py in a terminal, this should print out Hello world.
In the remainder of this tutorial we will, for convenience, focus on executing code within Jupyter notebooks. However, it is worth remembering that running code from scripts is often a much more reproducible and manageable way to use Python.
Python is an interpreted programming language. This means that Python code is read and executed line by line by the Python interpreter (the program that transforms Python code you wrote into something that your computer can understand and run). This is different from compiled languages (such as FORTRAN or C/C++), where all the code is first translated by the compiler to something the computer can understand and then run.
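One visible consequence of this is that statements run in order until something goes wrong. In the sketch below, the first two steps complete before Python reaches a name that was never defined. The names completed and undefined_variable are illustrative, and the try/except wrapper is there only so that the cell finishes cleanly.

```python
completed = []

try:
    completed.append('step 1')
    completed.append('step 2')
    completed.append(undefined_variable)  # NameError is raised only when this line is reached
    completed.append('step 3')            # never runs
except NameError:
    pass

print(completed)
```

In a plain script without the try/except, Python would stop at the third step and report the error, having already carried out the first two.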
As an interpreted language, Python is usually slower than compiled languages such as FORTRAN and C++. However, Python is extremely flexible, can be made reasonably fast under the hood, and is relatively easy to run on different operating systems. This is why its popularity has exploded in recent years, especially in computational and data sciences: its clean syntax allows you to write clear and concise programs while sacrificing as little speed as possible.
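As a rough illustration of "fast under the hood", the sketch below adds up the same numbers twice: once with an explicit Python loop and once with the built-in sum, which in the standard CPython interpreter is implemented in C. The function name loop_sum and the list size are arbitrary choices, and the timings will vary from machine to machine.

```python
import timeit

values = list(range(100_000))

def loop_sum(numbers):
    """Add up the numbers one at a time in an explicit Python loop."""
    total = 0
    for number in numbers:
        total += number
    return total

loop_result = loop_sum(values)
builtin_result = sum(values)

# Time 20 repetitions of each approach
loop_time = timeit.timeit(lambda: loop_sum(values), number=20)
builtin_time = timeit.timeit(lambda: sum(values), number=20)

print('Same answer:', loop_result == builtin_result)
print('Explicit loop (s):', loop_time)
print('Built-in sum (s):', builtin_time)
```

Both approaches give the same answer; the built-in is typically the quicker of the two because the looping happens in compiled code.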
In computational sciences there are other commonly used interpreted languages, such as MATLAB and R. MATLAB's interpreter is not free and open source, so you won't be able to run your code unless you keep paying for a licence. R is a fantastic language for statistical computing, but it is very domain-specific. Python, in contrast, is free and open source and has a wide domain of applicability; it is therefore a very powerful language to know, for beginners and experts alike.
In this tutorial section we have:
Used our first Python function, print.
Run a Python script using the python interpreter.