This tutorial contains all the steps you need to take to start this course. Keep in mind that starting to program is similar to learning a new language. There will be times of frustration when you don't understand what you're saying or doing wrong, but perseverance is key to success when learning how to program. Another piece of advice for your programming journey is to focus on the structure, logic and concepts of programming.
This tutorial is structured in two parts:
If you are new to programming, we suggest you keep reading from here (the simple version). If you already have the basic programming setup, jump to the second part, named the extended version. We begin by introducing the elements you need and then get everything set up, with an explanation of why you need each one. Python, Jupyter Notebook and GitHub are names that will be familiar to you after this tutorial.
The command line is an essential user interface that is navigated by text commands (prompts) rather than a mouse. It lets you carry out operations that often cannot be accessed through your normal GUI (Graphical User Interface). On Windows machines it is known as the Command Prompt, and on Mac OS X and Linux operating systems it is known as the Terminal. When beginning to code, it is essential to understand some basic commands in order to use the command line. Below is a table of essential commands with a brief explanation of how to use them.
|Find Current Location||pwd||chdir||pwd|
When the `ls` command (`dir` on Windows) is entered, the command line lists all the items in the current directory. When you first open the command line, this will be all the main components of your computer as seen in the figure above.
`/User/Jordan/Desktop/Papers/PQA`: this means you are currently in the folder labeled "PQA", located in the folder "Papers", which is on the desktop of the computer's user "Jordan."
When `cd` alone is entered, the command line returns you to the original directory. It can also be combined with a directory name to move to the specified directory. E.g. `cd Desktop` will take you to the desktop, and `cd Desktop/Papers` will take you through the desktop to the folder "Papers."
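The same three operations (finding your location, listing a directory, and changing directory) can also be tried from within Python, which may be handy once you work in notebooks. The sketch below uses the standard `os` module. The helper name `show_navigation` is made up for this example, and the `Desktop/Papers` folders are recreated inside a temporary directory purely for illustration, so nothing on your real desktop is touched.

```python
import os
import tempfile


def show_navigation(start_dir):
    """Mimic pwd, ls and cd inside `start_dir` and return what was observed."""
    original = os.getcwd()                  # like `pwd` (or `chdir` on Windows)
    try:
        os.chdir(start_dir)                 # like `cd <directory>`
        listing = sorted(os.listdir("."))   # like `ls` (or `dir` on Windows)
        os.chdir(os.path.join("Desktop", "Papers"))  # like `cd Desktop/Papers`
        location = os.getcwd()
    finally:
        os.chdir(original)                  # always return to where we started
    return listing, location


# Build a throwaway "home" folder that contains Desktop/Papers.
with tempfile.TemporaryDirectory() as home:
    os.makedirs(os.path.join(home, "Desktop", "Papers"))
    listing, location = show_navigation(home)
    print(listing)
    print(location)
```

The printed location depends on where your system keeps temporary files, but it should end in `Desktop/Papers` (with backslashes on Windows).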
Python is a programming language that was designed to be readable and easily understood. To better understand what a programming language is, imagine people as computers: in order to get people to do something, you have to communicate with them. Programming languages are generally much simpler than natural languages, but they are different, which is why many people struggle with them. Just imagine how you would feel if you had to live in a village in India where people only spoke Hindi; that would be much more difficult. Many beginners find coding with Python highly satisfying, as they are able to build prototypes and tools quickly and with ease. The benefits don't stop there: Python is also free and has a large community to help you if you get stuck. It is arguably the most beginner-friendly language, which is why we recommend it in this tutorial to get you started with coding. So let's get started!
Open your browser and follow this link to the Anaconda download page, and download the version recommended for your system (Linux, OS X, Windows). Once installed, Anaconda is ready to use. NB: Always choose the latest version, which is the one with the highest number.
After installing Python, you will need an Integrated Development Environment (IDE) to begin coding. Try to imagine an IDE as Word or Pages: these help you build beautiful documents by letting you write text in your language, insert illustrations and format everything, turning it into something you can use. If you didn't have Word or Pages, how would you make your CV, assignment, or application? An IDE lets you write in a programming language and turns what you write into beautiful applications that you can use. Similar to Word and Pages, IDEs support you in this process with helper functions, like the debugging tool, which helps you correct errors just as Word helps you with spelling and sentence structure errors. They also highlight elements of code (e.g. variables, strings, numbers) for better understanding and offer automatic formatting, which makes sure parentheses are closed and lines are indented. Now that you understand the meaning and importance behind the abstract IDE acronym, let's continue getting fully set up for your coding journey.
Once you have Python and an IDE (Jupyter Notebook) installed, you are ready to begin coding! To do so, you need to launch Python and the IDE first. Luckily, you are using Jupyter, which is capable of running the code within the IDE itself. If your IDE were not capable of this, you would need to run your code on the command line or terminal. To launch Jupyter, do the following.
Enter `jupyter notebook` in the command line, and the Jupyter Notebook UI will then open in your browser.
Once you write your first lines of code in the IDE, and feel ready to try out your program, you can run it in your terminal if you are using an IDE other than Jupyter Notebook, or run it in Jupyter Notebook itself as follows.
Press `shift + enter`. You should find that the latest version of Python you have installed has been started.
Now that you have everything set up, you are ready to start experimenting and building stuff. Python will give you the tools necessary to build various applications, while Markdown will help you edit the text that you want to show in your application.
Markdown serves as the main text formatting language for Jupyter Notebooks. Markdown is spacing and case sensitive. For instance, this means that when a user types `mycode` and `myCode`, the program recognises them as two different variables. The same goes for spacing: `my code` would be registered as two separate variables. Markdown is very similar to HTML, as it is designed to be easily converted to HTML. The following is a list of 15 useful Markdown commands and how they appear when the Markdown code is run.
Git is a distributed version control system which tracks changes made to project files. In layman's terms, it is a system which allows you to track all the changes made to a project. This is useful because it makes it easier to collaborate on projects, since the system tracks the changes made by you and others. The online platform we will use to access this system is called GitHub.
To use Git, users "clone" a copy of the online repository from GitHub onto their own hard drive and work on the files independently. After finalising the changes to the code, they upload their edited version back online. Git is primarily used for source-code management in software development, but it can be used to keep track of changes in any set of files.
|Version Control System||A system that tracks changes in files over time and maintains a library of all past versions of those files. These previous versions may be recalled at a later time. A more detailed explanation is provided in Chapter 4.2.|
|Repository||A folder containing all tracked files as well as the version control history. It can be saved in a local folder on your computer or it can be stored on an online platform (i.e. remote repository). GitHub is an example of a remote repository.|
|Snapshot||Changes made while developing a program which may later be committed.|
|Commit||A snapshot of changes made to the staged files.|
|Stage||The staging area holds the files to be included in the next commit.|
|Track||A tracked file is one that is recognized by the Git repository from previous snapshots.|
|Branching||Having multiple versions of the code simultaneously in a repository, where each branch has its own commit history and current version.|
|Local||The version of a repository that is stored on your personal computer.|
|Remote||The version of a repository that is stored on a remote (i.e. online) server.|
|Clone||Create a local copy of a remote repository on your personal computer.|
|Fork||Make a copy of another user’s repository on GitHub to your own account.|
|Merge||To update files by incorporating the changes introduced in new commits.|
|Pull||To retrieve commits from a remote repository and merge them into a local repository.|
|Push||To send commits from a local repository to a remote repository.|
|Pull request||A message sent by one GitHub user to merge the commits in their remote repository into another user’s remote repository.|
Do you have a Github account?
Now that you have downloaded the desktop version of GitHub, you have a choice between using the desktop interface to work collaboratively on projects and using the terminal directly. Below you will find an introduction to both options.
|git init||Initializes a new Git repository and begins tracking an existing directory. It adds a hidden subfolder within the existing directory that houses the internal data structure required for version control.|
|git clone||Creates a local copy of a project that already exists remotely. The clone includes all the project’s files, history, and branches.|
|git add||Stages a change. Git tracks changes to a developer’s codebase, but it’s necessary to stage and take a snapshot of the changes to include them in the project’s history. This command performs staging, the first part of that two-step process. Any changes that are staged will become a part of the next snapshot and a part of the project’s history. Staging and committing separately gives developers complete control over the history of their project without changing how they code and work.|
|git commit||Saves the snapshot to the project history and completes the change-tracking process. Anything that’s been staged with git add will become a part of the snapshot with git commit.|
|git status||Shows the status of changes as untracked, modified, or staged.|
|git branch||Shows the branches being worked on locally.|
|git checkout||Git checkout followed by the name of a branch switches you to that branch.|
|git merge||Merges lines of development together. This command is typically used to combine changes made on two distinct branches.|
|git pull||Updates the local line of development with updates from its remote counterpart. Developers use this command if a teammate has made commits to a branch on a remote, and they would like to reflect those changes in their local environment.|
|git push||Updates the remote repository with any commits made locally to a branch.|
|git log||Shows the commit history.|
|git help||Shows help for Git commands.|
And much more.
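To see how `git init`, `git add`, `git commit` and `git log` fit together, the following sketch drives them from Python in a throwaway folder. It is an illustration under stated assumptions rather than a recommended workflow: it assumes Git is installed and on your path, and the helper names `run_git` and `first_commit`, the file `notes.txt`, and the demo identity passed with `-c` are all made up for this example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_git(arguments, folder):
    """Run one git command inside `folder` and return what it printed."""
    completed = subprocess.run(
        ["git", *arguments],
        cwd=folder,
        capture_output=True,
        text=True,
        check=True,  # raise an error if git reports a failure
    )
    return completed.stdout


def first_commit(folder):
    """Create a repository, stage one file, commit it, and return the log."""
    run_git(["init"], folder)                    # start tracking this folder
    Path(folder, "notes.txt").write_text("My first tracked file\n")
    run_git(["add", "notes.txt"], folder)        # stage the new file
    # A demo identity so the commit works even on an unconfigured machine.
    identity = [
        "-c", "user.name=Demo User",
        "-c", "user.email=demo@example.com",
        "-c", "commit.gpgsign=false",
    ]
    run_git([*identity, "commit", "-m", "Add notes"], folder)  # take the snapshot
    return run_git(["log", "--oneline"], folder)               # view the history


if shutil.which("git"):  # only run the demo when git is available
    with tempfile.TemporaryDirectory() as folder:
        print(first_commit(folder))
```

The log should show a single line containing the commit message `Add notes`, preceded by a short commit ID that will differ on every run.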
When you begin coding you will inevitably run into challenges. Thankfully, there are many online communities and platforms where coders come together to help each other. Some of these platforms include Stack Overflow and subreddits like r/programming and r/python on Reddit. Additionally, there are several free online resources to help you enhance your coding skills, such as MIT OpenCourseWare, SoloLearn, and Codecademy.
Python is a general-purpose programming language, which means that it can be used for nearly everything. Unlike compiled languages such as C++, Python is an interpreted language, which means that the written code is not translated into a machine-readable format ahead of time; instead, an interpreter translates and executes it at runtime. This type of language is also referred to as a "scripting language" because it was initially meant for developing simple projects.
Python is also an object-oriented, high-level programming language with dynamic semantics, which makes it highly attractive for Rapid Application Development, as well as a tool to connect existing components together. Python can also be used to process text, display numbers or images, solve scientific equations, and save data. In essence, it is used behind the scenes to process many elements.
Python was designed for its users to learn syntax easily, hence its emphasis on readability. This reduces the cost of program maintenance as it enables teams to collaborate effectively without significant language and experience barriers. Furthermore, Python supports the use of modules and packages, encouraging program modularity, and code reuse across a diversity of projects. Once a module or package has been developed, it may be scaled for use in other projects. The Python interpreter and the extensive standard library are available in source or binary form, free of charge for all major platforms and may be distributed with ease.
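As a small illustration of code reuse through modules, the sketch below imports `statistics`, a module from Python's standard library, instead of re-implementing the calculations by hand. The function name `summarize_scores` and the sample numbers are invented for this example.

```python
import statistics  # a ready-made module from the standard library


def summarize_scores(scores):
    """Return the mean and median of a list of numbers."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }


print(summarize_scores([70, 85, 90, 95]))
```

Once written, a function like `summarize_scores` could itself be saved in a file and imported into other projects in the same way.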
Since its inception, the view of Python as a "scripting language" has changed considerably. Python is now used to write large, commercial-style applications rather than only trivial ones. This reliance on Python has grown even further as the Internet has gained popularity. Today, many web applications and platforms rely on Python, including Google's search engine, Instagram, and the web-oriented transaction system of the New York Stock Exchange (NYSE). Even NASA utilises Python to program their equipment and space machinery.
Python uses plain English keywords such as `or` to signify exactly what you would expect. Its use of indentation instead of brackets also makes it less prone to frustrating errors that simply come up because you used the wrong type of bracket in the wrong place.
One example is the `Backtesting` library. By integrating this library into your code, you will be able to use code that someone else wrote to see how your investment strategy would have performed in the past, a standard way to check the validity of investment strategies. This way, instead of painstakingly coding everything yourself, you might be making money already!
Python is dynamically typed. This means that you do not have to declare what type a variable is (whether it is text, such as the string `"hello world"`, or whether it is a number such as an integer like `42`) but that Python will figure it out by the time your code runs. While this makes your life much easier by removing another thing to think about, it can lead to errors in the code which you will have to deal with when you run it.
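Readable keywords, indentation, and dynamic typing can all be seen in a few lines. The sketch below is only an illustration: the function names `describe_visitor` and `try_to_add` and the age limit of 12 are made up for this example. It shows `not` and `or` reading like plain English, indentation marking which lines belong together, and one variable holding first a string and then an integer.

```python
def describe_visitor(age, has_ticket, with_adult=False):
    """Decide what to tell a visitor, using plain-English keywords."""
    # Indentation (not brackets) marks which lines belong to each `if`.
    if not has_ticket:
        return "needs a ticket"
    if age >= 12 or with_adult:
        return "may enter"
    return "must bring an adult"


def try_to_add(left, right):
    """Add two values and report a type problem instead of crashing."""
    try:
        return left + right
    except TypeError as error:
        # Python only notices the mismatch when this line actually runs.
        return f"TypeError: {error}"


value = "hello world"              # no type declaration needed
first_type = type(value).__name__
value = 42                         # the same name can later hold an integer
second_type = type(value).__name__

print(describe_visitor(10, has_ticket=True))
print(first_type, second_type)
print(try_to_add(40, 2))
print(try_to_add("hello world", 42))
```

The last call is the kind of error the text warns about: adding a string to an integer is only detected when the code runs, not when it is written.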
It is important to consider that Python may not be the ideal language in all situations. Although Python is one of the most versatile languages, other languages offer features that address certain types of problems better.
Python generally runs slower than Java. However, due to the language's built-in high-level data types and dynamic typing, Python programs take much less time to develop. Typically, Python programs are 3-5 times shorter than equivalent Java programs.
C++ originated from the C language and is a compiled language. It is similar to Java in terms of running speed. However, C++ code tends to be 5-10 times longer than the equivalent Python code.
When installing Python, you can download the latest version of Python independently or as part of a distribution like Anaconda. The benefits of installing a Python distribution like Anaconda include a reduced risk of messing up the required system libraries and access to a wide variety of pre-installed open-source packages. Both options are listed below.
Open your browser and follow this link to the Anaconda download page and download the version recommended for your system (Linux, OS X, Windows). Once installed, Anaconda is ready to use.
Python can be obtained from the Python Software Foundation website at python.org. You will need to download the relevant installer for your operating system and run it on your machine.
While Macs come with Python 2.7 pre-installed and you could just use that, we recommend you install the newest version of Python as follows:
Once you have Python and an IDE installed, you are ready to begin coding! To do so, you would need to launch Python and the IDE first. As mentioned before, some IDEs are capable of running the code within the IDE itself. If your IDE is not capable of this, you would need to run your code on the command line or terminal.
You should find that the latest version of Python you have installed has been started.
As you start writing your first lines of code, you will inevitably make mistakes. This is totally normal and happens to everyone all the time, even to the most experienced programmers. Even if those cryptic-looking, frightening red error messages seem annoying and discouraging, it is of the utmost importance to know how to deal with them. Therefore, try to look at your mistakes as the best way to improve your skills and your Python knowledge and, most importantly, don’t panic! That being said, we will now focus on how to interpret an error message and where to look for help efficiently.
How to read an error message
To illustrate how to read and deal with an error message, let's look at an example. Consider the following simple code:
```python
alpha = 10
beta = 20
gamma = alpha + beta

print(Gamma)
```

```
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-5f651ca246b5> in <module>
      3 gamma = alpha + beta
      4 
----> 5 print(Gamma)

NameError: name 'Gamma' is not defined
```
In this example, we assigned the integer `10` to a variable called `alpha` and the integer `20` to another variable called `beta`. Then we assigned the sum of `alpha` and `beta` to a variable called `gamma`. To show the result of the simple calculation, we used the built-in `print()` function.
As you can see, an error occurs. Let us show step by step how to deal with this specific error message.
We recommend that you start analyzing the error message from bottom to top, as the black arrow suggests. This is due to the following reason (inspired by https://realpython.com/python-traceback/):
What appears first is the name of the error, highlighted yellow in the upper left corner. This gives you a first impression of what went wrong: in this particular case we are obviously dealing with a `NameError`. However, the second yellow box at the bottom of the code contains exactly the same information, but in more detail. It further tells us that we forgot to define a variable called `Gamma`. Since the latter contains additional information, we read this one first.
Next we consider the green area, which the arrow points to. This is how the traceback locates the error in the code. In this case, the `NameError` appeared in the fifth line.
The rest of the error message gives further information on the file name, module name etc. It simply specifies where to find the code, but this part of the error message is rather negligible for our purpose.
With the given information, we can conclude that the error occurred because we tried to print out a variable called `Gamma`, which isn't recognised by Python (because Python is a case-sensitive programming language, i.e., it clearly differentiates between lowercase and uppercase letters, as you will learn later on in another tutorial).
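The three pieces of information read off the traceback (the error name, its message, and the line number) can also be pulled out programmatically, which may help make the structure of an error message less mysterious. The sketch below is one possible way to do it with the standard `traceback` module. The helper name `summarize_error` and the label `<example>` are invented for this illustration, and `exec` is used only to rerun the faulty snippet in a controlled way.

```python
import traceback

# The faulty example from above, kept as text so it can be rerun safely.
source = "alpha = 10\nbeta = 20\ngamma = alpha + beta\n\nprint(Gamma)\n"


def summarize_error(code):
    """Run `code` and return (error name, message, line number), or None."""
    try:
        exec(compile(code, "<example>", "exec"), {})
    except Exception as error:
        # The last traceback entry is the line where the error was raised.
        last_frame = traceback.extract_tb(error.__traceback__)[-1]
        return type(error).__name__, str(error), last_frame.lineno
    return None  # the code ran without raising an error


print(summarize_error(source))
```

Reading the result from left to right mirrors the bottom-to-top reading order recommended above: what kind of error it is, what exactly went wrong, and where.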
A more general way to analyze an error, detached from a specific example
Since the above example was rather easy, you might ask yourself how to interpret and use such a traceback in another setting, i.e., for another, more complex code. This clearly motivates the following outcome-oriented approach:
The first four steps became clear with the above example. However, the fifth and maybe most important step, fixing your code to make it work eventually, may cause various difficulties. That's why knowing where to look for help is an absolutely crucial part of programming. In this respect, we may consider ourselves lucky to be programming in Python, since it has a huge community willing to help you. Most problems you will be confronted with have very likely already been asked and answered by another programmer.
Arguably the most popular forum is "Stack Overflow", which is a great place to look for answers. Just try googling "How to do XYZ in Python" and you will most likely be directed to a Stack Overflow post. One of the many useful features of Google Colab is that you can search for an answer to your own error message in the blink of an eye: just click on the button "Search Stack Overflow" at the end of a traceback (check the screenshot above).
It is essential that you try not to ignore error messages, since reading them is the only way to really improve in coding and get a broader understanding of how the computer reacts to certain inputs. Additionally, even though it may seem to slow you down in the beginning, it is definitely the most time-efficient way to deal with a problem while coding: as a beginner, you probably won't find your mistake without some external help.
The next important step in preparing your machine to begin coding is to download an integrated development environment (IDE). IDEs serve as text editors, dedicated to creating an easier environment to design, write, organize and share your code. When choosing which IDE is right for you it is important to consider the following features:
Spyder is an open-source IDE included in the Anaconda Python Distribution. Spyder primarily targets data scientists and is specifically designed for Python use.
Visual Studio Code is a text editor built on Electron. It is a lightweight IDE which can be configured to work on almost any task and is compatible with almost every language. It is also highly integrated with Git and GitHub.
PyCharm is one of the most popular and powerful IDEs out there. It includes rich features such as code analysis, machine learning enhanced code completion, Git integration, web development tools (not included in the free version), and more. If you are planning on developing large scale projects with Python, PyCharm might be the IDE for you.
Jupyter is also included in the Anaconda Distribution and is a versatile tool for programming. Unlike the others, Jupyter is not a conventional IDE, as users are able to document their work in it. However, Jupyter is not recommended as the sole tool for writing complicated or extensive code. More information about Jupyter Notebook is given in the subsequent chapters.
Jupyter is a platform which enables its users to present their work in plain text while simultaneously sharing the original code. In recent years, Jupyter Notebook has become increasingly popular in the scientific community due to its efficacy in combining scientific results with interactive code in a plethora of programming languages. Jupyter can also be used independently as an IDE to develop, create, and run your code, making it an incredibly useful and versatile tool.
Jupyter comes preinstalled with the Anaconda Distribution used for Python. However, if you have already downloaded Python independently of Anaconda, there are alternative ways to install Jupyter. You first have to make sure that you have the latest version of pip installed. You may do so by executing the command `pip install --upgrade pip`. If you find that the latest version of pip is not installed on your device, please follow the instructions on this link. Next, use the command `pip install jupyter` to install Jupyter Notebook onto your device.
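If you are unsure whether Jupyter is already present, a quick check from Python itself can save a reinstall. The sketch below is a hedged illustration: `is_installed` and `pip_install_command` are hypothetical helper names, and the command is built in the `python -m pip ...` form, an alternative spelling of the `pip ...` commands above that targets the Python you are currently running. Nothing is installed by this code; it only reports and prints the command.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if `package_name` can be imported by this Python."""
    return importlib.util.find_spec(package_name) is not None


def pip_install_command(package_name, upgrade=False):
    """Build (but do not run) the pip command for this Python interpreter."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    command.append(package_name)
    return command


print("jupyter installed:", is_installed("jupyter"))
print(" ".join(pip_install_command("pip", upgrade=True)))
print(" ".join(pip_install_command("jupyter")))
```

The printed commands can be copied into the command line if the first line reports that Jupyter is missing.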
For Windows users, launching the Jupyter Notebook can be done easily through the Anaconda application, which may be accessed via the start menu. OS X and Linux users first have to open the terminal window or command line. Navigate to the files you want to open in the Jupyter Notebook (using the commands covered in the introduction), then enter the command `jupyter notebook`. This will open the selected folder in Jupyter Notebook's web application in your browser. From this point you can either open your existing `.ipynb` files or create a new notebook, which will look similar to the figure below.
Once you have created a new notebook (or opened a previously existing one), you can begin by finding the `Help` tab in the menu bar and selecting the `user interface tour`, which takes you through an overview of the features of Jupyter's user interface (UI). An important feature is the cell. Cells are containers for text or code to be displayed or executed by the notebook's kernel.
Markdown cells are used for writing text, creating tables and inserting images. These cells are written in Markdown. A brief introduction to Markdown is given in the subsequent section.
Code cells are similar to an IDE, in that you can use them to create and run your code in the notebook. To do so, write the code in a cell like the example below. To run it, select the desired cell and either use the shortcut `shift + enter` or hit the `run` button in the `cell` tab at the top of the page. Try running the Python code in the cell below!
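A minimal cell to try could look like the following. The variable name `greeting` and its text are only an illustration; any short piece of Python will do.

```python
# Store some text in a variable, then display it below the cell.
greeting = "Hello, world!"
print(greeting)
```

After pressing `shift + enter`, the text should appear directly underneath the cell and the cursor moves on to the next cell.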
GitHub is a collaborative code hosting site built on top of the Git distributed version control system (DVCS) (refer to Chapter 4.2 for a more detailed explanation of DVCS). GitHub relies on a “fork & pull” model in which developers create their own copy of a repository and then submit their changes via a pull request. With the pull request, developers ask the project maintainer to pull their changes into the main branch.
In addition to code hosting, collaborative code reviewing, and integrated issue tracking, GitHub has integrated social features as well. Users are able to subscribe to information by “watching” projects and “following” users. Users can award stars to repositories belonging to other users, which essentially has the same effect as "liking" a post on Facebook. Users also have profiles that can be populated with their personal information and that show their recent activity on the site. With over 57 million repositories hosted, GitHub is currently the largest code hosting site in the world.
People might need to collaborate with developers on other systems. Version control systems are one way to do it. Version control systems record changes to a file over time so that it is possible to recall specific versions later. In other words, version control systems allow one to revert selected files or even the entire project back to a previous state. They also allow one to compare changes over time and to see who last modified something, who introduced an issue and when. We typically make a distinction between centralized and distributed version control systems.
In centralized version control systems, each user typically gets his or her own working copy, but there is only one central repository, often located on a remote server. As soon as one user commits, the other developers can update and see the changes. To check who made the changes and what the changes were, users need to update from the central server after a commit has been made. The central server contains all the versioned files, and a number of developers check out files from that central place.
However, there are downsides to this. Firstly, the most obvious problem is the single point of failure that the centralized server represents. If the server goes down, nobody can collaborate or save changes for the duration of the outage. Secondly, another major issue is the hard disk. If the entire history is saved in one place, one risks losing everything if the system fails or crashes.
In distributed version control systems, users get their own repository and working copy. Unlike a centralized version control system, where relying on a single server presents a major risk for the project's development, a distributed version control system stores the full history of the files in each user's local repository. Thus, if any server fails, any of the local repositories may be copied back onto the server to restore it.
To make modifications to a file visible to others, there are 4 steps one needs to execute. First, you will need to make a commit. At this stage, others still have no access to the changes made until you push your changes to the central repository. Likewise, when you update, you do not get others' changes unless you have first pulled those changes into your repository. Since the system is distributed, i.e. each developer gets their own local repository, nearly every operation can be done offline and at high speed. This means that you can do commits, branches, merges, file annotation, etc. entirely offline and generally instantly.
|Users can work productively when not connected to a network||Yes||No|
|Common operations such as commits, reverting changes, etc. are faster||Yes||No|
|Users can use the changes they do not want to publish||Yes||No|
|Initial checkout of a repository is slower (since all branches have to be copied in each local repository)||No||Yes|
|Additional storage required for every user to have a complete copy of the codebase history||Yes||No|
|Working copies are effectively remote backups||Yes||No|
|Various development models can be used||Yes||No|
Installing Git is not complicated. Go to the Git homepage and look for the "Download" section. You then have to select the operating system you are working on.
We previously introduced Git as a distributed version control system. This means that it allows users to efficiently collaborate on a certain project. It is also able to perform actions extremely fast as Git only needs to access the hard drive. With all this information, you may still wonder what the expectations of Git developers were when developing this platform. We list 5 important expectations:
Having a tool that can rapidly take account of modifications to files makes collaboration easier. Compared to other systems, Git is often praised for its speed. The major difference between Git and other VCSs is the way Git thinks about data. Most systems view data as a set of files and the changes made to each file over time. Git, however, thinks about its data as a series of snapshots. Every time you commit, Git records what all your files look like at that moment; if a file has not changed, Git does not store the file again but simply links to the previous identical version. Because the comparison between old and new versions happens on your own computer, Git does not have to ask a remote server to do it, which drastically increases its speed.
For many beginners, Git is difficult to grasp. And in fact it can be, especially if you are a Windows user, since Git provides its best support for Linux, then Mac. You will have to learn and understand a lot of new notions and definitions, and a basic knowledge of Git functions is required. The command syntax is also complex, and some commands have unusual names. However, once you have mastered it, you should find that Git is a quite user-friendly program that allows you to structure your project efficiently.
A central feature of Git is branching. In Git, you can create a new local branch for everything you work on. The new local branch is a side branch that is connected to the mainline, also known as the master branch. For each feature, idea or bugfix, you can easily create a new branch, do a few commits on that branch and then merge it into your master branch or throw it away. You don’t have to mess up the master branch just to save or test your experimental ideas.
In this context, fully distributed means that every developer has their own repository that contains the entire commit history of the project. This is a central property of distributed version control systems.
Git has some extensive functions to deal with large repositories with a very long history. Two solutions to deal with large repositories are presented by the Atlassian blog:
It’s impossible to change the contents of any file or directory without Git detecting it. If a file is lost, if information in a file is lost, if a file gets corrupted or if any other change has happened, Git is able to detect it. This is because every piece of information in Git has a corresponding hash value.
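To make the idea of a hash value concrete, the sketch below recomputes the ID Git assigns to a file's contents. It assumes Git's classic SHA-1 object format, in which the ID is the SHA-1 hash of a short header (`blob`, the size in bytes, and a zero byte) followed by the contents. The function name `git_blob_id` and the sample contents are invented for this illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git would give to a file with these contents."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"print('hello')\n")
modified = git_blob_id(b"print('hello!')\n")  # one extra character

print(original)
print(modified)
print("contents changed:", original != modified)
```

Even a one-character edit produces a completely different ID, which is why a corrupted or altered file cannot go unnoticed.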
When you do actions in Git, nearly all of them only add data to the Git database. After you commit a snapshot into Git, it is very difficult to lose the information.
You will occasionally hear about "the three states" in Git. This simply refers to the possible states of a file, i.e. committed, modified or staged.
A repository is usually used to organize a single project. Repositories can contain folders and files, images, videos, spreadsheets, and data sets – anything your project needs. We recommend including a README, or a file with information about your project. GitHub makes it easy to add one at the same time you create your new repository. It also offers other common options such as a license file.
Your hello world repository can be a place where you store ideas, resources, or even share and discuss things with others.
To create a new repository:
Branching is the way to work on different versions of a repository at one time.
By default your repository has one branch named master which is considered to be the definitive branch. We use branches to experiment and make edits before committing them to master.
When you create a branch off the master branch, you’re making a copy, or snapshot, of master as it was at that point in time. If someone else made changes to the master branch while you were working on your branch, you could pull in those updates.
This diagram shows:
Have you ever saved different versions of a file? For instance:
Branches accomplish similar goals in GitHub repositories.
During a programming course like this one, you can use branches to keep bug fixes (improving/repairing code) and feature work (adding new functions to an application) separate from the master branch (production, which contains all accepted side branches/forks). When a change is ready, you merge your branch into master.
“Enter” on your keyboard.
Now you have two branches, master and readme-edits. They look exactly the same, but not for long! Next we’ll add our changes to the new branch.
Bravo! Now, you’re on the code view for your readme-edits branch, which is a copy of master. Let’s make some edits.
On GitHub, saved changes are called commits. Each commit has an associated commit message, which is a description explaining why a particular change was made. Commit messages capture the history of your changes, so other contributors can understand what you’ve done and why.
Make and commit changes
These changes will be made to just the README file on your readme-edits branch, so now this branch contains content that’s different from master.
Nice edits! Now that you have changes in a branch off of master, you can open a pull request.
Pull Requests are the heart of collaboration on GitHub. When you open a pull request, you are requesting that the original author review your proposed changes and pull in your contribution and merge them into their branch. Pull requests show the differences between the content from both branches. The changes, additions, and subtractions are shown in green and red. As soon as you make a commit, you can open a pull request and start a discussion, even before the code is finished.
By using GitHub’s @mention system in your pull request message, you can ask for feedback from specific people or teams, whether they’re down the hall or 10 time zones away.
You can even open pull requests in your own repository and merge them yourself. It’s a great way to learn the GitHub flow before working on larger projects.
Open a Pull Request for changes to the README