#!/usr/bin/env python
# coding: utf-8

# Introduction to Python for Data Sciences
#
# Franck Iutzeler
#
# Chap. 1 - The Basics

# # 0 - Installation and Quick Start
#
# [Python](https://fr.wikipedia.org/wiki/Python_(langage)) is a widely used programming language. Thanks to its versatility, it is used in many different domains.
#
# It is an interpreted language, meaning that the code is not compiled ahead of time but *translated* and executed by a running Python interpreter.

# ## Installation
#
# See https://www.python.org/about/gettingstarted/ for how to install Python (but it is probably already installed).
#
# In Data Science, it is common to use [Anaconda](https://www.anaconda.com/products/individual-d) to download and install Python and its environment (see also the [quickstart](https://docs.anaconda.com/anaconda/user-guide/getting-started/)).
#
# ## Writing Code
#
# Several options exist, more or less user-friendly.

# ### In the python shell
#
# The Python shell can be launched by typing the command `python` in a terminal (this works on Linux, Mac, and Windows with PowerShell). To exit it, type `exit()`.
#
# *Warning:* Python (version 2.x) and Python3 (version 3.x) coexist on some systems as two different programs. The differences appear small but are real, and Python 2 is no longer supported. To be sure to use Python 3, you can type `python3`.
#
# From the shell, you can enter Python code that is executed as soon as you press Enter. As long as you stay in the same shell, you keep your variables, but as soon as you exit it, everything is lost. It might not be the best option...
#
# ### From a file
#
# You can write your code in a file and then execute it with Python. The extension of Python files is typically `.py`.
#
# If you create a file `test.py` (using any text editor) containing the following code:
#
# ---
# ~~~
# a = 10
# a = a + 7
#
# print(a)
# ~~~
# ---
#
# Then, you can run it using the command `python test.py` in a terminal from the *same folder* as the file.
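#
# Since Python 2 and Python 3 may coexist, it can help to check which interpreter actually runs a script such as `test.py`. The lines below are a minimal sketch using the standard `sys` module; the message texts are only illustrative and are not part of the original course.

```python
import sys

# sys.version_info behaves like a tuple (major, minor, micro, ...).
major, minor = sys.version_info[:2]
print("Running Python {}.{}".format(major, minor))

# This course assumes Python 3; warn if the script is run by Python 2.
if major < 3:
    print("This is Python 2, try the `python3` command instead.")
```

# Putting these lines at the top of a file and running it with both `python` and `python3` shows whether the two commands point to different versions.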
#
# This is a convenient solution to run some code but it is probably not the best way to code.

# ### Using an integrated development environment (IDE)
#
# You can edit your Python code files with IDEs that offer debuggers, syntax checking, etc. Two popular examples are:
# * [Spyder](https://www.spyder-ide.org/) which is quite similar to MATLAB or RStudio
# * [VS Code](https://code.visualstudio.com/) which has a very good Python integration while not being restricted to it.
#
# ### Jupyter notebooks
#
# [Jupyter notebooks](https://jupyter.org/) are browser-based notebooks for Julia, Python, and R; they correspond to `.ipynb` files. The main features of Jupyter notebooks are:
# * In-browser editing for code, with automatic syntax highlighting, indentation, and tab completion/introspection.
# * The ability to execute code from the browser and plot inline.
# * In-browser editing for rich text using the Markdown markup language.
# * The ability to include mathematical notation within markdown cells using LaTeX, rendered natively by MathJax.
#
# #### Installation
#
# In a terminal, enter `python -m pip install notebook` or simply `pip install notebook`.
#
# *Note:* Anaconda comes with notebooks; they can be launched directly from the Navigator.
#
# #### Use
#
# To launch Jupyter, enter `jupyter notebook`.
#
# This starts a *kernel* (a process that runs and interfaces the notebook content with an (i)Python shell) and opens a tab in the *browser*. The whole interface of Jupyter notebook is *web-based* and can be accessed at the address http://localhost:8888 .
#
# Then, you can either create a new notebook or open a notebook (`.ipynb` file) of the current folder.
#
# *Note:* Closing the tab *does not terminate* the notebook; it can still be accessed at the above address. To terminate it, use the interface (File -> Close and Halt) or type `Ctrl+C` in the kernel terminal.
#
# #### Remote notebook execution
#
# Without any installation, you can:
# * *view* notebooks using [NBViewer](https://nbviewer.jupyter.org/)
# * *fully interact* with notebooks (create/modify/run) using [UGA's Jupyter hub](https://jupyterhub.u-ga.fr/), [Binder](https://mybinder.org/) or [Google Colab](https://colab.research.google.com/)
#
# #### Interface
#
# Notebook documents contain the inputs and outputs of an interactive Python shell as well as additional text that accompanies the code but is not meant for execution. In this way, notebook files can serve as a complete computational record of a session, interleaving executable code with explanatory text, mathematics, and representations of resulting objects. These documents are saved with the `.ipynb` extension.
#
# Notebooks may be exported to a range of static formats, including HTML (for example, for blog posts), LaTeX, PDF, etc. using `File->Download as`.
#
# ##### Accessing notebooks
# You can open a notebook from the file explorer of the *Home* (welcome) tab or using `File->Open` from an opened notebook. To create a new notebook, use the `New` button at the top right of the *Home* (welcome) tab or `File->New Notebook` from an opened notebook; you will be asked for the programming language.
#
# ##### Editing notebooks
# You can modify the title (that is the file name) by clicking on it next to the jupyter logo.
# The notebooks are a succession of *cells*, that can be of four types:
# * `code` for python code (as in ipython)
# * `markdown` for text in Markdown formatting (see this [Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)). You may additionally use HTML and LaTeX math formulas.
# * `raw` and `heading` are less used, for raw text and titles
#
# ##### Cells
# You can *edit* a cell by double-clicking on it.
# You can *run* a cell by using the menu or typing `Ctrl+Enter` (you can also run all cells, or all cells above a certain point). If it is a text cell, it will be formatted.
# If it is a code cell, it will run as if it were entered in an IPython shell, which means that all previous actions, functions, and defined variables are persistent. To get a clean slate, you have to *restart the kernel* by using `Kernel->Restart`.
#
# ##### Useful commands
#
# * `Tab` autocompletes
# * `Shift+Tab` gives the docstring of the input function
# * `?` returns the help
#

# # 1- Numbers and Variables
#
# ## Variables

# In[1]:

2 + 2 + 1 # comment

# In[2]:

a = 4
print(a)
print(type(a))

# In[3]:

a,x = 4, 9000
print(a)
print(x)

# Variable names can contain `a-z`, `A-Z`, `0-9` and some special characters such as `_`, but must always begin with a letter or `_`. By convention, variable names are lowercase.
#
# ## Types
#
# Variables are *dynamically typed* in Python, which means that their type is deduced from the context: the initialization or the types of the variables used for its computation. Observe the following example.

# In[4]:

print("Integer")
a = 3
print(a,type(a))

print("\nFloat")
b = 3.14
print(b,type(b))

print("\nComplex")
c = 3.14 + 2j
print(c,type(c))
print(c.real,type(c.real))
print(c.imag,type(c.imag))

# This typing can lead to some variables having unwanted types, which can be resolved by *casting*.

# In[5]:

d = 1j*1j
print(d,type(d))
d = d.real
print(d,type(d))
d = int(d)
print(d,type(d))

# In[6]:

e = 10/3
print(e,type(e))
f = (10/3)/(10/3)
print(f,type(f))
f = int((10/3)/(10/3))
print(f,type(f))

# ## Operations on numbers
#
# The usual operations are
# * Multiplication and Division with respectively `*` and `/`
# * Exponent with `**`
# * Modulo with `%`

# In[7]:

print(7 * 3., type(7 * 3.)) # int x float -> float

# In[8]:

print(3/2, type(3/2)) # Warning: int in Python 2, float in Python 3
print(3/2., type(3/2.)) # To be sure

# In[9]:

print(2**10, type(2**10))

# In[10]:

print(8%2, type(8%2))

# ## Booleans
#
# Boolean is the type of the values `True` and `False`; Booleans are extremely useful when coding.
# * They can be obtained by comparisons `>`, `>=` (greater, greater or equal), `<`, `<=` (smaller, smaller or equal) or equality tests `==`, `!=` (equal, different).
# * They can be manipulated by the logical operations `and`, `not`, `or`.

# In[11]:

print('2 > 1\t', 2 > 1)
print('2 > 2\t', 2 > 2)
print('2 >= 2\t',2 >= 2)
print('2 == 2\t',2 == 2)
print('2 == 2.0',2 == 2.0)
print('2 != 1.9',2 != 1.9)

# In[12]:

print(True and False)
print(True or True)
print(not False)

# ## Lists
#
# Lists are the base element for sequences of variables in Python; they are themselves a variable type.
# * The syntax to write them is `[ ... , ... ]`
# * The elements do not need to be all of the same type
# * The indices begin at $0$ (`l[0]` is the first element of `l`)
# * Lists can be nested (lists of lists of ...)
#
# *Warning:* Another type called *tuple* with the syntax `( ... , ... )` exists in Python. It has almost the same structure as a list, with the notable exception that one cannot add or remove elements from a tuple.
# We will see them briefly later.

# In[13]:

l = [1, 2, 3, [4,8] , True , 2.3]
print(l, type(l))

# In[14]:

print(l[0],type(l[0]))
print(l[3],type(l[3]))
print(l[3][1],type(l[3][1]))

# In[15]:

print(l)
print(l[4:]) # l[4:] is l from position 4 (included)
print(l[:5]) # l[:5] is l up to position 5 (excluded)
print(l[4:5]) # l[4:5] is l between 4 (included) and 5 (excluded), so just position 4
print(l[1:6:2]) # l[1:6:2] is l between 1 (included) and 6 (excluded) by steps of 2, thus positions 1, 3, 5
print(l[::-1]) # reversed order
print(l[-1]) # last element

# ### Operations on lists
#
# One can easily add, insert, remove, count, or test if an element is in a list.

# In[16]:

l.append(10) # Add an element to l (the list is not copied, it is actually l that is modified)
print(l)

# In[17]:

l.insert(1,'u') # Insert an element at position 1 in l (the list is not copied, it is actually l that is modified)
print(l)

# In[18]:

l.remove(10) # Remove the first element equal to 10 from l
print(l)

# In[19]:

print(len(l)) # length of a list
print(2 in l) # test if 2 is in l

# ### Handling lists
#
# Lists are *pointer*-like types, meaning that if you write `l2=l`, you *do not copy* `l` to `l2` but rather copy the pointer, so modifying one will modify the other.
#
# The proper way to copy a list is to use the dedicated `copy` method of lists (or `list(l)`).

# In[20]:

l2 = l
l.append('Something')
print(l,l2)

# In[21]:

l3 = list(l) # l.copy() works in Python 3
l.remove('Something')
print(l,l3)

# You can have empty lists and concatenate lists by simply using the `+` operator, or even repeat them with `*`.

# In[22]:

l4 = []
l5 =[4,8,10.9865]
print(l+l4+l5)
print(l5*3)

# ## Tuples, Dictionaries [*]
#
# * Tuples are similar to lists but are created with `(...,...)` or simply commas. They cannot be changed once created.

# In[23]:

t = (1,'b',876876.908)
print(t,type(t))
print(t[0])

# In[24]:

a,b = 12,[987,98987]
u = a,b
print(a,b,u)

# In[25]:

try:
    u[1] = 2
except Exception as error:
    print(error)

# * Dictionaries are aimed at storing values of the form *key-value* with the syntax `{key1 : value1, ...}`
#
# This type is often used as a return type in libraries.

# In[26]:

d = {"param1" : 1.0, "param2" : True, "param3" : "red"}
print(d,type(d))

# In[27]:

print(d["param1"])
d["param1"] = 2.4
print(d)

# ## Strings and text formatting
#
# * Strings are delimited with (single or double) quotes. They can be handled globally the same way as lists (see above).
# * `print` displays (tuples of) variables (not necessarily strings).
# * To include variables into a string, it is preferable to use the `format` method.
#
# *Warning:* text formatting, and notably the `print` function, is one of the major differences between Python 2 and Python 3. The method presented here is clean and works in both versions.

# In[28]:

s = "test"
print(s,type(s))

# In[29]:

print(s[0])
print(s + "42")

# In[30]:

print(s,42)
print(s+"42")

# In[31]:

try:
    print(s+42)
except Exception as error:
    print(error)

# The `format` method

# In[32]:

print( "test {}".format(42) )

# In[33]:

print( "test with an int {:d}, a float {} (or {:e} which is roughly {:.1f})".format(4 , 3.141 , 3.141 , 3.141 ))

# # 2- Branching and Loops
#
# ## If, Elif, Else
#
# In Python, branching is written `if condition:` (mind the `:`) followed by a block indented by *one tab* (or, more conventionally, four spaces, used consistently) that contains what is executed if the condition is true. **Indentation is essential and at the core of Python.**

# In[34]:

statement1 = False
statement2 = False

if statement1:
    print("statement1 is True")
elif statement2:
    print("statement2 is True")
else:
    print("statement1 and statement2 are False")

# In[35]:

statement1 = statement2 = True

if statement1:
    if statement2:
        print("both statement1 and statement2 are True")

# In[36]:

if statement1:
    if statement2: # Bad indentation!
    #print("both statement1 and statement2 are True") # Uncommenting would cause an error
        print("here it is ok")
        print("after the previous line, here also")

# In[37]:

statement1 = True

if statement1:
    print("printed if statement1 is True")

    print("still inside the if block")

# In[38]:

statement1 = False

if statement1:
    print("printed if statement1 is True")

print("outside the if block")

# ## For loop
#
# The syntax of `for` loops is `for x in something:` followed by an indented block which represents what will be executed.
#
# The `something` above can be of different natures: list, dictionary, etc.

# In[39]:

for x in [1, 2, 3]:
    print(x)

# In[40]:

sentence = ""
for word in ["Python", "for", "data", "Science"]:
    sentence = sentence + word + " "
print(sentence)

# A useful function is `range`, which generates sequences of numbers that can be used in loops.

# In[41]:

print("Range (from 0) to 4 (excluded) ")
for x in range(4):
    print(x)

print("Range from 2 (included) to 6 (excluded) ")
for x in range(2,6):
    print(x)

print("Range from 1 (included) to 12 (excluded) by steps of 3 ")
for x in range(1,12,3):
    print(x)

# If the index is needed along with the value, the function `enumerate` is useful.

# In[42]:

for idx, x in enumerate(range(-3,3)):
    print(idx, x)

# ## While loop
#
# Similarly to `for` loops, the syntax is `while condition:` followed by an indented block which represents what will be executed.

# In[43]:

i = 0
while i<5:
    print(i)
    i+=1

# ## Try [*]
#
# When a command may fail, you can `try` to execute it and optionally catch the `Exception` (i.e. the error).
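#
# As a complement, here is a minimal sketch of the same mechanism that catches one specific error class (`ValueError`) rather than the generic `Exception`. The helper name `to_int` is hypothetical, and the `else` and `finally` clauses are standard Python features that are not covered in this chapter.

```python
def to_int(text):
    """Hypothetical helper: convert text to an int, or return None on failure."""
    try:
        value = int(text)
    except ValueError as error:  # raised by int() when text is not a valid integer
        print("conversion failed:", error)
        return None
    else:                        # executed only if no exception was raised
        return value
    finally:                     # executed in every case, even when returning
        print("tried to convert", repr(text))

print(to_int("42"))
print(to_int("4.2"))
```

# Catching a specific class lets unexpected errors surface instead of being silently absorbed by a broad `except Exception`.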

# In[44]:

a = [1,2,3]
print(a)

try:
    a[1] = 3
    print("command ok")
except Exception as error:
    print(error)

print(a) # The command went through

try:
    a[6] = 3
    print("command ok")
except Exception as error:
    print(error)

print(a) # The command failed

# # 3- Functions
#
# In Python, a function is defined as `def function_name(function_arguments):` followed by an indented block representing the body of the function. (The returned values are not declared in the definition.)

# In[45]:

def fun0():
    print("\"fun0\" just prints")

fun0()

# A docstring can be added to document the function; it will appear when calling `help`.

# In[46]:

def fun1(l):
    """
    Prints a list and its length
    """
    print(l, " is of length ", len(l))

fun1([1,'iuoiu',True])

# In[47]:

help(fun1)

# ## Outputs
#
# `return` outputs a variable, tuple, dictionary, ...

# In[48]:

def square(x):
    """
    Return x squared.
    """
    return x ** 2

help(square)
res = square(12)
print(res)

# In[49]:

def powers(x):
    """
    Return the first powers of x.
    """
    return x ** 2, x ** 3, x ** 4

help(powers)

# In[50]:

res = powers(12)
print(res, type(res))

# In[51]:

two,three,four = powers(3)
print(three,type(three))

# In[52]:

def powers_dict(x):
    """
    Return the first powers of x as a dictionary.
    """
    return {"two": x ** 2, "three": x ** 3, "four": x ** 4}

res = powers_dict(12)
print(res, type(res))
print(res["two"],type(res["two"]))

# ## Arguments
#
# It is possible to
# * Give the arguments in any order provided that you write the corresponding argument name
# * Set default values for arguments so that they become optional

# In[53]:

def fancy_power(x, p=2, debug=False):
    """
    Here is a fancy version of power that computes the square of the argument or other powers if p is set
    """
    if debug:
        print( "\"fancy_power\" is called with x =", x, " and p =", p)
    return x**p

# In[54]:

print(fancy_power(5))
print(fancy_power(5,p=3))

# In[55]:

res = fancy_power(p=8,x=2,debug=True)
print(res)

# # 4- Classes [*]
#
# Classes are at the core of *object-oriented* programming; they are used to represent an object with related *attributes* (variables) and *methods* (functions).
#
# They are defined like functions but with the keyword `class`, as in `class my_class(object):`, followed by an indented block. The definition of a class usually contains some methods:
# * The first argument of a method must be `self`, a reference to the object itself.
# * Some method names have a specific meaning:
#     * `__init__`: method executed at the creation of the object
#     * `__str__`: method executed to represent the object as a string, for instance when the object is passed to the function `print`

# In[56]:

class Point(object):
    """
    Class of a point in the 2D plane.
    """
    def __init__(self, x=0.0, y=0.0):
        """
        Creation of a new point at position (x, y).
        """
        self.x = x
        self.y = y

    def translate(self, dx, dy):
        """
        Translate the point by (dx , dy).
        """
        self.x += dx
        self.y += dy

    def __str__(self):
        return "Point: ({:.2f}, {:.2f})".format(self.x, self.y)

# In[57]:

p1 = Point()
print(p1)

p1.translate(3,2)
print(p1)

p2 = Point(1.2,3)
print(p2)

# # 5- Reading and writing files
#
# `open` returns a file object, and is most commonly used with two arguments: `open(filename, mode)`.
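#
# As a minimal sketch of this two-argument call, the example below writes a small file and reads it back. It uses the `with` statement, which is not covered in this chapter and closes the file automatically at the end of the block. The file name `example.txt` and the temporary folder are assumptions made so that the example does not touch existing files; the mode strings `'w'` and `'r'` are detailed right after.

```python
import os
import tempfile

# Work in a temporary folder (removed automatically at the end of the block).
with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "example.txt")  # hypothetical file name

    # 'w' opens the file for writing; it is closed when the inner block ends.
    with open(filename, "w") as f:
        f.write("first line\n")
        f.write("second line\n")

    # 'r' opens the file for reading; readlines() returns one string per line.
    with open(filename, "r") as f:
        lines = f.readlines()

print(lines)
```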
#
# The first argument is a string containing the filename. The second argument is another string containing a few characters describing the way in which the file will be used (optional, 'r' will be assumed if it is omitted):
# * 'r' when the file will only be read
# * 'w' for only writing (an existing file with the same name will be erased)
# * 'a' opens the file for appending; any data written to the file is automatically added to the end

# In[58]:

f = open('./data/test.txt', 'w')
print(f)

# `f.write(string)` writes the contents of string to the file.

# In[59]:

f.write("This is a test\n")

# In[60]:

f.close()

# *Warning:* For the file to be actually written, and to be able to open and modify it again without mistakes, it is essential to close the file handle with `f.close()`.
#
# `f.read()` will read an entire file and put the pointer at the end.

# In[61]:

f = open('./data/text.txt', 'r')
f.read()

# In[62]:

f.read()

# The end of the file has been reached so the command returns ''.
#
# To get to the top, use `f.seek(offset, from_what)`. The position is computed by adding `offset` to a reference point; the reference point is selected by the `from_what` argument. A `from_what` value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. `from_what` can be omitted and defaults to 0, using the beginning of the file as the reference point. Thus `f.seek(0)` goes to the top.

# In[63]:

f.seek(0)

# `f.readline()` reads a single line from the file; a newline character (\n) is left at the end of the string.

# In[64]:

f.readline()

# In[65]:

f.readline()

# For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:

# In[66]:

f.seek(0)
for line in f:
    print(line)

f.close()

# # 6- Exercises
#
# > **Exercise 1:** Odd or Even
# >
# > The code snippet below enables the user to enter a number. Check if this number is odd or even.
# > Optionally, handle bad inputs (characters, floats, signs, etc.)

# In[67]:

num = input("Enter a number: ")
print(num)

# In[ ]:


# ---
# > **Exercise 2:** Fibonacci
# >
# > The Fibonacci sequence is a sequence of numbers where the next number in the sequence is the sum of the previous two numbers in the sequence. The sequence looks like this: 1, 1, 2, 3, 5, 8, 13. Write a function that generates a given number of elements of the Fibonacci sequence.

# In[ ]:


# ---
#
# > **Exercise 3:** Implement *quicksort*
# >
# > The [wikipedia page](http://en.wikipedia.org/wiki/Quicksort) describing this sorting algorithm gives the following pseudocode:
#
#     function quicksort('array')
#         if length('array') <= 1
#             return 'array'
#         select and remove a pivot value 'pivot' from 'array'
#         create empty lists 'less' and 'greater'
#         for each 'x' in 'array'
#             if 'x' <= 'pivot' then append 'x' to 'less'
#             else append 'x' to 'greater'
#         return concatenate(quicksort('less'), 'pivot', quicksort('greater'))
#
# > Create a function that sorts a list using quicksort

# In[68]:

def quicksort(l):
    # ...
    return None

res = quicksort([-2, 3, 5, 1, 3])
print(res)

# ---
# > **Exercise 4:** Project Euler
# >
# > [Project Euler](https://projecteuler.net/) is a competitive programming website, mainly based on cleverly solving mathematical problems that would otherwise be computation-intensive. It is a good way to learn a new scientific programming language.
# > [Problem 1](https://projecteuler.net/problem=1) reads
#
#     If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
#
#     Find the sum of all the multiples of 3 or 5 below 1000.
#
# > Write a script that solves this problem.

# In[ ]:


# You can continue by solving other [problems](https://projecteuler.net/archives). The first ones (e.g. 4, 31) are the easiest.

# In[ ]:


# ---