Cut material from Python tutorial¶

Imported stuff from python_tutorial.ipynb.


Python for data science and statistics¶

Python is a popular language for data analysis because of the numerous functions it provides for data management, data visualization, and statistics.

Learning to use these Python functions will y

Learning a few basic Python constructs like the for loop will enable you to simulate probability distributions and experimentally verify how statistics procedures work. This is a really big deal! It's good to know the statistical formulas and recipes, but it's even better when you can run your own simulations and check when the formulas work and when they fail.

Once you learn the basics of Python syntax, you'll have access to the best-in-class tools for data management (Pandas, see pandas_tutorial.ipynb), data visualization (Seaborn, see seaborn_tutorial.ipynb), and statistics (SciPy and statsmodels).

Don't worry, there won't be any advanced math: just sums, products, exponents, logs, and square roots. Nothing fancy, I promise. If you've ever created a formula in a spreadsheet, then you're familiar with all the operations we'll see. In a spreadsheet formula you'd use SUM(, and in Python we write sum(. You see, it's nothing fancy.
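To make the spreadsheet comparison concrete, here is a small sketch of these operations in Python. The list values is a made-up example, and math.prod requires Python 3.8 or newer.

```python
import math

values = [2, 3, 4]              # made-up example numbers

total = sum(values)             # sum, like =SUM(...) in a spreadsheet
product = math.prod(values)     # product 2*3*4
power = 2**10                   # exponents use the ** operator
log_e = math.log(math.e)        # natural logarithm of e
root = math.sqrt(16)            # square root

print(total, product, power, log_e, root)
```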

Yes, there will be a lot of code (paragraphs of Python commands) in this tutorial, but you can totally handle this. If you ever start to freak out and think "OMG this is too complicated!", remember that Python is just a fancy calculator.

All in all, learning Python gives you lots of tools to help you understand math and science topics.

take advantage of everything Python has to offer for data analysis and statistics.

You can run JupyterLab on your computer or run JupyterLab on a remote server using a binder link.

bit.ly/pytut3

Alternatives: If you don't want to install anything on your computer yet, you have two other options for playing with this notebook:

Run a JupyterLab instance in the cloud via the Binder link. Click here to launch an interactive notebook of this tutorial.
You can also enable the "Live Code" feature while reading this tutorial online at noBSstats.com. Use the rocket button in the top right, and choose the Live Code option to make all the cells in this notebook interactive.

The notebook interface offers many useful features, but for now, I just want you to think of notebooks as an easy way to run Python code.

Here is another example of an expression that uses the function len to compute the length of a list of numbers.

In [1]:

len([1, 2, 3])

Out[1]:

3

Here the function len received the list of numbers [1, 2, 3] as input and produced 3 as output, which is the length of the list.


To store the result of an expression as a variable, we use the assignment operator = as follows, from left to right:

we start by writing the name of the variable
then, we add the symbol = (which stands for assign to)
finally, we write an expression for the value we want to store in the variable
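Here is a minimal sketch of those three steps. The variable names total and doubled are illustrative choices, not names used elsewhere in the tutorial.

```python
total = 2 + 3          # name, then =, then an expression; stores the value 5
doubled = total * 2    # the expression on the right can use existing variables
```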

In [ ]:

Running the above code cell doesn't print anything, because we have only defined variables: score, average, scores, message, and above_the_average, but we didn't display them.

We'll use the generic variable name obj to refer to an object of any type.


Notebooks allow us to run Python commands interactively, which is the best way to learn! Try some Python commands to get a feel for how code cells work.

Remember you can click the play button in the toolbar (the first button in box (4) in the screenshot) or use the keyboard shortcut SHIFT+ENTER to run the code.

I encourage you to play around with the notebook execution buttons in box (4).


Review of math functions¶

The Python syntax for functions is inspired by the syntax used for math functions, so we'll start with a quick overview of the concept of a function in mathematics. The convention in math is to name functions with single letters like $f$, $g$, $h$, etc. We denote the function's input as $x$ and its output as $y$.

We define the function $f$ by writing the expression to compute for a given input $x$:

$$ y = f(x) = \text{some expression involving } x $$

For example $f(x) = 2x+3$.

Once we have defined the function $f$, we can evaluate the function for any possible input $x$. For example, the value of the function $f$ when $x=5$ is denoted $f(5)$ and is equal to $2(5) +3 = 10 + 3 = 13$. In other words, $f(5) = 13$.

Python functions¶

Functions in Python are similar to functions in math:

...

In the above example, we intentionally chose the function name and the names of its input and output to highlight the connection with the math function example we saw earlier.

plotnine: another high-level library for data visualization based on the grammar of graphics principles
scikit-learn: tools and algorithms for machine learning


Overview of the material in this tutorial¶

We'll cover all essential topics required to get to know Python, including:

Getting started: where we'll install the JupyterLab Desktop coding environment.
Expressions and variables: basic building blocks of any program.
Getting comfortable with Python: looking around and getting help.
Lists and for loops: repeating steps and procedures.
Functions: reusable code blocks.
Other data structures: sets, tuples, etc.
Boolean variables and conditional statements: conditional code execution.
Dictionaries: a versatile way to store data.
Objects and classes: creating custom objects.
Python grammar and syntax: review of all the syntax.
Python libraries and modules: learn why people say Python comes with "batteries included".

After you're done with this tutorial, you'll be ready to read the other two:

Pandas (see pandas_tutorial.ipynb)
Seaborn (see seaborn_tutorial.ipynb)

It's important for you to try solving the exercises that you'll encounter as you read along. The exercises are a great way to practice what you've been learning.


In [2]:

## ALT. display both value and type on the same line (as a tuple)
# score, type(score)

Python is a "civilized" language

We'll now learn about some of these tools, including "doc strings" (help menus) and the different ways of looking at what attributes and methods are available to use.

Above all, Python has a culture of being beginner friendly so

This combination of tools allows programmers to answer common questions about Python objects and functions without leaving the JupyterLab environment. Basically, in Python all the necessary info is accessible directly in the coding environment. For example, at the end of this section you'll be able to answer the following questions on your own:

How many and what type of arguments does the function print expect?
What kind of optional, keyword arguments does the function print accept?
What attributes and methods does the Python object obj have?
What variables and functions are defined in the current namespace?

More than 50% of any programmer's time is spent looking at help info and trying to understand the variables, functions, objects, and methods they are working with, so it's important for you to learn these meta-skills.
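As a preview, here is a sketch that uses two built-in functions, dir and globals, to answer the last two questions in plain Python (in a notebook, %whos does something similar for the namespace). Hiding names that start with an underscore is my own choice, to skip Python's internal "dunder" names.

```python
obj = 3

# Attributes and methods available on obj (names starting with "_" are hidden)
obj_names = [name for name in dir(obj) if not name.startswith("_")]
print(obj_names)

# Variables and functions defined in the current (module-level) namespace;
# list(...) takes a snapshot of the names before we loop over them
namespace_names = [name for name in list(globals()) if not name.startswith("_")]
print(namespace_names)
```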


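You can also add longer, multi-line notes using triple-quoted text. Strictly speaking, a triple-quoted block is a string, not a comment, which is why the notebook displays its value as the cell output.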

In [3]:

"""
This is a longer comment,
which is written on two lines.
"""

Out[3]:

'\nThis is a longer comment,\nwhich is written on two lines.\n'

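The docstrings we talked about earlier are created by this kind of multi-line string: for a function written in Python, a string placed as the first statement of the function body becomes its docstring. Built-in functions like abs, len, sum, and print are implemented in C in the standard CPython interpreter, but they provide their docstrings through the same __doc__ attribute.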
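Here is a sketch showing where a docstring goes when you write your own function. The function double is a made-up example.

```python
def double(x):
    """Return two times the input x."""   # the docstring: first statement in the body
    return 2 * x

print(double.__doc__)   # the docstring is stored in the __doc__ attribute
help(double)            # help() displays the same text, plus the signature
```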

In [4]:

#### More exceptions

In [5]:

# ValueError
# int("zz")

# ZeroDivisionError
# 5/0

# KeyError
# d = {}
# d["zz"]

# ImportError
# from math import zz

# AttributeError
# "hello".zz

Exceptions¶

The computer doesn't like what you entered. The output is a big red box that tells you your input was REJECTED!

If you type invalid syntax, refer to variables that don't exist, or otherwise input something that Python doesn't like, Python will raise an "exception," which is like saying "Yo, I don't understand this, or I can't run this, or the code refers to some data that doesn't exist, etc." You'll see the name of the error that occurred and a message that explains what went wrong.
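To see the names and messages of several exceptions without stopping at the first error, here is a sketch that runs the commented-out examples from the "More exceptions" cell one at a time. It relies on two tools this section doesn't otherwise cover: try/except, which catches an error instead of stopping the cell, and exec, which runs Python code stored in a string.

```python
snippets = [
    'int("zz")',            # ValueError
    '5/0',                  # ZeroDivisionError
    'd = {}; d["zz"]',      # KeyError
    'from math import zz',  # ImportError
    '"hello".zz',           # AttributeError
]

error_names = {}
for code in snippets:
    try:
        exec(code)                       # run the code stored in the string
    except Exception as err:             # catch the error instead of stopping
        error_names[code] = type(err).__name__
        print(type(err).__name__ + ":", err)
```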

Example: Learning about the `abs` function¶

Let's say you're reading some Python code written by a friend, and they use the function abs in their code. Suppose you've never seen this function before, and you have no idea what it does.

In [6]:

# put cursor in the middle of function and press SHIFT+TAB
abs(-3)

Out[6]:

3

We can also obtain the same information using the help() function on abs.

In [7]:

help(abs)

Help on built-in function abs in module builtins:

abs(x, /)
    Return the absolute value of the argument.

The help menu tells you that abs(x) is the absolute value function, which is written $|x|$ in math notation, and defined as

$$ |x| = \begin{cases} x & \text{if } x \geq 0 \\ -x & \text{if } x < 0 \end{cases} $$

We call the help text associated with an object its "docstring." It is stored in the object's __doc__ attribute, so you can also access it as obj.__doc__.

In [8]:

abs.__doc__

Out[8]:

'Return the absolute value of the argument.'


We've already used both type and print, so there is nothing new here. I just want to remind you that you can always use these functions as a first line of inspection.

In [9]:

obj = 3

print(obj)

3

In [10]:

type(obj)

Out[10]:

int

In [11]:

repr(obj)

Out[11]:

'3'
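The function repr returns the "official" string representation of an object. For the integer 3 it looks just like str(3), but the difference shows up clearly with strings. A small sketch (the variable word is just an example):

```python
word = "hello"
print(str(word))    # hello      (str: the human-readable text)
print(repr(word))   # 'hello'    (repr: includes the quotes, showing it is a string)
```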

DICTS¶

Side note: You can think of a list as a special type of dictionary that has the integers 0, 1, 2, ... as keys. Alternatively, you can think of dictionaries as "fancy lists" that allow keys of other types (like strings) instead of being limited to sequential integer indices.

Recall the general syntax of an assignment statement is as follows:

<place> = <some expression>

In the above examples, the <place> refers to the location inside the profile dictionary identified by a particular key. In the first example, we assigned the value 77 to the place profile["score"], which modified the value that was previously stored there. In the second example, we assigned the value 42 to the new place profile["age"], so Python created it.
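Here is a sketch of the two assignments described in this paragraph. The starting contents of profile are made up; only the keys "score" and "age" and the values 77 and 42 come from the text.

```python
profile = {"name": "Julie", "score": 65}   # made-up starting profile

profile["score"] = 77   # key "score" exists, so its value is replaced
profile["age"] = 42     # key "age" is new, so Python creates it
print(profile)          # {'name': 'Julie', 'score': 77, 'age': 42}
```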


In [12]:

True and True, True and False, False and True, False and False

Out[12]:

(True, False, False, False)

In [13]:

True or True, True or False, False or True, False or False

Out[13]:

(True, True, True, False)

Inline if statements (bonus topic)¶

We can also use if-else keywords to write conditional expressions on a single line. The general syntax for these is:

<value1> if <condition> else <value2>

This expression evaluates to <value1> if <condition> is True, and to <value2> if <condition> is False.

In [14]:

temp = 25
msg = "It's hot!" if temp > 22 else "It's OK."
msg

Out[14]:

"It's hot!"
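The inline form is shorthand for a regular if-else statement. This sketch, using a made-up temperature of 18, computes the message both ways and checks that they agree.

```python
temp = 18

# regular if-else statement
if temp > 22:
    msg = "It's hot!"
else:
    msg = "It's OK."

# equivalent inline (conditional) expression
msg_inline = "It's hot!" if temp > 22 else "It's OK."

print(msg, msg == msg_inline)   # It's OK. True
```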


LISTS¶

Recall you can see a complete list of all the methods on list objects by typing scores. then pressing the TAB button to trigger the auto-complete suggestions. Uncomment the following code block, place your cursor after the dot, and try pressing TAB to see what happens.

In [15]:

# scores.

Exercise 12: The default behaviour of the method .sort() is to sort the elements in increasing order. Suppose you want to sort the elements in decreasing order instead. You can pass a keyword argument to the method .sort() to request the sorting be done in "reverse" order (decreasing instead of increasing). Consult the docstring of the .sort() method to find the name of the keyword argument that does this, then modify the code below to sort the elements of the list scores in decreasing order.

In [16]:

scores = [61, 79, 98, 72]

scores.sort()
scores

Out[16]:

[61, 72, 79, 98]

In [17]:

#@titlesolution Exercise 12 sorted-reverse
# help(scores.sort)
scores.sort(reverse=True)
scores

Out[17]:

[98, 79, 72, 61]

See what's in the global namespace (bonus)¶

In a Jupyter notebook, you can run the command %whos to print all the variables and functions that are defined in the current namespace.

In [18]:

# %whos


Exercise 7: Display the doc-string of the function sum.

In [19]:

#@titlesolution Exercise 7 help-sum
help(sum)

Help on built-in function sum in module builtins:

sum(iterable, /, start=0)
    Return the sum of a 'start' value (default: 0) plus an iterable of numbers
    
    When the iterable is empty, return the start value.
    This function is intended specifically for use with numeric values and may
    reject non-numeric types.

Exercise 8: Display the docstring of the function print.

In [20]:

#@titlesolution Exercise 8 help-print
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.


Application: changing the separator when printing multiple values¶

We can choose a different separator between arguments of the print function by specifying the value for the keyword argument sep.

In [21]:

x = 3
y = 2.3
print(x, y)

3 2.3

In [22]:

print(x, y, sep=" --- ")

3 --- 2.3
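The end and file keyword arguments listed in the docstring work the same way. In this sketch, io.StringIO (an in-memory text buffer from the standard library) stands in for a real file, so we can inspect exactly what print wrote.

```python
import io

buffer = io.StringIO()                        # in-memory "file" that collects text
print(2024, 1, 15, sep="-", file=buffer)      # sep goes between the values
print("done", end="!\n", file=buffer)         # end goes after the last value
print(repr(buffer.getvalue()))                # '2024-1-15\ndone!\n'
```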


Question 2: Sample standard deviation¶

The formula for the sample standard deviation of a list of numbers is: $$ \text{std}(\textbf{x}) = s = \sqrt{ \tfrac{1}{n-1}\sum_{i=1}^n (x_i-\overline{x})^2 } = \sqrt{ \tfrac{1}{n-1}\left[ (x_1-\overline{x})^2 + (x_2-\overline{x})^2 + \cdots + (x_n-\overline{x})^2\right]}. $$

Note the division is by $(n-1)$ and not $n$. Strange, no? You'll have to wait until stats to see why this is the case.

Write the function std(numbers) that computes the sample standard deviation of the list numbers.

In [23]:

import math

def mean(numbers):
    return sum(numbers)/len(numbers)

def std(numbers):
    """
    Computes the sample standard deviation (square root of the sample variance)
    using a for loop.
    """
    avg = mean(numbers) 
    total = 0
    for number in numbers:
        total = total + (number-avg)**2
    var = total/(len(numbers)-1)    
    return math.sqrt(var)

numbers = list(range(0,100))
std(numbers)

Out[23]:

29.011491975882016

In [24]:

# compare to known good function...
import statistics
statistics.stdev(numbers)

Out[24]:

29.011491975882016
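To see what the division by $(n-1)$ changes, the statistics module also provides pstdev, which divides by $n$ (the population standard deviation). This sketch uses a small made-up dataset whose mean is 5 and whose squared deviations add up to 32.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # mean is 5, sum of squared deviations is 32

s_sample = statistics.stdev(data)       # divides by n-1: sqrt(32/7), about 2.138
s_population = statistics.pstdev(data)  # divides by n:   sqrt(32/8) = 2.0
print(s_sample, s_population)
```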


Functions! Finally we get to the good stuff!

Functions allow us to package chunks of code that we can later reuse in other programs.


In [25]:

# # ALT. using the `range` function
# numbers = range(1,6)
# squares = [n**2 for n in numbers]
# squares


Sets¶

Python sets represent mathematical sets, that is, unordered collections of distinct elements.

In [26]:

s = set()
s

Out[26]:

set()

In [27]:

s.add(3)
s.add(5)
s

Out[27]:

{3, 5}

In [28]:

3 in s

Out[28]:

True

In [29]:

print("The set s contains the elements:")
for el in s:
    print(el)

The set s contains the elements:
3
5

Sets are sometimes useful when we want to keep track of which elements have been encountered, but don't care how many times.

In [30]:

s.add(3)
s.add(3)
s

Out[30]:

{3, 5}
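Here is a sketch of that use case: tracking which values appear in a list of made-up dice rolls, regardless of how many times each one occurs.

```python
rolls = [3, 5, 3, 6, 5, 3]    # made-up dice rolls

seen = set()
for roll in rolls:
    seen.add(roll)    # adding a value that's already in the set has no effect

print(seen, len(seen))        # {3, 5, 6} 3
print(seen == set(rolls))     # set(rolls) builds the same set in one step: True
```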


Tuples¶

Tuples are similar to lists, but with fewer features.

In [31]:

2,3

Out[31]:

(2, 3)

In [32]:

(2,3)

Out[32]:

(2, 3)
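The main feature tuples lack is the ability to change their elements after creation: tuples are immutable. This sketch compares a list and a tuple; try/except is used only so the expected error doesn't stop the cell.

```python
point_list = [2, 3]
point_tuple = (2, 3)

point_list[0] = 10          # lists can be modified in place
print(point_list)           # [10, 3]

try:
    point_tuple[0] = 10     # tuples cannot be modified: raises TypeError
except TypeError as err:
    print("TypeError:", err)
```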

We can use the tuple syntax to assign to multiple variables on a single line:

In [33]:

x, y = 3, 4

We can also use tuples to "swap" two values.

In [34]:

# Swap the contents of the variables x and y using a temporary variable
tmp = x
x = y
y = tmp

In [35]:

# Equivalent operation on one line
x, y = y, x

Defining new types of objects¶

The Python keyword class can be used to define new kinds of objects.

Let's create a custom class of objects Interval that represent intervals of real numbers like $[a,b] \subset \mathbb{R}$. We want to be able to use the new interval objects in if statements to check if a number $x$ is in the interval $[a,b]$ or not.

Recall the in operator, which we can use to check if an element is part of a list:

>>> 3 in [1,2,3,4]
True

We want the new objects of type Interval to support the same kind of membership test.

Example usage:

>>> 3 in Interval(2,4)
True
>>> 5 in Interval(2,4)
False

The expression x in Y corresponds to calling the method __contains__ on the container object Y, that is, Y.__contains__(x), which returns a boolean value (True or False).

If we want to support checks like 3 in Interval(2,4) we therefore have to implement the method __contains__ on the Interval class.

In [36]:

class Interval:
    """
    Object that embodies the mathematical concept of an interval.
    `Interval(a,b)` is equivalent to math interval [a,b] = {𝑥 | 𝑎 ≤ 𝑥 ≤ 𝑏}.
    """

    def __init__(self, lowerbound, upperbound):
        """
        This method is called when the object is created, and is used to
        set the object attributes from the arguments passed in.
        """
        self.lowerbound = lowerbound
        self.upperbound = upperbound

    def __str__(self):
        """
        Return a representation of the interval as a string like "[a,b]".
        """
        return "[" + str(self.lowerbound) + "," + str(self.upperbound) + "]"

    def __contains__(self, x):
        """
        This method is called to check membership using the `in` keyword.
        """
        return self.lowerbound <= x and x <= self.upperbound

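    def __len__(self):
        """
        This method will get called when you call `len` on the object.
        Note: len() only accepts a non-negative integer result, so this
        works for intervals with integer endpoints like Interval(2,4).
        """
        return self.upperbound - self.lowerbound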

Create an object that corresponds to the interval $[2,4]$.

In [37]:

interval2to4 = Interval(2,4)
interval2to4

Out[37]:

<__main__.Interval at 0x1041639d0>

In [38]:

type(interval2to4)

Out[38]:

__main__.Interval

In [39]:

str(interval2to4)

Out[39]:

'[2,4]'

In [40]:

3.3 in interval2to4

Out[40]:

True

In [41]:

1 in interval2to4

Out[41]:

False

In [42]:

len(interval2to4)

Out[42]:

2

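Notice that Out[37] showed <__main__.Interval at 0x1041639d0>, while str(interval2to4) gave '[2,4]'. That's because the notebook displays results using repr(), not str(). Here is a compact sketch of the class with a __repr__ method added; it keeps only __init__ and __contains__ to stay short, and running it replaces the earlier Interval definition in your notebook session.

```python
class Interval:
    """Compact version of Interval, with a __repr__ method for nicer display."""

    def __init__(self, lowerbound, upperbound):
        self.lowerbound = lowerbound
        self.upperbound = upperbound

    def __contains__(self, x):
        # same membership test as before, written as a chained comparison
        return self.lowerbound <= x <= self.upperbound

    def __repr__(self):
        # repr() is what the notebook uses to display the object in an Out[] cell
        return "Interval(" + repr(self.lowerbound) + ", " + repr(self.upperbound) + ")"


print(repr(Interval(2, 4)))   # Interval(2, 4)
print(3 in Interval(2, 4))    # True
```

Since this version doesn't define __str__, calling str() on it falls back to __repr__.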

Note for Windows users¶

If you're on macOS or Linux, you can ignore this section and skip to the next section, Data management with Pandas.

File paths on Windows use the backslash character (\) as path separator, while UNIX operating systems like Linux and macOS use the forward slash (/) as path separator.

If you're on Windows, you may need to manually edit the code examples below to make them work by replacing all occurrences of "/" with "\\". The double backslash is required to get a literal backslash, because the character \ has special meaning as an escape character in Python strings.

In [43]:

import os

if os.path.sep == "/":
    print("You're on a UNIX system (Linux or macOS).")
    print("Enjoy civilization!")
elif os.path.sep == "\\":
    print("You're on Windows so you should use \\ as path separator.")
    print("Replace any occurrence of / (forward slash) in paths with \\\\ (double-backslash).")

You're on a UNIX system (Linux or macOS).
Enjoy civilization!

The current working directory is the path on your computer where this notebook is running. The code cell below shows how you can get your current working directory.

In [44]:

os.getcwd()

Out[44]:

'/Users/ivan/Projects/Minireference/STATSbook/noBSstatsnotebooks/tutorials'

You're in the notebooks/ directory, which is inside the parent directory noBSstats/.

The datasets we'll be using in this notebook are located in the datasets/ directory, which is a sibling of the notebooks/ directory, inside the parent noBSstats/. To access the data file players.csv in the datasets/ directory from the current directory, we must specify a path that includes the .. directive (go to parent), then go into the datasets directory, then open the file players.csv.

This combination of "directions" for getting to the file looks different on a Windows system than on a UNIX system. The code below shows the correct path to use on each.

In [45]:

if os.path.sep == "/":
    # UNIX path separators
    path = "../datasets/players.csv"
else:
    # Windows path separators
    path = "..\\datasets\\players.csv"

print("The path to the file players.csv in the datasets/ directory is")
path

The path to the file players.csv in the datasets/ directory is

Out[45]:

'../datasets/players.csv'

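All the code examples below assume you're on a UNIX system, so on Windows you may need to modify them to use double backslashes in path strings. Alternatively, build paths with os.path.join, as in the next cell, which inserts the correct separator for your operating system.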

In [46]:

# ALT. os.path.join inserts the correct separator (/ or \) for your operating system
import os
import pandas as pd
filepath = os.path.join("..", "datasets", "players.csv")
players = pd.read_csv(filepath)


Converting lists to strings¶

The code "".join(msgs) can be used to concatenate a list of strings msgs. The string before .join is the separator placed between the elements (here, the empty string).

In [47]:

msgs = ["Hello", "Hi", "Whatsup?"]
"".join(msgs)

Out[47]:

'HelloHiWhatsup?'

In [48]:

# join together using one space " " as separator
" ".join(msgs)

Out[48]:

'Hello Hi Whatsup?'
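Two details worth knowing: join only accepts strings, so numbers have to be converted with str first, and the string method split goes in the opposite direction. A sketch with made-up numbers:

```python
nums = [3, 5, 8]                               # made-up numbers
text = ", ".join([str(n) for n in nums])       # convert each number to a string first
print(text)                                    # 3, 5, 8
print(text.split(", "))                        # ['3', '5', '8'] (the pieces are strings)
```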


Example: file `open` and the `readlines()` method¶

TODO: create a file called story.txt in the current working directory, and write a few lines in it. The code examples below are based on this sample story.txt (use save as if you want to have the same file).
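If you don't have the sample file, here is a sketch that creates a story.txt in the current working directory containing the same four lines that appear in the readlines() output below. Careful: mode "w" overwrites any existing file with that name.

```python
story_lines = [
    "This is a short story.\n",
    "It is very short.\n",
    "It has only four lines.\n",
    "It ends with the word cat.\n",
]

# "w" = write mode: creates story.txt, or overwrites it if it already exists;
# the with-block closes the file automatically when it ends
with open("story.txt", "w") as file:
    file.writelines(story_lines)
```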

In [49]:

file = open("story.txt")
lines = file.readlines()
lines

Out[49]:

['This is a short story.\n',
 'It is very short.\n',
 'It has only four lines.\n',
 'It ends with the word cat.\n']

In [50]:

for line in lines:
    print(line)

This is a short story.

It is very short.

It has only four lines.

It ends with the word cat.

In [51]:

# we can pass a custom `end` keyword argument to avoid double newlines:
for line in lines:
    print(line, end="")

This is a short story.
It is very short.
It has only four lines.
It ends with the word cat.


Exercise: write the code that counts the number of words in the string text.

Here are some examples:

The number of words in "Hello world" is 2.
The number of words in "Whether it is nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them?" is 30.
The number of words in len.__doc__ is 8.

Hint: the string method .split() might come in handy.

In [52]:

text = "Hello world"

# write here the code that counts the number of words in `text`
# ...

In [53]:

#@titlesolution
text = "Hello world"
# 
# text = len.__doc__
# 
# text = """Whether it is nobler in the mind to suffer
# the slings and arrows of outrageous fortune,
# or to take arms against a sea of troubles,
# and by opposing end them?"""

words = text.split()
wordcount = 0
for word in words:
    wordcount = wordcount + 1
wordcount

Out[53]:

2


Exercise: write the Python code that opens the file story.txt, reads the contents of the file into a string text, then computes its word count.

Hint: reuse the code we saw earlier for opening the file

Hint 2: try the .read() method on the file object

Hint 3: reuse the code you wrote earlier for doing the word count of the string text

In [54]:

file = open("story.txt")
text = file.read()
# ... (continue your code here)

In [55]:

#@titlesolution
file = open("story.txt")
text = file.read()
words = text.split()
wordcount = 0
for word in words:
    wordcount = wordcount + 1
wordcount

Out[55]:

20


Example: bag of words representation¶

Given the string text, we want to count the number of occurrences of each word in the text. The words "HELLO", "Hello", "Hello," should all be considered the same as "hello".

In other words, we want to convert all letters to lowercase and strip punctuation marks.

Given the string "HELLO Hello Hello, hello", the bag of words representation corresponds to the dictionary

wcounts = {"hello":4}

In [56]:

text = "Hello world"
# 
# text = len.__doc__
# 
# text = """Whether it is nobler in the mind to suffer
# the slings and arrows of outrageous fortune,
# or to take arms against a sea of troubles,
# and by opposing end them?"""

words = text.split()
clean_words = [word.strip(",.?") for word in words]
words_lower = [word.lower() for word in clean_words]

wcounts = {}
for word in words_lower:
    if word not in wcounts:
        wcounts[word] = 0
    wcounts[word] += 1

wcounts

Out[56]:

{'hello': 1, 'world': 1}
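The same steps can be wrapped in a function and checked against the "HELLO Hello Hello, hello" example from above. The name bag_of_words is my own choice, and collections.Counter from the standard library is shown as an alternative way to do the counting.

```python
from collections import Counter

def bag_of_words(text):
    """Return a dict mapping each lowercase, punctuation-stripped word to its count."""
    words = [word.strip(",.?").lower() for word in text.split()]
    wcounts = {}
    for word in words:
        wcounts[word] = wcounts.get(word, 0) + 1   # get returns 0 if the word is new
    return wcounts

print(bag_of_words("HELLO Hello Hello, hello"))    # {'hello': 4}

# Counter does the counting loop for us
clean = [w.strip(",.?").lower() for w in "HELLO Hello Hello, hello".split()]
print(Counter(clean))                              # Counter({'hello': 4})
```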


Exercise: write the Python code that computes the final score for each student, based on the data in this spreadsheet. The final score is computed as:

50% homework assignments (each homework counts for 10%)
20% midterm
30% final

Once you get the score for each student, you should also convert it to a letter grade. Print the student name, their final score, and their letter grade.

Hint: use a for loop to iterate over all students

Hint 2: reuse the code you wrote earlier for converting numeric score to letter grade

In [57]:

# via https://stackoverflow.com/a/33727897/127114
# 
url_tpl = "https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={name}"
sheet_id = "1_DRn3FXpLERVhO71pHsYbf_jwxQF8p54M6Niy3If3x0"
sheet_name = "Grades"
url = url_tpl.format(key=sheet_id, name=sheet_name)

import requests
response = requests.get(url)

# response.text

In [58]:

# Convert CSV text (contents of a file) to a list of dictionaries
import csv, io
studentsf = io.StringIO(response.text)
rows = list(csv.DictReader(studentsf))
# rows

In [59]:

for row in rows:
    # row is a dict containing a student's results
    # the keys are ['id', 'name', 'hw1', 'hw2', 'hw3', 'midterm',
    #               'hw4', 'hw5', 'final', 'score', 'grade']
    # the values are strings
    name = row["name"]     # access value under the key "name" in this row
    hw1 = int(row["hw1"])  # access key "hw1" and convert it to `int`
    print(type(row), len(row), "name =", name, "got", hw1, "on the first homework")
    
    # continue your code at the ... below
    # PART 1: computing the final score
    # PART 2: assigning a letter
    # ...

<class 'dict'> 11 name = Haydon Jeffery got 12 on the first homework
<class 'dict'> 11 name = Julie Beattie got 13 on the first homework
<class 'dict'> 11 name = Malachy Hull got 15 on the first homework
<class 'dict'> 11 name = Sheila Solis got 14 on the first homework
<class 'dict'> 11 name = Joni Rowe got 12 on the first homework
<class 'dict'> 11 name = Husna Millar got 11 on the first homework
<class 'dict'> 11 name = Tonya Fleming got 11 on the first homework
<class 'dict'> 11 name = Jak Rennie got 19 on the first homework
<class 'dict'> 11 name = Noor Odonnell got 14 on the first homework
<class 'dict'> 11 name = Krystal Dickerson got 13 on the first homework
<class 'dict'> 11 name = Joe Pickett got 3 on the first homework
<class 'dict'> 11 name = Alicia Rosario got 10 on the first homework
<class 'dict'> 11 name = Ailish Hensley got 13 on the first homework
<class 'dict'> 11 name = Aliyah Duncan got 12 on the first homework
<class 'dict'> 11 name = Jad Kumar got 15 on the first homework
<class 'dict'> 11 name = Margaret Parry got 14 on the first homework
<class 'dict'> 11 name = Danica Chen got 11 on the first homework
<class 'dict'> 11 name = Jose Hernandez got 13 on the first homework
<class 'dict'> 11 name = Rimsha Carlson got 20 on the first homework
<class 'dict'> 11 name = Giselle Thompson got 18 on the first homework

In [60]:

#@titlesolution
import csv
import io

studentsf = io.StringIO(response.text)
rows = list(csv.DictReader(studentsf))
for row in rows:
    # row is a dict containing a student's results
    # the keys are ['id', 'name', 'hw1', 'hw2', 'hw3', 'midterm',
    #               'hw4', 'hw5', 'final', 'score', 'grade']
    # the values are strings
    name = row["name"]     # access value under the key "name" in this row
    print("Processing results of", name, "...")
    
    
    # PART 1: computing the final score
    ####################################################################################
    # Convert all available student results to integers
    hw1 = int(row["hw1"])            # out of 20
    hw2 = int(row["hw2"])            # out of 20
    hw3 = int(row["hw3"])            # out of 20
    midterm = int(row["midterm"])    # out of 100
    hw4 = int(row["hw4"])            # out of 20
    hw5 = int(row["hw5"])            # out of 20
    final = int(row["final"])        # out of 100

    # compute homeworks average out of 100,
    # which is simple because each homework is out of 20 and there are 5 of them
    homeworks = hw1 + hw2 + hw3 + hw4 + hw5
    
    # we now need to make a "mix" of homeworks, midterm, and final
    # to create the student's final score (out of 100)
    score = 0.5*homeworks + 0.2*midterm + 0.3*final
    print("   - final score = ", score)
    
    
    # PART 2: assigning a letter 
    ####################################################################################
    if score >= 85:
        grade = "A"
    elif score >= 80:
        grade = "A-"
    elif score >= 75:
        grade = "B+"
    elif score >= 70:
        grade = "B"
    elif score >= 65:
        grade = "B-"
    elif score >= 60:
        grade = "C+"
    elif score >= 55:
        grade = "C"
    elif score >= 50:
        grade = "D"
    else:
        grade = "F"
    print("   - final grade = ", grade)
    
    
    # PART 3: (optional) save the results in the `row` dictionary
    ####################################################################################
    row["score"] = score
    row["grade"] = grade

Processing results of Haydon Jeffery ...
   - final score =  79.39999999999999
   - final grade =  B+
Processing results of Julie Beattie ...
   - final score =  69.19999999999999
   - final grade =  B-
Processing results of Malachy Hull ...
   - final score =  78.8
   - final grade =  B+
Processing results of Sheila Solis ...
   - final score =  77.8
   - final grade =  B+
Processing results of Joni Rowe ...
   - final score =  81.6
   - final grade =  A-
Processing results of Husna Millar ...
   - final score =  77.89999999999999
   - final grade =  B+
Processing results of Tonya Fleming ...
   - final score =  74.69999999999999
   - final grade =  B
Processing results of Jak Rennie ...
   - final score =  86.2
   - final grade =  A
Processing results of Noor Odonnell ...
   - final score =  79.5
   - final grade =  B+
Processing results of Krystal Dickerson ...
   - final score =  78.2
   - final grade =  B+
Processing results of Joe Pickett ...
   - final score =  46.0
   - final grade =  F
Processing results of Alicia Rosario ...
   - final score =  77.3
   - final grade =  B+
Processing results of Ailish Hensley ...
   - final score =  76.8
   - final grade =  B+
Processing results of Aliyah Duncan ...
   - final score =  77.8
   - final grade =  B+
Processing results of Jad Kumar ...
   - final score =  81.5
   - final grade =  A-
Processing results of Margaret Parry ...
   - final score =  76.5
   - final grade =  B+
Processing results of Danica Chen ...
   - final score =  79.4
   - final grade =  B+
Processing results of Jose Hernandez ...
   - final score =  76.19999999999999
   - final grade =  B+
Processing results of Rimsha Carlson ...
   - final score =  90.0
   - final grade =  A
Processing results of Giselle Thompson ...
   - final score =  89.4
   - final grade =  A
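Values like 79.39999999999999 appear because computers store decimal numbers like 0.1 in binary floating point, which can't represent them exactly. For display, you can round the result or format it with a fixed number of decimals, as in this sketch:

```python
total = 0.1 + 0.2
print(total)                         # 0.30000000000000004
print(round(total, 2))               # 0.3
print(f"{79.39999999999999:.1f}")    # 79.4  (format with one decimal place)
```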

In [61]:

# # Display last row
# rows[-1]

Lists of booleans¶

Lists of booleans can be "joined" together using and or or operations by calling the built-in functions all and any:

all(conditions): and-together all elements in list of conditions
any(conditions): or-together all elements in list of conditions

In [62]:

# list of three conditions, all being True
alltrue = [True, True, True]

# list of conditions where only one condition is True
onetrue = [True, False, False]

# list of conditions that are all False
allfalse = [False, False, False]

In [63]:

all(alltrue), all(onetrue), all(allfalse)

Out[63]:

(True, False, False)

In [64]:

any(alltrue), any(onetrue), any(allfalse)

Out[64]:

(True, True, False)
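In practice, the list of conditions is often built from data with a list comprehension. Here is a sketch using the example scores from the LISTS section; note the edge case that all([]) is True and any([]) is False, which can be surprising.

```python
scores = [61, 79, 98, 72]

passed_all = all([score >= 60 for score in scores])    # is every score at least 60?
any_perfect = any([score == 100 for score in scores])  # is at least one score 100?
print(passed_all, any_perfect)                         # True False

# edge case: an empty list of conditions
print(all([]), any([]))                                # True False
```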

In [ ]:

Example function `head`¶

We often want to print the first few lines of a file to see what data it contains.

In [65]:

# TODO
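The cell above is only a placeholder. One possible way to fill it in is sketched below. The function name head and its parameter n are assumptions for illustration, modelled on the UNIX head command. The example file is written to a temporary directory so no existing file gets overwritten. The function uses itertools.islice to stop after n lines rather than reading through the whole file.

```python
import os
import tempfile
from itertools import islice

def head(filename, n=5):
    """Print and return the first n lines of the text file `filename` (hypothetical helper)."""
    with open(filename) as infile:
        # islice stops iterating after n lines instead of going through every line
        lines = [line.rstrip("\n") for line in islice(infile, n)]
    for line in lines:
        print(line)
    return lines

# Usage: write a small example file to a temporary directory, then peek at it
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as outfile:
    outfile.write("line 1\nline 2\nline 3\nline 4\n")

head(path, n=3)
```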

In [ ]:

Function tricks¶

Lambda functions¶

Python supports an alternative syntax that is sometimes used to define simple functions. For example, let's say you need a function that computes the square of the input, which is written as $f(x) = x^2$ in math notation.

You can use the standard Python syntax ...

def f(x):
    return x**2
f

... or you can use the lambda expression for defining that function:

lambda x: x**2

The general syntax is lambda <function inputs>: <function output value>.

This lambda-shortcut for defining functions is useful when calling other functions that expect functions as their arguments. To illustrate what is going on, let's define a Python function plot_function(f) that plots the graph of the function f it receives as its input.

The graph of the function $f(x)$ is the set of points $(x,f(x))$ in the Cartesian plane, for all $x$ in some interval of inputs (here we'll use the interval from $x=-10$ to $x=10$).

In [66]:

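import numpy as np
import matplotlib.pyplot as plt

def plot_function(f, xlims=(-10, 10)):
    """Plot the graph of the function f over the interval xlims."""
    xstart, xend = xlims
    xs = np.linspace(xstart, xend, 1000)   # 1000 evenly spaced inputs
    ys = [f(x) for x in xs]                # evaluate f at each input
    plt.plot(xs, ys)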

If we want to use the function plot_function to plot the graph of the function $f(x)=x^2$, we can define a Python function f using the standard def-syntax and then pass the function f to plot_function to generate the graph, as shown below:

In [67]:

def f(x):
    return x**2

plot_function(f)

Alternatively, we can use the lambda-shortcut Python syntax to define the function inline, when calling plot_function.

In [68]:

plot_function(lambda x: x**2)

The lambda-expression lambda x: x**2 is equivalent to the Python function f we defined using the two-line def-statement.

Both ways of defining f are the same type of object:

In [69]:

type(f)

Out[69]:

function

In [70]:

type(lambda x: x**2)

Out[70]:

function

In [ ]:

Since f and lambda x: x**2 are both expressions of the type function, we can call both of them the same way (by passing in the argument in parentheses).

For example, if we want to evaluate the function $f(x)$ at the input $x=3$, we can call f(3) ...

In [71]:

f(3)

Out[71]:

9

or we could define the function $f(x)=x^2$ using an inline lambda expression, then call the result of the lambda expression by passing in the argument in parentheses.

In [72]:

(lambda x: x**2)(3)

Out[72]:

9

The lambda-shortcut for defining functions is not used often, but sometimes it is very convenient to define a function inline, so I want you to be familiar with this syntax.
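Another common place you'll see inline lambdas is the key argument of the built-in sorted function. This argument expects a function that computes a sort key from each item. Here is a short sketch with hypothetical (name, score) pairs:

```python
results = [("Jak", 86), ("Julie", 98), ("Joe", 46)]   # hypothetical (name, score) pairs

# sort by the second element of each pair (the score), highest first
ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
print(ranked)   # [('Julie', 98), ('Jak', 86), ('Joe', 46)]
```

You could write the same sort with a def-defined function get_score(pair) and pass key=get_score. The lambda just saves the extra two-line definition.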

Applying functions to lists (optional)¶

Consider the math function $f(x)=x^2$. We'll identify the output of the function as the variable $y$.

In Python, we can define the function $f(x)=x^2$ as follows:

In [73]:

def f(x):
    return x**2

Suppose we want to compute the output values $y=f(x)$ for each $x$ in the list of values $[1,2,3,4]$, which we'll call xs in the code.

In [74]:

xs = [1,2,3,4]

Option A You can use a for loop to compute the function output $y=f(x)$ for every $x$ in the list of values. First we create an empty list ys to store the outputs, then we .append the values to it one by one as we go through the for loop.

In [75]:

ys = []
for x in xs:
    y = f(x)
    ys.append(y)

ys

Out[75]:

[1, 4, 9, 16]

Option B We can shorten the code using the list comprehension syntax:

In [76]:

ys = [f(x) for x in xs]

ys

Out[76]:

[1, 4, 9, 16]

Option C A third alternative would be to use the function map(f,xs), which returns a lazy map object that produces the outputs f(x) for all x in the list xs. To see the outputs as a list, we wrap the result in a call to list.

In [77]:

ys = map(f, xs)

list(ys)

Out[77]:

[1, 4, 9, 16]

In [78]:

# ALT. we can specify the function argument to map as a lambda expression
# list(map(lambda x: x**2, xs))

Computing ys from xs for entire lists of values, instead of for individual values like x and y, is super useful. We'll see this idea come up again later on in this tutorial when we discuss the Python module NumPy, which allows you to do math operations with "universal functions" that do the right thing whether you input a number x, a list of numbers xs, or even more complicated data structures (e.g. two-dimensional matrices or higher-dimensional tensors).
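As a preview, here is a minimal sketch of the same computation in NumPy, assuming the numpy package is installed. Arithmetic operators and universal functions like np.square work element-wise on arrays, so you don't need an explicit loop.

```python
import numpy as np

xs = np.array([1, 2, 3, 4])

ys = xs**2                 # element-wise square of the whole array
ys_ufunc = np.square(xs)   # the same result using the universal function np.square

print(ys)            # [ 1  4  9 16]
print(np.square(3))  # 9 (ufuncs also accept a single number)
```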

In [ ]:

Running Python code interactively¶

Notebooks are an example of "interactive" use of the Python interpreter. You enter a command like 2+3 in a code cell, press SHIFT+ENTER to run the code, and you see the result.

There are several different ways you can access the Python interpreter.

python shell. This is what you get if you install Python on your computer. You can open a command prompt (terminal or cmd.exe) and type in the command python to start the interactive Python shell.

> python
Python 3.6.9 (default, Oct  6 2019, 21:40:49)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 2+3
5
>>>

ipython shell. This is a fancier shell with line numbering and many helpful commands.

> ipython
Python 3.6.9 (default, Oct  6 2019, 21:40:49)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: 2+3
Out[1]: 5

In [2]:

Jupyter notebooks are web-based coding environments that allow you to mix code cells and Markdown cells to create "code documents." Notebook files have the extension .ipynb and can be created using JupyterLab. Several other systems, like nbviewer, GitHub, VSCode, and Google Colab, can also "open" notebooks for viewing, and some of them (VSCode and Google Colab) can also "run" notebooks interactively.

Colab notebooks. Google operates a service called "Google Colaboratory" (Colab for short) that allows you to run Python code as Colab notebooks.

Note that the "Python calculator" functionality works the same way in each case. The basic Python shell, the fancy ipython shell, and the notebook interface all offer a place to input your commands: they READ your command input, EVALUATE it (i.e. run it), and PRINT the output of the command's execution. At the end of the READ-EVAL-PRINT steps, the Python interpreter goes back into "listening mode," waiting for your next command input.

The overall behaviour of the Python interpreter is an example of a READ-EVAL-PRINT Loop (REPL), a pattern that appears in many professional human-computer interfaces. Other examples include the command line prompt (terminal on UNIX or cmd.exe on Windows), database prompts, the JavaScript console in your browser, the Ruby interactive console irb, and any other interface that accepts commands.
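To make the READ-EVAL-PRINT idea concrete, here is a toy sketch of the loop written in Python itself. This is not how the real interpreter is implemented. It reads from a list of strings instead of the keyboard so it can run non-interactively. It also uses the built-in eval, which should never be used on untrusted input.

```python
def toy_repl(command_lines):
    """A toy READ-EVAL-PRINT loop over a list of expression strings."""
    outputs = []
    for line in command_lines:      # LOOP: go back for the next command
        command = line.strip()      # READ the next command
        value = eval(command)       # EVALUATE it (toy only: eval is unsafe on untrusted input)
        print(value)                # PRINT the result
        outputs.append(value)
    return outputs

toy_repl(["2+3", "10*4", "len('hello')"])
```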

Given this multitude of choices, we've opted to use a Jupyter notebook to present this tutorial. Keep in mind that you could run all the code examples in the python shell, the ipython shell, or a Colab notebook.

While we're on the topic of running Python code, let's briefly mention the other ways Python applications can operate. This is completely out of scope for the rest of this tutorial, since we're just using Python as a fancy calculator, but I thought I'd mention some of the other uses of Python code.

In [79]:

import io
import pandas as pd
data2 = io.StringIO("""
student_ID,background,curriculum,effort,score
1,arts,,10.96,75
2,science,lecture,8.69,75
""")
df2 = pd.read_csv(data2)
df2

Out[79]:

	student_ID	background	curriculum	effort	score
0	1	arts	NaN	10.96	75
1	2	science	lecture	8.69	75

In [ ]:

Bonus topics¶

writing standalone scripts (argparse)
Reading and writing files https://python-textbok.readthedocs.io/en/1.0/Python_Basics.html#files


Writing standalone scripts¶
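This section is only a heading in the original. Below is a minimal sketch of a standalone script that uses the standard-library argparse module. The script name greet.py and its --name and --times options are hypothetical. If you save it to a file, you can run it from the command line as python greet.py --name Julie --times 2.

```python
# greet.py  (hypothetical script name)
import argparse

def main(argv=None):
    """Parse command-line options and print a greeting."""
    parser = argparse.ArgumentParser(description="Print a greeting.")
    parser.add_argument("--name", default="world", help="who to greet")
    parser.add_argument("--times", type=int, default=1, help="how many times to greet")
    args = parser.parse_args(argv)   # argv=None means: use sys.argv[1:]
    lines = [f"Hello {args.name}!" for _ in range(args.times)]
    for line in lines:
        print(line)
    return lines

# Runs only when the file is executed as a script, not when it is imported
if __name__ == "__main__":
    main()
```

Accepting argv as a parameter lets you call main from a notebook or a test with an explicit list of arguments.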

For loop tricks¶

Tricks:

enumerate: provides an index when iterating through a list.
zip: allows you to iterate over multiple lists in parallel.

Using `enumerate` to get `for`-loop with index¶

Use enumerate(somelist) to iterate over (index, value) tuples, one for each value in the list somelist. In each iteration, the index tells you the position of the current value in the list.

In [80]:

scores = [61, 79, 98, 72]
list(enumerate(scores))

Out[80]:

[(0, 61), (1, 79), (2, 98), (3, 72)]

In [81]:

# example usage
for idx, score in enumerate(scores):
    # this for loop has two variables: idx and score
    print("Processing score", score, "which is at index", idx, "in the list")

Processing score 61 which is at index 0 in the list
Processing score 79 which is at index 1 in the list
Processing score 98 which is at index 2 in the list
Processing score 72 which is at index 3 in the list
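If you'd rather count from 1, for example to number items for humans, enumerate accepts an optional start argument:

```python
scores = [61, 79, 98, 72]
numbered = list(enumerate(scores, start=1))
print(numbered)   # [(1, 61), (2, 79), (3, 98), (4, 72)]
```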

In [ ]:

Using `zip`¶

Use zip(list1,list2) to get an iterator over tuples (value1, value2), where value1 and value2 are elements taken from list1 and list2, in parallel, one at a time.

The name "zip" is a reference to the way a zipper joins together the teeth of its two sides as it closes.

In [82]:

# example 1
list( zip([1,2,3], ["a","b","c"]) )

Out[82]:

[(1, 'a'), (2, 'b'), (3, 'c')]

In [83]:

# example 2
list1 = [1, 2, 3]
list2 = [4, 5, 6]

list(zip(list1, list2))

Out[83]:

[(1, 4), (2, 5), (3, 6)]

In [84]:

# compute the sum of the matching values in two lists
for value1, value2 in zip(list1, list2):
    print("The sum of", value1, "and", value2, "is", value1+value2)

The sum of 1 and 4 is 5
The sum of 2 and 5 is 7
The sum of 3 and 6 is 9
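Two details are worth knowing. First, zip stops as soon as the shortest input runs out. Second, zipping a list of keys with a list of values is a quick way to build a dictionary. Here is a small sketch; the field names and values are illustrative.

```python
short = list(zip([1, 2, 3], ["a", "b"]))
print(short)    # [(1, 'a'), (2, 'b')]  (the extra 3 is silently dropped)

keys = ["first_name", "last_name", "score"]
values = ["Julie", "Tremblay", 98]
record = dict(zip(keys, values))
print(record)   # {'first_name': 'Julie', 'last_name': 'Tremblay', 'score': 98}
```

In Python 3.10 and later, you can pass strict=True to zip so that it raises an error when the lists have different lengths, instead of silently truncating.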

In [ ]:

Functional programming helpers¶

functools.partial for partial function application, sometimes loosely called currying (e.g. sample-generator callables)
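Here is a minimal sketch of functools.partial, which fixes some arguments of a function and returns a new callable. The power function is a hypothetical example. The sample generator fixes the mean and standard deviation of the standard-library random.gauss, so it can be called with no arguments.

```python
import random
from functools import partial

def power(base, exponent):
    """Hypothetical helper: base raised to exponent."""
    return base ** exponent

square = partial(power, exponent=2)   # exponent is now fixed to 2
print(square(3))   # 9

# A sample generator: each call draws one value from a normal distribution
# with mean 100 and standard deviation 15
draw_score = partial(random.gauss, 100, 15)
samples = [draw_score() for _ in range(5)]
print(samples)
```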

In [ ]:

List-like objects = iterables¶

The term "iterable" is used in Python to refer to any list-like object that can be used in a for-loop.

Examples of iterables:

strings
dictionary keys, dictionary values, or dictionary (key,value) items
sets
range (a lazy sequence of integers)

In [85]:

range(0, 4)

Out[85]:

range(0, 4)

In [86]:

list(range(0, 4))

Out[86]:

[0, 1, 2, 3]
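The other iterables in the list above work in a for-loop the same way. Here is a quick sketch with a string and a set:

```python
for letter in "abc":        # a string yields its characters one at a time
    print(letter)

letters = list("abc")
print(letters)              # ['a', 'b', 'c']

unique = sorted({3, 1, 2, 3})   # the set drops the duplicate 3; sorted returns a list
print(unique)               # [1, 2, 3]
```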

In [ ]:

Iterating over dictionaries¶

In [87]:

profile = {"first_name":"Julie", "last_name":"Tremblay", "score":98}

In [88]:

list(profile.keys())

Out[88]:

['first_name', 'last_name', 'score']

In [89]:

# ALT.
list(profile)

Out[89]:

['first_name', 'last_name', 'score']

In [90]:

list(profile.values())

Out[90]:

['Julie', 'Tremblay', 98]

In [91]:

list(profile.items())

Out[91]:

[('first_name', 'Julie'), ('last_name', 'Tremblay'), ('score', 98)]
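In a for-loop, items() is the most common choice, because it lets you unpack each key and value into two loop variables. The sketch below also collects the printed lines in a list so they can be reused.

```python
profile = {"first_name": "Julie", "last_name": "Tremblay", "score": 98}

lines = []
for key, value in profile.items():   # unpack each (key, value) pair
    line = f"{key} = {value}"
    print(line)
    lines.append(line)
```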

We'll talk more about dictionaries later on.

In [ ]:

Converting iterables to lists¶

Under the hood, Python uses all kinds of list-like data structures called "iterables." We don't need to talk about these or understand how they work: all you need to know is that they behave like lists.

In the code examples above, we converted several fancy list-like data structures into ordinary lists, by wrapping them in a call to the function list, in order to display the results.

Let's look at why we need to use list(iterable) when printing, instead of just iterable.

For example, the set of keys for a dictionary is a dict_keys iterable object:

In [92]:

profile.keys()

Out[92]:

dict_keys(['first_name', 'last_name', 'score'])

In [93]:

type(profile.keys())

Out[93]:

dict_keys

I know, right? What the hell is dict_keys? I certainly don't want to have to explain that...

... so instead, you'll see this in the code:

In [94]:

list(profile.keys())

Out[94]:

['first_name', 'last_name', 'score']

In [95]:

type(list(profile.keys()))

Out[95]:

list

In [ ]:

Generic function arguments¶

functions with *args and **kwargs arguments
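Here is a minimal sketch of the idea in the note above. *args collects any extra positional arguments into a tuple, and **kwargs collects any extra keyword arguments into a dictionary. The function names describe and total are hypothetical.

```python
def describe(*args, **kwargs):
    """Show how extra positional and keyword arguments are received."""
    print("args =", args)       # a tuple of positional arguments
    print("kwargs =", kwargs)   # a dict of keyword arguments
    return args, kwargs

describe(1, 2, 3, name="Julie", score=98)
# args = (1, 2, 3)
# kwargs = {'name': 'Julie', 'score': 98}

def total(*numbers):
    """Add up any number of arguments."""
    return sum(numbers)

print(total(1, 2, 3, 4))   # 10

values = [1, 2, 3]
print(total(*values))      # 6 (the * unpacks the list into separate arguments)
```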

In [ ]:

Python programming¶

Coding, a.k.a. programming, software engineering, or software development, is a broad topic that is out of scope for this short tutorial. If you're interested in learning more about coding, see the article What is code? by Paul Ford. Think mobile apps, web apps, APIs, algorithms, CPUs, GPUs, SysOps, etc. There is a lot to learn about the applications that basic coding skills enable; in that sense, coding is almost like reading and writing skills.

Learning programming usually takes several years, but you don't need to become a professional coder to start using Python for simple tasks, the same way you don't need to become a professional author to use writing for everyday tasks. If you've made it this far in the tutorial, you know enough basic Python to continue your journey.

In particular, you can read the other two tutorials that appear in the No Bullshit Guide to Statistics:

Pandas (see pandas_tutorial.ipynb)
Seaborn (see seaborn_tutorial.ipynb)

In [ ]:

TODO functions are verbs

In Python, we generally prefer to use more descriptive names (whole words) for function names and their inputs, as illustrated in the next example.

In [ ]:

Exercise 13: Replace the ...s in the following code cell with comments that explain each step of the calculation "adding 10% tax to a purchase that costs $57."

In [96]:

cost = 57.00           # ...
taxes = 0.10 * cost    # ...
total = cost + taxes   # ...
total                  # ...

Out[96]:

62.7

In [97]:

#@titlesolution Exercise 13 cost-plus-taxes-total
cost = 57.00           # price before taxes
taxes = 0.10 * cost    # 10% taxes = 0.1 times the cost
total = cost + taxes   # add taxes to cost and store the result in total
total                  # print the total

Out[97]:

62.7

Objects and classes¶

All the Python variables we've been using until now are different kinds of "objects." An object is the most general-purpose "container" for data, and it also provides functions for manipulating the data it contains.

In particular:

attributes: data properties of the object
methods: functions attached to the object

Examples¶

Example 1: string objects¶

In [98]:

msg = "Hello world!"

type(msg)

Out[98]:

str

In [99]:

# Uncomment the next line and press TAB after the dot
# msg.

In [100]:

# Methods:
msg.upper()
msg.lower()
msg.__len__()
msg.isascii()
msg.startswith("He")
msg.endswith("!")

Out[100]:

True
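The cell above only displays the value of its last line. To see what each method returns, print the results one at a time. In everyday code, len(msg) is the usual way to get the length, rather than calling msg.__len__() directly.

```python
msg = "Hello world!"

print(msg.upper())            # HELLO WORLD!
print(msg.lower())            # hello world!
print(len(msg))               # 12 (same as msg.__len__())
print(msg.isascii())          # True (str.isascii needs Python 3.7+)
print(msg.startswith("He"))   # True
print(msg.endswith("!"))      # True
```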

Example 2: file objects¶

In [101]:

filename = "message.txt"
file = open(filename, "w")

type(file)

Out[101]:

_io.TextIOWrapper

In [102]:

# Uncomment the next line and press TAB after the dot
# file.

In [103]:

# Attributes:
file.name
file.mode
file.encoding

Out[103]:

'UTF-8'

In [104]:

# Methods:
file.write("Hello world\n")
file.writelines(["line2\n", "and line3.\n"])   # writelines does not add newlines itself
file.flush()
file.close()
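The open/write/close pattern above works, but the more common idiom is a with block, which closes the file automatically even if an error occurs. The short sketch below writes a file and then reads it back. It writes to a temporary directory to avoid overwriting message.txt; that location is an assumption made for this sketch.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "message.txt")

with open(path, "w") as outfile:   # the file is closed automatically at the end of the block
    outfile.write("Hello world\n")
    outfile.writelines(["line2\n", "and line3.\n"])   # writelines adds no newlines itself

with open(path) as infile:
    lines = infile.readlines()

print(lines)            # ['Hello world\n', 'line2\n', 'and line3.\n']
print(outfile.closed)   # True
```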

Discussion¶

Let's go over some of the things we skipped in the tutorial, because they were not essential for getting started. Now that you know a little bit about Python, it's worth mentioning some of these details, since it's useful context to see how this "Python calculator" business works.

In [ ]:
