This Jupyter notebook contains the code examples from the blog post "Python coding skills for statistics Part 1".
I've intentionally left empty code cells throughout the notebook, which you can use to try some Python commands on your own. For example, you can copy-paste some of the commands from previous cells, modify them, and run them to see what happens. Try to break things; that's the best way to learn!
To run a code cell, press the play button in the menu bar, or use the keyboard shortcut SHIFT+ENTER.
2.1 + 3.4
5.5
num1 = 2.1
num2 = 3.4
num1 + num2
5.5
Let's now compute the average of the numbers `num1` and `num2`.
(num1 + num2) / 2
2.75
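The parentheses in the expression above matter: Python follows the usual math order of operations, so division is performed before addition. This short sketch compares the expression with and without parentheses; only the first one is the average, while the second one divides `num2` by 2 first and then adds `num1`, giving about 3.8.

```python
num1 = 2.1
num2 = 3.4

with_parens = (num1 + num2) / 2     # add first, then divide: the average of the two numbers
without_parens = num1 + num2 / 2    # division happens first: num1 + (num2 / 2)

print(with_parens)
print(without_parens)
```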
grades = [80, 90, 70, 60]
avg = sum(grades) / len(grades)
avg
75.0
total = 0
for grade in grades:
    total = total + grade
avg = total / len(grades)
avg
75.0
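To see how the `for`-loop above builds up `total` one grade at a time, here is a small sketch that prints the running total after each step. The `print` calls are added only for illustration; the calculation itself is the same as in the cell above.

```python
grades = [80, 90, 70, 60]

total = 0
for grade in grades:
    total = total + grade                      # add the current grade to the running total
    print("added", grade, "-> total is now", total)

avg = total / len(grades)                      # divide the final sum by the number of grades
print("average:", avg)
```

The running total goes 80, 170, 240, 300, and dividing 300 by the 4 grades gives the average 75.0.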
Python functions are ...
To define a Python function, we use the `def` keyword followed by the function name, then we specify the function input in parentheses, and end with the symbol `:`, which tells us the "body" of the function is about to start. The function body is a code block indented by four spaces that specifies all the calculations the function performs, and it ends with a `return` statement for the output of the function.
def <fname>(<input>):
    <fcalc 1>
    <fcalc 2>
    <fcalc ...>
    return <output>
We want to define a Python function `mean` that computes the mean from a given sample (a list of values).
The mathematical definition of the mean is $\mathbf{Mean}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{i=n} x_i$, where $\mathbf{x} = [x_1, x_2, x_3, \ldots, x_n]$ is a sample of size $n$ (a list of values).
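As a worked instance of this formula, take the sample of grades used earlier, $\mathbf{x} = [80, 90, 70, 60]$, which has size $n=4$:

$$\mathbf{Mean}(\mathbf{x}) = \frac{1}{4} \sum_{i=1}^{i=4} x_i = \frac{1}{4}\left(80 + 90 + 70 + 60\right) = \frac{300}{4} = 75$$

This is the same value 75.0 that we computed above with `sum(grades) / len(grades)`.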
The code for the function is as follows:
def mean(values):
    total = 0
    for value in values:
        total = total + value
    avg = total / len(values)
    return avg
To call the function `mean` with input `grades`, we use the Python code `mean(grades)`.
grades = [80, 90, 70, 60]
mean(grades)
75.0
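As a quick sanity check, we can compare the hand-written `mean` function against two other ways of computing the same number: the `sum(...) / len(...)` expression from earlier, and `statistics.mean` from Python's standard library. The sketch repeats the definition of `mean` so it can run on its own.

```python
import statistics

def mean(values):
    total = 0
    for value in values:
        total = total + value
    avg = total / len(values)
    return avg

grades = [80, 90, 70, 60]

print(mean(grades))                  # our own function
print(sum(grades) / len(grades))     # built-in sum() and len()
print(statistics.mean(grades))       # standard-library implementation
```

All three should agree on the value 75, which is a good sign that our function is correct.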
In math, a function is a mapping from input values (usually denoted x) to output values (usually denoted y). Consider the mapping that doubles the input and adds five to it, which we can express as the math function $f(x) = 2x+5$. For any input $x$, the output of the function $f$ is denoted $f(x)$ and is equal to $2x+5$. For example, $f(3)$ describes the output of the function when the input is $x=3$, and it is equal to $2(3)+5 = 6 + 5 = 11$. The Python equivalent of the math function $f(x) = 2x+5$ is shown below.
def f(x):
    y = 2*x + 5
    return y
To call the function `f` with input `x`, we simply write `f(x)` in Python, which is the same as the math notation we use for "evaluate the function at the value $x$."
f(3)
11
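Because `f` is an ordinary Python function, we can evaluate it at several inputs in a row, much like filling in a table of values for $f(x) = 2x+5$ in math class. The list of inputs below is arbitrary and chosen only for illustration.

```python
def f(x):
    y = 2*x + 5
    return y

# build a small table of values for f(x) = 2x + 5
for x in [0, 1, 2, 3]:
    print("f(" + str(x) + ") =", f(x))
```

Each step up in $x$ increases the output by 2, and the last row, $f(3) = 11$, matches the result of the cell above.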
prices = [11.8, 10, 11, 8.6, 8.3, 9.4, 8, 6.8, 8.5]
import seaborn as sns
sns.stripplot(x=prices, jitter=0)
sns.histplot(x=prices)
sns.boxplot(x=prices)
import pandas as pd
epriceswide = pd.read_csv("https://nobsstats.com/datasets/epriceswide.csv")
print(epriceswide)
   East  West
0   7.7  11.8
1   5.9  10.0
2   7.0  11.0
3   4.8   8.6
4   6.3   8.3
5   6.3   9.4
6   5.5   8.0
7   5.4   6.8
8   6.5   8.5
type(epriceswide)
pandas.core.frame.DataFrame
We want to extract only the second column, which is called "West":
pricesW = epriceswide["West"]
pricesW
0    11.8
1    10.0
2    11.0
3     8.6
4     8.3
5     9.4
6     8.0
7     6.8
8     8.5
Name: West, dtype: float64
type(pricesW)
pandas.core.series.Series
# ALTERNATIVELY, we can input the data by specifying a list of values:
# pricesW = pd.Series([11.8, 10, 11, 8.6, 8.3, 9.4, 8, 6.8, 8.5])
pricesW.count()
9
pricesW.mean()
9.155555555555557
pricesW.median()
8.6
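To see where the value 8.6 comes from, here is a minimal sketch of the median computed by hand: sort the values and take the middle one, or the average of the two middle ones when the sample size is even. The function name `median` is our own choice for this illustration; it is separate from the pandas `.median()` method.

```python
def median(values):
    """Return the middle value of the sorted sample."""
    ordered = sorted(values)          # sort from smallest to largest
    n = len(ordered)
    mid = n // 2                      # index of the middle position
    if n % 2 == 1:                    # odd n: there is a single middle value
        return ordered[mid]
    # even n: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

prices = [11.8, 10, 11, 8.6, 8.3, 9.4, 8, 6.8, 8.5]
print(sorted(prices))
print(median(prices))
```

With nine prices, the sorted list is `[6.8, 8, 8.3, 8.5, 8.6, 9.4, 10, 11, 11.8]` and the fifth value, 8.6, sits in the middle.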
pricesW.std()
1.5621388471508475
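The value reported by `.std()` is the *sample* standard deviation, which divides by $n-1$ rather than $n$ (pandas uses `ddof=1` by default): $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \mathbf{Mean}(\mathbf{x})\right)^2}$. The sketch below computes this formula with plain Python on the same nine prices, using `math.sqrt` for the square root.

```python
import math

prices = [11.8, 10, 11, 8.6, 8.3, 9.4, 8, 6.8, 8.5]

n = len(prices)
avg = sum(prices) / n                          # sample mean

# sum of squared deviations from the mean
squared_devs = 0
for price in prices:
    squared_devs = squared_devs + (price - avg)**2

std = math.sqrt(squared_devs / (n - 1))        # divide by n-1, then take the square root
print(std)
```

The result agrees with `pricesW.std()` up to floating-point rounding. Dividing by `n` instead of `n - 1` would give the population standard deviation, which is what `pricesW.std(ddof=0)` computes.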
pricesW.describe()
count     9.000000
mean      9.155556
std       1.562139
min       6.800000
25%       8.300000
50%       8.600000
75%      10.000000
max      11.800000
Name: West, dtype: float64
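The `25%`, `50%`, and `75%` rows of `describe()` are the quartiles of the sample. By default, pandas computes quantiles with linear interpolation between sorted values, placing the $p$-th quantile at position $p(n-1)$ in the sorted list (counting positions from 0). Below is a minimal sketch of that rule; the function name `quantile_linear` is hypothetical and not part of any library.

```python
def quantile_linear(values, p):
    """Quantile p (between 0 and 1) using linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)               # fractional position in the sorted list
    lower = int(pos)                           # index at or just below that position
    upper = min(lower + 1, len(ordered) - 1)   # next index, clamped to the last element
    frac = pos - lower                         # how far pos lies between the two neighbours
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

prices = [11.8, 10, 11, 8.6, 8.3, 9.4, 8, 6.8, 8.5]
for p in [0.25, 0.5, 0.75]:
    print(p, quantile_linear(prices, p))
```

For $n=9$ the positions $0.25 \cdot 8 = 2$, $0.5 \cdot 8 = 4$, and $0.75 \cdot 8 = 6$ are whole numbers, so no interpolation is needed and the quartiles are simply the sorted values 8.3, 8.6, and 10.0, matching the `describe()` output.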
Let me remind you that the key skill is learning how to use Python as a calculator. I talked about `for`-loops and function definitions only to make sure you can read Python code, but you don't need to write any such code to learn statistics. As long as you know how to call functions and run code cells in a notebook, you'll still benefit from all the educational power that Python has to offer.
Python = good for your life!
3/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[26], line 1
----> 1 3/0

ZeroDivisionError: division by zero