This is the first half of Lab 1 for * ECE 314 Probability in Engineering Lab. * We post it in case you would like to learn a bit about Python in advance of taking the course. At this point in your academic careers you should have some knowledge of object oriented computer programming. It would certainly help if you've had experience with Python, but if not, have no fear. Python is a very intuitive programming language. If you've coded in C#, JAVA, or Matlab you should have no trouble learning Python. Before we get too far into the code, we present a few general notions of what the environment will look like.
The computer you are using to read this file probably has installed on it the Jupyter Notebook App or similar application to read IPython version 4 notebooks. We also assume the notebooks are run using Python version 2.7XX rather than version 3.4XX. For more information on installation or using an engineering work station (EWS) Linux machine, see instructions on the course webpage. An IPython Notebook file (with extension .ipynb) is an accumulation of cells, each composed of either code or markdown (i.e., text). Each code cell is individually executable. Each markdown cell can contain (among many things) LaTex and HTML. Throughout each lab you will be shown examples of code, probability theory, and coding applications. *You will need to be able modify this file to include your own answers and edits. Each of the questions is numbered in bold and we ask that you put all your responses/code in cells just after the stated questions. Let's go over some of the basics:
Python is an object oriented programming language where the user has access to functions through imported packages. A package is a collection of modules in directories that have a hierarchy. The three most common packages that we will use in this course are numpy, scipy, and matplotlib, though we will pick up others along the way. Before you can use any of these, you must import them. You only need to import them once in an IPython Notebook file, and then any cell in the notebook can have access to them. Running the code below imports all the pakages you will need for this lab. The simple print statement lets you know when it's completed.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import scipy.stats as st
print ("Modules Imported!")
Modules Imported!
The first line is slightly different than the others and uses what is known as a "magic" function. This particular "magic" function simply makes it so that the plots we generate with the matplotlib package occur inline as opposed to opening in new windows outside of the notebook.
Python is very similar to Matlab and can be used to solve numerical problems. We simply need to run an expression and it will output an answer.
3+4*2
We can also create a variable, set it equal to an expression, and print the value.
x = 3+4**2
print (x)
We used ** to represent an exponent. Similarly, we can take the square root of a number this way. Here is an attempt:
3+4**(1/2)
You should get the answer 5 if running Python 3.x. Under Python 2.7, the division of integers 1/2 would return 0 and the final output would be 4. That could be corrected by changing 1/2 to 1./2.
Python handles lists very similarly to Matlab. We can set variables equal to lists and perform operations on them. We can change the contents of the list and they don't need to be of the same type. This is called being mutable. Note that Python indexes starting with 0, as shown below.
x = [1,2,3,4,5]
y = [6,7,8,9,10]
print (x, y)
x[0] = 'Dog'
print (x[0])
Python also has what is known as a tuple. A tuple is very similar to a list, but is immutable. We cannot change the contents of the tuple. Tuples are often used to input or return objects. Below is the same code as above, but with tuples. It gives us an error message when we try to set x[0].
x = (1,2,3,4,5)
y = (6,7,8,9,10)
print (x, y)
x[0] = 'Dog'
print (x[0])
Below is a list of tuples. It has two tuples and each tuple has five elements.
x = [(1,2,3,4,5),(6,7,8,9,10)]
print (x)
print (x[0][3])
You may like to think of lists and tuples as arrays in some sense, but try to keep them separate. An array is actually an object from the NumPy module. We'll go over them a little bit further in the lab, but there are some notable differences.
If statements in Python are like those of most other languages. You need to use a keyword (if or else), followed by a condition, and finally a colon (:). Keep in mind instead of using brackets for grouping, Python goes by indentation. In the if statement below all parts of the if statement are contained within that indentation.
x = 3
y = 1
if x>y:
print ("I")
if x>3:
print ("Hate")
else:
print ("Love")
print ("Probability")
print ("!")
I Love Probability !
For loops use the keyword "for" followed by a variable and the keyword "in" and a certain range or vector. The same rules for indentation apply here. Recall that indexing starts at 0. The range(n) function simply creates a integer list from 0 to n-1 in whole number increments.
x = [0,0,0,0,0]
for i in range(5):
c = 2*i**2
x[i]=c
print (x)
[0, 2, 8, 18, 32]
Similarly, you can use while loops. In the code below, we make use of the .append method of a list to keep adding to our list without needing to know the size initially. (By the way, a "method" is a function associated with an object. In this case, append is a method associated with a list.)
x = [0]
i = 0
while x[i]<12:
i = i+1
x.append(i)
print (x)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
To specify a function, we need to use the "def" keyword. You need to give the number of inputs and have a return line to end your function. Below is a function that returns the factorial of the input.
def factorial(x):
c = 1
for i in range(x,1,-1): #range(x,1,-1) creates a vector from x to 2 in -1 increments
c = c*i
return c
print (factorial(5))
You can also return multiple outputs. Technically, we are still returning a single object, but it is a tuple. We can unpack the tuple when we call the function. Below is a function that returns the first and last digit of any integer.
def firstNlast(x):
l = x%10 # Uses the modulus operator %
while x>0:
f = x%10
x = int(x/10)
return f,l
x = 70094921348
first, last = firstNlast(x)
print (first, last)
7 8
The returned items get returned as a tuple and you can individually retrieve them by setting them equal to another tuple.
One of the reasons Python is so popular is due to the building capability of the modules. Remember those files we imported initially? We have access to all of the methods they contain. We abbreviated them to shorthand signifiers so we can code more quickly. It would be impossible to give you an overview of all the useful methods because there are so many. But they are fairly intuitive, so if you think something should be a method, it's probably included. Let's start with NumPy and create an array.
x = np.array([1,2,3,4,5])
print (x)
print (x[3])
[1 2 3 4 5] 4
In order to access the "array" method we just needed to type our signifier "np" and then put a decimal and the method. If you want a list of methods to come up as you're coding, after typing the decimal, hit tab on your keyboard. We can similarly declare multidemensional arrays, but notice the use of brackets and indexing. Unlike lists, arrays can only contain a single type. Indexing is also done a little more intuitively (like Matlab) than that of lists. Arrays are also mutable and can be used in multiple dimensions (to create matrices for instance).
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
print (x)
print (x[0,0])
print (x[:,1])
print (x[1,:])
[[1 2 3] [4 5 6] [7 8 9]] 1 [2 5 8] [4 5 6]
To give you a better idea of how to use these modules, here are a number of coding examples with functions that will be particularly useful to you this semester. Below we create a function and then plot it over time. Of course we need to properly title and label the graph.
def f(t): #Creates the function that we are going to plot
return t**3-t**2+t-1
t = np.linspace(-10,10,1000) #Creates an array from -10 to 10 with 1000 points in it
plt.plot(t,f(t)) #Generates a plot of these two vectors.
plt.title('Function vs. Time')
plt.xlabel('Time(s)')
plt.ylabel('Function Value')
Text(0, 0.5, 'Function Value')
The following code is going to create a large vector of random numbers using NumPy's random function. Then it's going to plot them. It's taking the random numbers from an exponential distribution and a normal (Gaussian) distribution. These are both continuous random variables which you will learn about later in the course.
x = np.random.exponential(1,size = 100) #Generates a vector of 100 points from the exponential distribution
y = np.random.normal(size = 100) #Generates a vector of 100 points from the Normal distribution
plt.plot(x,'ro', label='exponential') #Plots x in red circles with the label exponential
plt.plot(y,'go', label = 'normal')
plt.title('Random values.')
plt.xlabel('index')
plt.ylabel('value')
plt.legend()
<matplotlib.legend.Legend at 0x1a146e4588>
This code creates two matrices, multiplies one times the transpose of the other and then finds the eigenvalues:
A = np.array([(3,7,9),(4,5,1),(12,6,3)]) #Creates Matrix A
B = np.array([(1,0,3),(2,4,0),(8,3,1)]) #Creates Matrix B
A_transpose = A.T #Takes the transpose of A
C = A_transpose.dot(B) #Takes the matrix multiplication of A_transpose and B. Note using * performs a different operation on 2-d arrays
# * is the usual matrix multiplication when applied to np.matrix objects
print (np.linalg.eigvals(C)) #Uses the eigvals method under linalg under NumPy to print the eigenvalues
[149.57404656 8.88119895 16.54475449]
These are just the basics to be able to program in Python. I have highlighted some of what I feel are the most important functions or modules to know. For a more complete tutorial, take a look at https://docs.python.org/2.7/tutorial/index.html
The scipy stats package contains a number of functions for using and analyzing distributions. Two of its classes are rv_discrete and rv_continous, for discrete type and for continuous type distributions, respectively. A discrete probability distribution is specified by a set of possible values, $c_1,c_2, \ldots $ and associated probabilities for the values, $p_1, p_2, \ldots $ which sum to one. The probability mass function $p$ is defined by $p(c_i)=p_i$ for all $i,$ and $p(c)=0$ for values $c$ not in the list of possible values. A random variable $X$ has such a discrete distribution if $P\{X = u\} = p(u)$ for all $u.$
There are several important families of discrete probability distributions that frequently arise in applications. A very basic example is the Bernoulli distribution with parameter $p,$ where $0\leq p \leq 1.$ The distribution assigns probability $p$ to value 1, and probability $1-p$ to value 0. If a random variable $X$ has the Bernoulli distribution with parameter $p$, we call $X$ a Bernoulli random variable with parameter $p,$ and we write $X \sim Bernoulli(p).$ For example, if $X \sim Bernoulli(\frac{1}{4}),$ then $P\{X = 1\}=\frac{1}{4}$ and $P\{X = 0\}=1-\frac{1}{4} = \frac{3}{4}$. There is zero probability that $X$ is any value other than $1$ or $0$. The class rv_discrete within the scipy stats package is for working with general discrete type random variables, with many instances of the class corresponding to particular well known probability distribuions. It gives a convenient way to compute the mean, variance, pmf, and other attributes for a given distribution, and for generating random variates, using random number generators, with the given distribution.
For example, one instance of the rv_discrete class is the object for the bernoulli distribution. By specifying (aka freezing) a value for the parameter $p$ we create a more specialized instance of a rv_discrete class. The cumulative distribution function (CDF) of a random variable $X$ is the function $F_X$ defined by $F_X(c)=P\{X\leq c\}$ for any real value of $c.$ The CDF for the $Bernoulli(\frac{1}{4})$ distribution has a jump of size 3/4 at zero and a jump of size 1/4 at one.
p = 1./4 #Sets the probability, uses decimal to create double (not integer)
bernoulli25 = st.bernoulli(p) #Generates object for Bernoulli(0.25) distribution
x = np.linspace(-2,2,1001) #Generates a vector on [-2,2] with 1001 points in it
print ('Mean:', bernoulli25.mean()) #Prints the mean (aka expected value) for the distribution
print ('Var:', bernoulli25.var()) #Prints the variance of X
plt.plot(x,bernoulli25.cdf(x)) #Creates a graph of the cumulative distribution fucntion (CDF) of X
plt.title('CDF of Bernoulli(0.25) distribution')
plt.axis([-2, 2, 0, 1.05]) # Sets the displayed ranges of x-axis and y-axis to be [-2, 2] and [0, 1.05]
Mean: 0.25 Var: 0.1875
[-2, 2, 0, 1.05]
Above, we were able to recreate our Bernoulli distribution through scipy.stats.
Problem 1: Using the scipy.stats package do the following:
########Student Answer##############
Student Answer for last part of Problem 1, part 2. (Questions such as "What happens if . . . ?" and "Can you explain why?" call for answers writen out as text in a markdown cell such as this one, rather than in a code cell.):