In this notebook we will learn:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
3 > 1
type(3 > 1)
True
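true  # NameError: Python's Boolean values are capitalized (True, False)
3 = 3  # SyntaxError: a single = is assignment; use == to test equality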
3 == 3.0
10 != 2
x = 14
y = 3
x > 15
12 < x
x < 20
12 < x < 20
10 < x-y < 13
x > 13 and y < 3.14159
pets = make_array('cat', 'cat', 'dog', 'cat', 'dog', 'rabbit')
pets == 'cat'
1 + 1 + 0 + 1 + 0 + 0
sum(make_array(True, True, False, True, False, False))
sum(pets == 'dog')
np.count_nonzero(pets == 'dog')
x = np.arange(20, 31)
x > 28
Python has a for loop. The structure is like this:
for variable in list or array:
    body of loop
rainbow = make_array('red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet')
for color in rainbow:
    print(color)
for thing in rainbow:
    print(thing)
num_array = np.arange(1, 3.25, 0.25)
## This for-loop is meaningless, don't try to figure out what's being computed
## we just want to demonstrate that a for-loop can involve multiple steps
for i in num_array:
    i2 = i**2
    i3 = i2 - 1
    i4 = i3*(1.09)
    print(i4)
for k in np.arange(11):
    print(k**3)
num_list = [1, 2, 3, 4, 5, 6, 7, 8]
for k in num_list:
    print((k - 1)**0.5)
We'll see that appending to an array can be a good way to keep track of the results of multiple simulations.
first = np.arange(4)
second = np.arange(10, 17)
second
np.append(first, 6)
first
np.append(first, second)
first
second
squares = make_array() # an empty array
num_array = np.arange(11)
for i in num_array:
    squares = np.append(squares, i**2)
squares
Let's play a game: we each roll a die.
If my number is bigger: you pay me a dollar.
If they're the same: we do nothing.
If your number is bigger: I pay you a dollar.
Steps:
The np.random.choice function can help here.
die_faces = np.arange(1, 7)
die_faces
np.random.choice(die_faces)
np.random.choice(die_faces, 10)
# Work in progress
def one_round(my_roll, your_roll):
    if my_roll > your_roll:
        return 1
one_round(4, 3)
one_round(2, 6)
# Final correct version
def one_round(my_roll, your_roll):
    if my_roll > your_roll:
        return 1
    elif your_roll > my_roll:
        return -1
    elif your_roll == my_roll:
        return 0
one_round(1, 1)
one_round(6, 5)
one_round(7, -1)
def simulate_one_round():
    my_roll = np.random.choice(die_faces)
    your_roll = np.random.choice(die_faces)
    return one_round(my_roll, your_roll)
simulate_one_round()
results = make_array()
results
results = np.append(results, simulate_one_round())
results
game_outcomes = make_array()
for i in np.arange(5):
    game_outcomes = np.append(game_outcomes, simulate_one_round())
game_outcomes
game_outcomes = make_array()
for i in np.arange(10000):
    game_outcomes = np.append(game_outcomes, simulate_one_round())
game_outcomes
len(game_outcomes)
results = Table().with_column('My winnings', game_outcomes)
results
results.group('My winnings').barh('My winnings')
game_outcomes = make_array()
for i in np.arange(10000):
    game_outcomes = np.append(game_outcomes, simulate_one_round())
results = Table().with_column('My winnings', game_outcomes)
results.group('My winnings').barh('My winnings')
sum(results.column(0))
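As a check on the simulated totals, the 36 equally likely pairs of rolls can be counted exactly. This is a minimal sketch using only the standard library; it redefines one_round so that it runs on its own, and the names faces, exact_counts, and expected_winnings are illustrative, not part of the notebook.

```python
from collections import Counter
from itertools import product

def one_round(my_roll, your_roll):
    """Return my winnings for one round: 1, -1, or 0."""
    if my_roll > your_roll:
        return 1
    elif your_roll > my_roll:
        return -1
    else:
        return 0

faces = range(1, 7)  # the six faces of a fair die

# Tally the outcome of every (my_roll, your_roll) pair.
exact_counts = Counter(one_round(m, y) for m, y in product(faces, repeat=2))

# Each pair has probability 1/36, so the expected winnings per round are:
expected_winnings = sum(value * count for value, count in exact_counts.items()) / 36

print(exact_counts)
print(expected_winnings)
```

Wins and losses are equally likely by symmetry, so over many rounds the simulated total should hover around zero, with the three bars in the chart close to the 15 : 6 : 15 ratio.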
# Bonus question: This simulation is relatively simple.
# Can you find a way to run it without using a for loop?
my_rolls = np.random.choice(np.arange(1,7), size = 10000)
your_rolls = np.random.choice(np.arange(1,7), size = 10000)
results = Table().with_columns("Mine", my_rolls, "Yours", your_rolls)
results = results.with_column("Results", results.apply(one_round, "Mine", "Yours"))
results.group("Results")
results.group("Results").barh("Results")
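The bonus answer above still calls one_round once per row through apply. One possible alternative, sketched here under the assumption that only NumPy is available, replaces that call with array arithmetic: the sign of the difference between the two rolls is exactly 1, 0, or -1. The names n_rounds, winnings, values, and counts are illustrative.

```python
import numpy as np

n_rounds = 10000
die_faces = np.arange(1, 7)

# Roll both dice for every round at once.
my_rolls = np.random.choice(die_faces, size=n_rounds)
your_rolls = np.random.choice(die_faces, size=n_rounds)

# np.sign gives 1 when my roll is bigger, -1 when yours is, and 0 for a tie,
# which matches what one_round returns.
winnings = np.sign(my_rolls - your_rolls)

# Count how often each outcome occurred, and total my winnings.
values, counts = np.unique(winnings, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
print(winnings.sum())
```

This trades the readability of a named function for speed, so it is best suited to rules as simple as this game's.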
If 100 people individually flipped their own fair coin at the same time (or one very bored person flipped a fair coin 100 times), would it be reasonable if 40 or fewer of them came up heads?
coin = make_array('heads', 'tails')
sum(np.random.choice(coin, 100) == 'heads')
# Simulate one outcome
def num_heads():
    return sum(np.random.choice(coin, 100) == 'heads')
# Decide how many times you want to repeat the experiment
repetitions = 10000
# Simulate that many outcomes
outcomes = make_array()
for i in np.arange(repetitions):
    outcomes = np.append(outcomes, num_heads())
heads = Table().with_column('Heads', outcomes)
heads.hist(bins = np.arange(29.5, 70.6), right_end = 40)
The yellow section: how many outcomes is that?
sum(heads.column(0)<=40)
sum(outcomes <=40)
Then what proportion is that?
sum(outcomes <= 40) / repetitions  # for example, 290/10000 in one run
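The simulated proportion can be compared with the exact chance of 40 or fewer heads in 100 fair flips. The sketch below counts the favorable sequences directly with the standard library's math.comb; the names n_flips, threshold, favorable, and exact_proportion are illustrative and not part of the notebook.

```python
from math import comb

n_flips = 100    # flips per experiment
threshold = 40   # we ask about 40 or fewer heads

# Number of length-100 head/tail sequences with at most 40 heads.
favorable = sum(comb(n_flips, k) for k in range(threshold + 1))

# All 2**100 sequences are equally likely for a fair coin.
exact_proportion = favorable / 2**n_flips

print(exact_proportion)
```

The exact value is a little under 3%, so a simulated proportion near 290/10000 is in line with it; small differences from run to run are expected.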
What interval captures the middle 95% of these outcomes?
np.percentile(outcomes, make_array(2.5, 97.5))
On the game show Let's Make a Deal, one of the more popular games was a simple guessing game involving three doors. One door would hide a desirable prize (an expensive vacation, a new car, or something of similar value). The other two doors would hide a fake prize, often a goat. The way the game was played was simple:
The mathematical/probability/statistical question is this: should the player switch doors?
To put it another way, which player strategy has the higher likelihood of winning: picking a door and sticking with it, or picking a door and automatically switching once another door has been opened?
Strategy 1: The pick & stick (pick a door and don't switch when given the chance)
Strategy 2: The pick & switch (pick a door, but automatically switch to the other when it's offered)
Let's use simulations to decide which strategy is better.
doors = make_array('car', 'first goat', 'second goat')
goats = make_array('first goat', 'second goat')
def other_goat(a_goat):
    if a_goat == 'first goat':
        return 'second goat'
    elif a_goat == 'second goat':
        return 'first goat'
def monty_hall():
    contestant_choice = np.random.choice(doors)
    if contestant_choice == 'first goat':
        monty_choice = 'second goat'
        remaining_door = 'car'
    elif contestant_choice == 'second goat':
        monty_choice = 'first goat'
        remaining_door = 'car'
    elif contestant_choice == 'car':
        monty_choice = np.random.choice(goats)
        remaining_door = other_goat(monty_choice)
    return [contestant_choice, monty_choice, remaining_door]
games = Table(['Strategy 1 Prize', 'Revealed', 'Strategy 2 Prize'])
reps = 10000
for i in range(reps):
    games.append(monty_hall())
games
sum(games.column('Strategy 1 Prize')=='car')/reps
sum(games.column('Strategy 2 Prize')=='car')/reps
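The two simulated proportions can be checked against an exact calculation. In the simulation, the remaining door hides the car exactly when the contestant's first pick is a goat, so it is enough to enumerate the three equally likely first picks. This is a minimal sketch with the standard library's fractions module; the names stick_wins and switch_wins are illustrative.

```python
from fractions import Fraction

doors = ['car', 'first goat', 'second goat']
p_choice = Fraction(1, len(doors))  # each first pick is equally likely

stick_wins = Fraction(0)   # Strategy 1: keep the first pick
switch_wins = Fraction(0)  # Strategy 2: take the remaining door

for first_pick in doors:
    if first_pick == 'car':
        # Sticking keeps the car; switching lands on a goat.
        stick_wins += p_choice
    else:
        # Monty must reveal the other goat, so the remaining door hides the car.
        switch_wins += p_choice

print(stick_wins, switch_wins)
```

Under these assumptions, sticking wins one time in three and switching wins two times in three, which the simulated proportions should approximate.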