Most business users first encounter Monte Carlo methods in school through proprietary, expensive Excel add-ins, and are then set loose into a work environment where those add-ins are no longer available. That is usually the end of Monte Carlo for them, even though it's trivial to do in Python.

Repeated random sampling from a probability distribution in Python

References:

- Wikipedia, Monte Carlo Method
- Wikipedia, Triangular distribution
- numpy.random.triangular: `triangular(left, mode, right, size=None)`
- numpy.random.choice: `choice(a, size=None, replace=True, p=None)`

In [1]:

```
import matplotlib.pyplot as plt
import numpy as np
# set a seed for reproducibility
np.random.seed(42)
```

We'll sample from a continuous distribution using `np.random.triangular` and from a discrete distribution using `np.random.choice`.

In a business setting, when dealing with uncertainty, domain experts will phrase their assessment of probabilities using terms like 'low-base-high case' or 'bull-base-bear case' (if you're in finance). In other words, they think about the extreme upper and lower possibilities and a 'most likely' case. This can be translated into a triangular distribution and can be generated quickly in `numpy`:

In [2]:

```
np.random.triangular(left=1, mode=3, right=5, size=10)
```

Out[2]:

array([2.73098843, 4.37207839, 3.53574303, 3.20814841, 2.11720594, 2.11711958, 1.68166627, 3.96530641, 3.2136406 , 3.4717921 ])
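As a quick sanity check (not in the original notebook): the mean of a triangular distribution is `(left + mode + right) / 3`, so a large sample with these parameters should average close to 3.

```python
import numpy as np

np.random.seed(42)
samples = np.random.triangular(left=1, mode=3, right=5, size=10**6)

# The mean of a triangular distribution is (left + mode + right) / 3
theoretical_mean = (1 + 3 + 5) / 3  # 3.0
print(samples.mean())  # close to 3.0 for a large sample
```

All samples also fall within `[left, right]` by construction.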

In [3]:

```
%%timeit
samples = np.random.triangular(left=1, mode=3, right=5, size=10**7)
plt.hist(samples, bins=500)
```

435 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Generating 10M samples and plotting the histogram takes about 435 ms, i.e. under half a second.

In [4]:

```
choices = [1, 3, 5]
probabilities = [.1, .8, .1] # must sum to 1
np.random.choice(a=choices, size=10, replace=True, p=probabilities)
```

Out[4]:

array([3, 3, 3, 3, 1, 3, 5, 3, 3, 3])

In [6]:

```
%%timeit
discrete_samples = np.random.choice(a=choices, size=10**6, replace=True, p=probabilities)
plt.hist(discrete_samples, bins=3)
```

46.2 ms ± 3.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

`np.random.choice` takes significantly longer (about 13 seconds at 10M samples), so we run it with 1M samples here just to demo it.
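If speed matters, the newer `Generator` API (`np.random.default_rng`, available since NumPy 1.17) is typically noticeably faster for discrete sampling than the legacy `np.random.choice`. A minimal sketch with the same choices and probabilities:

```python
import numpy as np

# The Generator API is NumPy's recommended (and usually faster) interface
rng = np.random.default_rng(42)

choices = [1, 3, 5]
probabilities = [0.1, 0.8, 0.1]  # must sum to 1

discrete_samples = rng.choice(choices, size=10**6, replace=True, p=probabilities)
print(discrete_samples[:10])
```

The call signature mirrors `np.random.choice`, so swapping it in is usually a one-line change.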

Here's a simple example from corporate finance on when you'll use a Monte Carlo method.

You are asked to estimate the potential costs your company will incur if it decides to invest in a new project. The engineers estimate how long it will take and how many resources they will need, and your job is to estimate the cost of the project.

There are three variables in determining the cost of the project:

- How many people it will take
- How long it will take
- The labor rate

In [31]:

```
people = np.random.choice(a=[1,2,3], size=10**7, replace=True, p=[.2,.6,.2])
time = np.random.choice(a=[1,3,5], size=10**7, replace=True, p=[.2, .7, .1])
labor_rate = 50
```

In [32]:

```
cost = people * time * labor_rate
```

In [33]:

```
fig = plt.hist(cost, bins=20)
```

In [34]:

```
np.mean(cost)
```

Out[34]:

279.898715

In [35]:

```
np.median(cost)
```

Out[35]:

300.0

In [36]:

```
np.std(cost)
```

Out[36]:

143.475855429925

And you can get confidence intervals:

In [37]:

```
# 80% confint
(np.percentile(cost, 10), np.percentile(cost, 90))
```

Out[37]:

(100.0, 450.0)
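Because the simulation gives you the full sample array, you can also read off tail probabilities directly, such as the chance that cost exceeds some budget. Here is a sketch using the same cost assumptions as above; the 400 budget is an arbitrary illustrative figure, not from the original.

```python
import numpy as np

np.random.seed(42)
people = np.random.choice(a=[1, 2, 3], size=10**6, replace=True, p=[.2, .6, .2])
time = np.random.choice(a=[1, 3, 5], size=10**6, replace=True, p=[.2, .7, .1])
cost = people * time * 50  # labor_rate = 50

# Fraction of simulated scenarios in which cost exceeds a hypothetical 400 budget
p_over_budget = np.mean(cost > 400)
print(p_over_budget)
```

This is just the empirical frequency of the event across scenarios, which is the core trick of Monte Carlo: any probability or expectation becomes a simple array operation.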

Now let's say that we want to add another variable - revenue. We ask our revenue analysts and they think the project could bring in either \$10 or \$15 per unit (it's a 50-50 chance of either) and volumes will be around 1800 units, with a best case of 2400 and worst case 1000.

We'll model the price assumption as a discrete distribution and volumes as a triangular distribution:

In [86]:

```
price = [10, 15]
price_probabilities = [0.5, 0.5]
```

In [87]:

```
prices = np.random.choice(price, size=10**7, replace=True, p=price_probabilities)
volumes = np.random.triangular(left=1000, mode=1800, right=2400, size=10**7)
# Calculate the revenues
revenue = prices * volumes
```

In [88]:

```
fig = plt.hist(revenue, bins=100)
```

We can run our statistics again:

In [89]:

```
np.mean(revenue)
```

Out[89]:

21666.70489902372

In [90]:

```
np.median(revenue)
```

Out[90]:

20705.70824132701

In [91]:

```
np.std(revenue)
```

Out[91]:

5669.020124454094

In [92]:

```
(np.percentile(revenue, 10), np.percentile(revenue, 90))
```

Out[92]:

(14733.504705181971, 29852.175758982165)

In [93]:

```
margin = revenue - cost
```

In [94]:

```
fig = plt.hist(margin, bins=300)
```

In [95]:

```
np.mean(margin)
```

Out[95]:

21386.806184023717

In [96]:

```
np.median(margin)
```

Out[96]:

20427.216630348783

In [97]:

```
np.std(margin)
```

Out[97]:

5670.856944173494

In [98]:

```
# 80% confint
(np.percentile(margin, 10), np.percentile(margin, 90))
```

Out[98]:

(14451.087496083595, 29574.72188314046)
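The same idea applies to the margin. Below is a sketch that reruns the whole simulation end to end and reads off the probability that margin falls below a target; the 15,000 threshold is an arbitrary example, and the sample count is reduced from the notebook's 10**7 to keep it quick.

```python
import numpy as np

np.random.seed(42)
n = 10**6  # fewer samples than the notebook's 10**7, to keep the sketch fast

# cost model (people, time, labor rate as above)
people = np.random.choice([1, 2, 3], size=n, replace=True, p=[.2, .6, .2])
time = np.random.choice([1, 3, 5], size=n, replace=True, p=[.2, .7, .1])
cost = people * time * 50

# revenue model (price, volume as above)
prices = np.random.choice([10, 15], size=n, replace=True, p=[0.5, 0.5])
volumes = np.random.triangular(left=1000, mode=1800, right=2400, size=n)
revenue = prices * volumes

margin = revenue - cost

# Probability the margin comes in below a hypothetical 15,000 target
print(np.mean(margin < 15_000))
```

Note that the sample mean of `margin` should land close to the notebook's value of roughly 21,387 even at the smaller sample size.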
