We calculate confidence intervals for the sample proportion and the sample mean with bootstrap and compare them to the analytical expressions.
With this workflow we also provide an interactive plot demonstration with the matplotlib and ipywidgets packages.
Confidence intervals quantify the uncertainty in a sample statistic or model parameter.
For uncertainty in the sample proportion we have:
centered on the sample proportion, $\hat{p}$
the standard error of the proportion for the dispersion (spread)
Student's t distributed for small samples and Gaussian distributed for large sample sizes
The analytical form for small samples ($n \lt 30$) is:
\begin{equation} CI: \hat{p} \pm t_{\frac{\alpha}{2},n-1} \times \frac{\sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}} \end{equation}where the sampling distribution of the proportion is Student's t distributed with $n - 1$ degrees of freedom (given $n$ samples) and $\alpha$ is the significance level, divided by 2 for the two tails.
When the number of samples is large ($n \ge 30$), the analytical form converges to the Gaussian distribution:
\begin{equation} CI: \hat{p} \pm N_{\frac{\alpha}{2}} \times \frac{\sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}} \end{equation}
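As a quick numerical check of these analytical forms, here is a minimal sketch in Python; the sample counts (20 red balls out of $n = 50$ draws) and the $\alpha$ level are assumed values for illustration only.
import math
from scipy.stats import t, norm

n = 50; p_hat = 20 / n                             # assumed number of samples and sample proportion
alpha = 0.05                                       # assumed significance level
se = math.sqrt(p_hat * (1.0 - p_hat) / n)          # standard error of the proportion
if n < 30:                                         # small sample, Student's t with n - 1 degrees of freedom
    score = t.ppf(1.0 - alpha / 2.0, df=n - 1)
else:                                              # large sample, converges to Gaussian
    score = norm.ppf(1.0 - alpha / 2.0)
print('CI: [' + str(round(p_hat - score * se,3)) + ', ' + str(round(p_hat + score * se,3)) + ']')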
Uncertainty in the sample statistics
Would it be useful to know the uncertainty in these statistics due to limited sampling?
Bootstrap is a method to assess the uncertainty in a sample statistic by repeated random sampling with replacement.
Assumptions
Limitations
The Bootstrap Approach (Efron, 1982)
Statistical resampling procedure to calculate uncertainty in a calculated statistic from the data itself.
Extremely powerful - we can calculate the uncertainty in any statistic, e.g. P13 (the 13th percentile), skew, etc.
Steps:
assemble a sample set; it must be representative, and it must be reasonable to assume independence between samples
optional: build a cumulative distribution function (CDF)
For $\ell = 1, \ldots, L$ realizations, do the following:
For $i = 1, \ldots, n$ data, do the following:
Draw a random sample with replacement from the original sample set and add it to the new realization of the sample set.
Calculate a realization of the summary statistic of interest from the $n$ resampled data, e.g. $m^\ell$, $\sigma^2_{\ell}$. Return to the top of the realization loop for another realization.
Compile and summarize the $L$ realizations of the statistic of interest; a minimal sketch of these steps follows this list.
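Here is a minimal sketch of these steps in Python for the sample mean, and for P13 to show that any statistic works; the sample array, seed and number of realizations are assumed values for illustration only.
import numpy as np

sample = np.array([0.08, 0.12, 0.15, 0.19, 0.22, 0.25, 0.31, 0.36]) # a representative sample set
n = len(sample); L = 1000                           # number of data and bootstrap realizations
rng = np.random.default_rng(seed=73073)             # assumed seed for repeatability
means = np.zeros(L); p13s = np.zeros(L)
for l in range(0, L):                               # loop over the L realizations
    resample = rng.choice(sample, size=n, replace=True) # n random draws with replacement
    means[l] = np.mean(resample)                    # realization of the summary statistic of interest
    p13s[l] = np.percentile(resample, 13)           # any statistic works, e.g. P13
print('bootstrap mean = ' + str(np.round(np.mean(means),3)) + ', standard error = ' + str(np.round(np.std(means),3)))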
This is a very powerful method. Let's try it out and compare the result to the analytical form of the confidence interval for the sample mean.
We provide an example and demonstration of this workflow below.
Here are the steps to get set up in Python:
The following code loads the required libraries.
%matplotlib inline
from ipywidgets import interactive # widgets and interactivity
from ipywidgets import widgets
from ipywidgets import Layout
from ipywidgets import Label
from ipywidgets import VBox, HBox
import matplotlib.pyplot as plt # plotting
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks
plt.rc('axes', axisbelow=True) # set axes and grids in the background for all plots
import numpy as np # working with arrays
import pandas as pd # working with DataFrames
import seaborn as sns # for matrix scatter plots
from scipy.stats import triang # parametric distributions
from scipy.stats import binom
from scipy.stats import norm
from scipy.stats import uniform
from scipy.stats import t
from scipy import stats # statistical calculations
import random # random drawing / bootstrap realizations of the data
from matplotlib.gridspec import GridSpec # nonstandard subplots
import math # square root operator
This is a convenience function for plot formatting. We declare it here for use in the plots below.
def add_grid():
plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids
plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)
plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks
This is an interactive demonstration to:
select the number of red and green balls placed in the hat
select the number of bootstrap realizations, $L$
visualize the bootstrap realizations of the red and green ball frequencies
# parameters for the synthetic dataset
bins = np.linspace(0,1000,1000)
# interactive calculation of the sample set (control of source parametric distribution and number of samples)
l = widgets.Text(value=' Simple Bootstrap Demonstration, Michael Pyrcz, Associate Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))
a = widgets.IntSlider(min=0, max = 100, value = 2, step = 1, description = '$n_{red}$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
a.style.handle_color = 'red'
b = widgets.IntSlider(min=0, max = 100, value = 3, step = 1, description = '$n_{green}$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
b.style.handle_color = 'green'
c = widgets.IntSlider(min=1, max = 16, value = 3, step = 1, description = '$L$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
c.style.handle_color = 'gray'
ui = widgets.HBox([a,b,c],) # basic widget formatting
ui2 = widgets.VBox([l,ui],)
def f_make(a, b, c): # function to take parameters, make sample and plot
red_freq = make_data(a, b, c)
labels = ['Red', 'Green']
nrows = int(np.sqrt(c)+0.9); ncols = int(c / nrows + 0.9)
print(nrows,ncols)
plt.clf()
for i in range(0, c):
plt.subplot(ncols,nrows,i + 1)
draw = [red_freq[i],a + b - red_freq[i]]
plt.grid(zorder=0, color='black', axis = 'y', alpha = 0.2); plt.ylim(0,a + b);
plt.ylabel('Frequency'); plt.xlabel('Balls Drawn')
plt.yticks(np.arange(0,a + b + 1,max(1,round((a+b)/10))))
barlist = plt.bar(labels,draw,edgecolor = "black",linewidth = 1,alpha = 0.8); plt.title('Realization #' + str(i+1),zorder = 1)
barlist[0].set_color('r'); barlist[1].set_color('g')
plt.subplots_adjust(left=0.0, bottom=0.0, right=ncols, top=nrows + 0.2 * nrows, wspace=0.2, hspace=0.2)
plt.show()
def make_data(a, b, c): # function to draw c bootstrap realizations of the a + b balls in the hat
    red_freq = np.zeros(c)
    if a + b == 0: # guard against an empty hat, avoids division by zero below
        return red_freq
    for i in range(0, c):
        red_freq[i] = np.random.multinomial(a+b,[a/(a+b),b/(a+b)], size = 1)[0][0] # count of red balls drawn
    return red_freq
# connect the function to make the samples and plot to the widgets
interactive_plot = widgets.interactive_output(f_make, {'a': a, 'b': b, 'c': c})
interactive_plot.clear_output(wait = True) # reduce flickering by delaying plot updating
drawing red and green balls from a hat with replacement to assess uncertainty in the proportion
interactive plot demonstration with the ipywidgets and matplotlib packages
Let's simulate bootstrap, resampling with replacement, from a hat with $n_{red}$ red and $n_{green}$ green balls:
$n_{red}$: number of red balls in the sample (placed in the hat)
$n_{green}$: number of green balls in the sample (placed in the hat)
$L$: number of bootstrap realizations
display(ui2, interactive_plot) # display the interactive plot
Now instead of looking at each bootstrap result, let's make many and summarize with:
box and whisker plot of the red and green ball frequencies
histograms of the red and green ball frequencies.
# parameters for the synthetic dataset
bins = np.linspace(0,1000,1000)
# interactive calculation of the sample set (control of source parametric distribution and number of samples)
l2 = widgets.Text(value=' Confidence Interval for Proportions, Analytical and Bootstrap Demonstration, Michael Pyrcz, Associate Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))
a2 = widgets.IntSlider(min=0, max = 100, value = 20, step = 1, description = '$n_{red}$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
a2.style.handle_color = 'red'
b2 = widgets.IntSlider(min=0, max = 100, value = 30, step = 1, description = '$n_{green}$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
b2.style.handle_color = 'green'
c2 = widgets.IntSlider(min=5, max = 1000, value = 1000, step = 1, description = '$L$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
c2.style.handle_color = 'gray'
alpha = widgets.FloatSlider(min=0.01, max = 0.40, value = 0.05, step = 0.01, description = r'$\alpha$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
alpha.style.handle_color = 'gray'
uib = widgets.HBox([a2,b2,c2,alpha],) # basic widget formatting
uib2 = widgets.VBox([l2,uib],)
def s_make(a, b, c, alpha): # function to take parameters, make sample and plot
dof = a + b - 1
red_freq = make_data(a, b, c)
pred = red_freq/(a+b)
red_prop = (a / (a+b))
red_SE = math.sqrt((red_prop * (1.0 - red_prop)) / (a+b))
green_freq = (a + b) - red_freq
pgreen = green_freq/(a+b)
green_prop = (b / (a+b))
green_SE = math.sqrt((green_prop * (1.0 - green_prop)) / (a+b))
prop_red = red_freq / (a + b)
prop_green = green_freq / (a + b)
labels = ['Red Balls', 'Green Balls']
bins = np.linspace(0,a + b, a + b)
fig = plt.figure(constrained_layout=False)
gs = GridSpec(3, 2, figure=fig)
ax1 = fig.add_subplot(gs[:, 0])
boxplot = ax1.boxplot([pred,pgreen],labels = labels, notch = True, sym = '+',patch_artist=True)
colors = ['red','green']
for patch, color in zip(boxplot['boxes'], colors):
patch.set_facecolor(color)
for patch, color in zip(boxplot['medians'], colors):
patch.set_color('black')
ax1.set_ylim([0,1])
ax1.grid(zorder=0, color='black', axis = 'y', alpha = 0.2)
ax1.set_ylabel('Proportion of Balls'); ax1.set_xlabel('Ball Color');ax1.set_title('Bootstrap Uncertainty - Proportion Distributions')
ax1.grid(True, which='major',axis='y',linewidth = 1.0); ax1.grid(True, which='minor',axis='y',linewidth = 0.2) # add y grids
ax1.tick_params(which='major',length=7); ax1.tick_params(which='minor', length=4)
ax1.xaxis.set_minor_locator(AutoMinorLocator()); ax1.yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks
    cumul_prob = np.linspace(0.01,0.99,100) # avoid 0 and 1, where the ppf is unbounded
    if a + b < 30: # small total sample size, Student's t, consistent with the analytical form above
red_prop_values = t.ppf(cumul_prob, dof)
red_lower = t.ppf(alpha/2, dof); red_upper = t.ppf(1-alpha/2, dof)
else:
red_prop_values = norm.ppf(cumul_prob)
red_lower = norm.ppf(alpha/2); red_upper = norm.ppf(1-alpha/2)
red_prop_values = red_prop_values * red_SE + red_prop
red_lower = red_lower * red_SE + red_prop
red_upper = red_upper * red_SE + red_prop
cumul_prob = np.linspace(0.01,0.99,100)
    if a + b < 30: # small total sample size, Student's t
green_prop_values = t.ppf(cumul_prob, dof)
green_lower = t.ppf(alpha/2, dof); green_upper = t.ppf(1-alpha/2, dof)
else:
green_prop_values = norm.ppf(cumul_prob)
green_lower = norm.ppf(alpha/2); green_upper = norm.ppf(1-alpha/2)
green_prop_values = green_prop_values * green_SE + green_prop
green_lower = green_lower * green_SE + green_prop
green_upper = green_upper * green_SE + green_prop
ax2 = fig.add_subplot(gs[0, 1])
ax2.hist(prop_red,cumulative = True, density = True, alpha=0.7,color="red",edgecolor="black",linewidth=2,bins = np.linspace(0,1,50), label = 'Bootstrap')
ax2.plot([red_lower,red_lower],[0,1],color='black',linewidth=2,linestyle='--',label='Lower/Upper')
ax2.plot([red_upper,red_upper],[0,1],color='black',linewidth=2,linestyle='--')
ax2.plot([red_prop,red_prop],[0,1],color='black',linewidth=3,label='Exp.')
ax2.set_title('Uncertainty in Proportion of Red Balls'); ax2.set_xlabel('Proportion of Red Balls'); ax2.set_ylabel('Cumulative Probability')
ax2.set_xlim([0,1]); ax2.set_ylim([0,1])
ax2.plot(red_prop_values, cumul_prob, color = 'black', linewidth = 2, label = 'Analytical')
ax2.legend()
ax3 = fig.add_subplot(gs[1, 1])
ax3.hist(prop_green,cumulative = True, density = True, alpha=0.7,color="green",edgecolor="black",linewidth=2,bins = np.linspace(0,1,50), label = 'Bootstrap')
ax3.plot([green_lower,green_lower],[0,1],color='black',linewidth=2,linestyle='--',label='Lower/Upper')
ax3.plot([green_upper,green_upper],[0,1],color='black',linewidth=2,linestyle='--')
ax3.plot([green_prop,green_prop],[0,1],color='black',linewidth=3,label='Exp.')
ax3.set_title('Uncertainty in Proportion of Green Balls'); ax3.set_xlabel('Proportion of Green Balls'); ax3.set_ylabel('Cumulative Probability')
ax3.set_xlim([0,1]); ax3.set_ylim([0,1])
ax3.plot(green_prop_values, cumul_prob, color = 'black', linewidth = 2, label = 'Analytical')
ax3.legend()
ax4 = fig.add_subplot(gs[2, 1])
ax4.hist(prop_green,cumulative = False, density = True, alpha=0.7,color="green",edgecolor="black",linewidth=2, bins = np.linspace(0,1,50), label = 'Bootstrap Prop. Green')
ax4.hist(prop_red,cumulative = False, density = True, alpha=0.7,color="red",edgecolor="black",linewidth=2, bins = np.linspace(0,1,50), label = 'Bootstrap Prop. Red')
    ax4.set_title('Confidence Interval in Proportion of Red and Green Balls (Alpha = ' + str(alpha) + ')')
ax4.set_xlabel('Proportion of Red and Green Balls'); ax4.set_ylabel('Frequency')
ax4.set_xlim([0,1])
prop_values = np.linspace(0.0,1.0,100)
    if a + b < 30: # small total sample size, Student's t
green_density = t.pdf(prop_values,loc = green_prop, df = dof, scale = green_SE)
else:
green_density = norm.pdf(prop_values,loc = green_prop, scale = green_SE)
ax4.plot(prop_values, green_density, color = 'black', linewidth = 5,zorder=99)
ax4.plot(prop_values, green_density, color = 'green', linewidth = 3, label = 'Analytical Prop. Green',zorder=100)
    if a + b < 30: # small total sample size, Student's t
red_density = t.pdf(prop_values,loc = red_prop, df = dof, scale = red_SE)
else:
red_density = norm.pdf(prop_values,loc = red_prop, scale = red_SE)
ax4.plot(prop_values, red_density, color = 'black', linewidth = 5,zorder=99)
ax4.plot(prop_values, red_density, color = 'red', linewidth = 3, label = 'Analytical Prop. Red',zorder=100)
ax4.fill_between(prop_values, 0, green_density, where = prop_values <= green_lower, facecolor='green', interpolate=True, alpha = 0.9,zorder=101)
ax4.fill_between(prop_values, 0, green_density, where = prop_values >= green_upper, facecolor='green', interpolate=True, alpha = 0.9,zorder=101)
ax4.fill_between(prop_values, 0, red_density, where = prop_values <= red_lower, facecolor='darkred', interpolate=True, alpha = 0.9,zorder=101)
ax4.fill_between(prop_values, 0, red_density, where = prop_values >= red_upper, facecolor='darkred', interpolate=True, alpha = 0.9,zorder=101)
ax4.legend()
plt.subplots_adjust(left=0.0, bottom=0.0, right=2.5, top=3.0, wspace=0.2, hspace=0.3)
plt.show()
# connect the function to make the samples and plot to the widgets
interactive_plot = widgets.interactive_output(s_make, {'a': a2, 'b': b2, 'c': c2, 'alpha': alpha})
interactive_plot.clear_output(wait = True) # reduce flickering by delaying plot updating
drawing red and green balls from a hat with replacement to assess uncertainty in the proportion
run many bootstrap realizations, summarize the results, and compare to the analytical sampling distribution for the proportion
interactive plot demonstration with the ipywidgets and matplotlib packages
Let's simulate bootstrap, resampling with replacement, from a hat with $n_{red}$ red and $n_{green}$ green balls,
$n_{red}$: number of red balls in the sample (placed in the hat)
$n_{green}$: number of green balls in the sample (placed in the hat)
$L$: number of bootstrap realizations
$\alpha$: alpha level for the confidence interval (significance level)
and then compare the uncertainty in the proportion of balls to the analytical expression.
display(uib2, interactive_plot) # display the interactive plot
Let's calculate the confidence interval (see the sketch after this list), given:
$n$: number of samples
$\overline{x}$: sample mean
$s_x$: sample standard deviation
$\alpha$: alpha level for the confidence interval (significance level)
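As a quick check of the arithmetic behind the interactive plot below, here is a minimal sketch; the inputs are assumed values that match the widget defaults.
from scipy import stats
import numpy as np

n = 10; smean = 2.0; sstdev = 1.0; alpha = 0.05     # assumed inputs, matching the widget defaults
se = sstdev / np.sqrt(n)                            # standard error of the mean
tscore = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)   # two-tailed t-score with n - 1 degrees of freedom
print('CI: [' + str(np.round(smean - tscore * se,2)) + ', ' + str(np.round(smean + tscore * se,2)) + ']')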
# interactive calculation of the sample set (control of source parametric distribution and number of samples)
l2 = widgets.Text(value=' Confidence Interval for Mean, Analytical Explained, Michael Pyrcz, Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))
n = widgets.IntSlider(min=1, max = 20, value = 10, step = 1, description = '$n$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
n.style.handle_color = 'red'
smean = widgets.FloatSlider(min=1.0, max = 4.0, value = 2.0, step = 0.1, description = r'$\overline{x}$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
smean.style.handle_color = 'green'
sstdev = widgets.FloatSlider(min=0.1, max = 2.0, value = 1.0, step = 0.1, description = '$s_x$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
sstdev.style.handle_color = 'gray'
alpha = widgets.FloatSlider(min=0.01, max = 0.40, value = 0.05, step = 0.01, description = r'$\alpha$',orientation='horizontal',layout=Layout(width='400px', height='20px'),continuous_update=False)
alpha.style.handle_color = 'gray'
uib = widgets.HBox([n,smean,sstdev,alpha],) # basic widget formatting
uib3 = widgets.VBox([l2,uib],)
def ci_make(n,smean,sstdev,alpha): # function to take parameters, make sample and plot
dof = n-1
tscore = np.round(-1*stats.t.ppf(alpha/2.0, df=dof),2)
se = np.round(sstdev/(np.sqrt(n)),2)
smean = np.round(smean,2)
sstdev = np.round(sstdev,2)
alpha = np.round(alpha,2)
xval = np.linspace(0,10,1000)
pdf = stats.t.pdf(xval,df = dof, loc = smean, scale = se)
lower_CI = np.round(smean - tscore*se,2)
upper_CI = np.round(smean + tscore*se,2)
plt.plot(xval,pdf,color='black',lw=2,zorder=100)
plt.vlines(smean,0,stats.t.pdf(smean,df = dof, loc = smean, scale = sstdev/np.sqrt(n)),color='red',ls='--',zorder=10)
plt.vlines(lower_CI,0,stats.t.pdf(lower_CI,df = dof, loc = smean, scale = sstdev/np.sqrt(n)),color='darkred',zorder=10)
    plt.vlines(upper_CI,0,stats.t.pdf(upper_CI,df = dof, loc = smean, scale = sstdev/np.sqrt(n)),color='darkred',zorder=10)
plt.fill_between(xval,pdf,where= xval< lower_CI, color='red',alpha=0.7)
plt.fill_between(xval,pdf,where= xval> upper_CI, color='red',alpha=0.7)
add_grid()
vset = +0.4
plt.annotate(r'$CI_{\mu} \rightarrow \overline{x} \pm t_{\left(\frac{\alpha}{2},dof\right)} \times \frac{s_{x}}{\sqrt{n}}$',[6.3,1.2+vset],size=12)
# plt.annotate(r'$CI_{\mu} = $' + str(smean) + r'$ \pm $' + str(tscore) + r' $\times$ ',[6.3,1.1],size=12)
plt.annotate(r'$\overline{x} = $' + str(smean),[6.4,1.35+vset],color='red'); plt.plot([6.8,7.0],[1.33+vset,1.27+vset],color='red')
plt.annotate(r'$\alpha = $' + str(alpha),[7.4,1.355+vset],color='red'); plt.plot([7.8,7.75],[1.33+vset,1.26+vset],color='red')
plt.annotate(r'$s_{x} = $' + str(sstdev),[8.6,1.355+vset],color='red'); plt.plot([8.85,8.75],[1.33+vset,1.27+vset],color='red')
plt.annotate(r'$dof = $' + str(dof),[7.4,1.05+vset],color='red'); plt.plot([7.9,7.8],[1.15+vset,1.10+vset],color='red')
plt.annotate(r'$n = $' + str(n),[8.8,1.05+vset],color='red'); plt.plot([8.8,8.9],[1.15+vset,1.10+vset],color='red')
vset = +0.2
plt.plot([8.7,9.1],[1.115+vset,1.115+vset],lw=0.5,color='black')
plt.annotate(r'$CI_{\mu} \rightarrow $' + str(smean) + r'$ \pm $' + str(tscore) + r' $\times$ ',[6.3,1.1+vset],size=12)
plt.annotate(sstdev,[8.70,1.13+vset]); plt.annotate(np.round(np.sqrt(n),2),[8.68,1.07+vset])
#plt.annotate(r'$CI_{\mu} = $' + str(smean) + '$ \pm t_{\left(\frac{alpha}{2},dof\right)} \times \frac{\sigma_{s}}{\sqrt{n}}$',[6.3,1.1],size=12)
vset = +0.1
plt.annotate(r'$CI_{\mu} \rightarrow $' + str(smean) + r'$ \pm $' + str(tscore) + r' $\times$ ' + str(se),[6.3,0.95+vset],size=12)
plt.annotate('statistic',[7.1,0.65+vset],rotation=270.0,ha='left',color='red'); plt.plot([7.23,7.23],[0.86+vset,0.93+vset],color='red')
plt.annotate('-score',[7.8,0.7+vset],rotation=270.0,ha='left',color='red'); plt.plot([7.93,7.93],[0.86+vset,0.93+vset],color='red')
plt.annotate('standard error',[8.8,0.48+vset],rotation=270.0,ha='left',color='red'); plt.plot([8.93,8.93],[0.86+vset,0.93+vset],color='red')
plt.annotate(r'$CI_{\mu} \rightarrow [$' + str(lower_CI) + ' , ' + str(upper_CI) + r'$]$',[6.3,0.5+vset],size=12)
plt.annotate('lower interval = ' + str(lower_CI),[lower_CI-0.1,stats.t.pdf(lower_CI,df = dof, loc = smean, scale = sstdev/np.sqrt(n))+0.05],rotation=90.0)
plt.annotate('upper interval = ' + str(upper_CI),[upper_CI-0.1,stats.t.pdf(upper_CI,df = dof, loc = smean, scale = sstdev/np.sqrt(n))+0.05],rotation=90.0)
    plt.annotate(r'$\overline{x} = $' + str(smean),[smean-0.1,stats.t.pdf(smean,df = dof, loc = smean, scale = sstdev/np.sqrt(n))+0.05],rotation=90.0)
#plt.annotate(r'$F^{-1}_x($' + str(np.round(P,2)) + '$)$ = ' + str(np.round(x,2)),xy=[x+0.003,0.08],rotation=270.0)
plt.xlim([0,10]); plt.ylim([0,2.0]); plt.ylabel('Density'); plt.xlabel('Value'); plt.title('Analytical Confidence Interval for Population Mean')
plt.subplots_adjust(left=0.0, bottom=0.0, right=1.1, top=1.2, wspace=0.2, hspace=0.3); plt.show()
# connect the function to make the samples and plot to the widgets
interactive_plot3 = widgets.interactive_output(ci_make, {'n':n,'smean':smean,'sstdev':sstdev,'alpha':alpha})
interactive_plot3.clear_output(wait = True) # reduce flickering by delaying plot updating
display(uib3, interactive_plot3) # display the interactive plot
Some observations:
the sampling distribution for proportions becomes discrete with too few samples, as only $n + 1$ outcomes are possible
enough bootstrap realizations are required for stable statistics
the analytical sampling distribution for the uncertainty in the sample proportion matches the results from bootstrap
This was a simple demonstration of interactive plots in Jupyter Notebooks with Python, using the ipywidgets and matplotlib packages.
I have many other demonstrations on data analytics and machine learning, e.g. on the basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations, trend modeling and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy.
I hope this was helpful,
Michael
Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions
With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development.
For more about Michael check out these links:
I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.
Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you!
Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!
I can be reached at mpyrcz@austin.utexas.edu.
I'm always happy to discuss,
Michael
Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin