In this notebook we compute the standard error for cross-validation, based on Ryan Tibshirani's CMU notes:
https://www.stat.cmu.edu/~ryantibs/advmethods/notes/errval.pdf
https://www.stat.cmu.edu/~ryantibs/
Note: Ryan Tibshirani's father, Robert Tibshirani (Stanford), co-authored the .632+ bootstrap with Bradley Efron:
Efron, Bradley, and Robert Tibshirani. 1997. "Improvements on Cross-Validation: The .632+ Bootstrap Method." Journal of the American Statistical Association 92 (438): 548-560. doi:10.2307/2965703.
The standard error is the standard deviation of the sampling distribution of a statistic (here, the mean fold accuracy). For small samples, confidence intervals around that mean are built from the Student t-distribution, which has heavier tails than the Gaussian and depends on the sample size through its degrees of freedom.
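A quick sketch (assuming scipy is available in the environment, as it is for scikit-learn) of how the two-sided 95% critical value from the t-distribution depends on sample size and approaches the Gaussian 1.96:

```python
from scipy import stats

# Two-sided 95% critical values: the t value depends on the degrees of
# freedom (n - 1) and shrinks toward the Gaussian 1.96 as n grows.
for n in (5, 10, 30, 1000):
    print(f"n={n:5d}  t_crit={stats.t.ppf(0.975, df=n - 1):.3f}")

print(f"Gaussian  z_crit={stats.norm.ppf(0.975):.3f}")
```

For k = 10 folds (9 degrees of freedom) the t critical value is about 2.26, noticeably wider than the 1.96 used later in this notebook.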
!pip install git+https://github.com/pattersonconsulting/ml_tools.git
Successfully built ml-valuation
Installing collected packages: ml-valuation
Successfully installed ml-valuation-0.0.1
import pandas as pd
import numpy as np

from matplotlib import pyplot

from sklearn.datasets import load_breast_cancer, make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
#from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, auc, average_precision_score,
                             classification_report, confusion_matrix,
                             plot_precision_recall_curve,
                             precision_recall_curve, roc_curve)
from sklearn.model_selection import (GridSearchCV, KFold, cross_val_score,
                                     train_test_split)
from sklearn.utils import resample

import ml_valuation
from ml_valuation import model_valuation
from ml_valuation import model_visualization
arr_X, arr_y = load_breast_cancer(return_X_y=True)

print("X: " + str(arr_X.shape))
X: (569, 30)
print(arr_X)
[[1.799e+01 1.038e+01 1.228e+02 ... 2.654e-01 4.601e-01 1.189e-01] [2.057e+01 1.777e+01 1.329e+02 ... 1.860e-01 2.750e-01 8.902e-02] [1.969e+01 2.125e+01 1.300e+02 ... 2.430e-01 3.613e-01 8.758e-02] ... [1.660e+01 2.808e+01 1.083e+02 ... 1.418e-01 2.218e-01 7.820e-02] [2.060e+01 2.933e+01 1.401e+02 ... 2.650e-01 4.087e-01 1.240e-01] [7.760e+00 2.454e+01 4.792e+01 ... 0.000e+00 2.871e-01 7.039e-02]]
unique, counts = np.unique(arr_y, return_counts=True)
dict(zip(unique, counts))
{0: 212, 1: 357}
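The counts above show a mild class imbalance (212 malignant vs 357 benign), which is why the loop below uses StratifiedKFold. A small sketch of what stratification buys: every held-out fold reproduces the overall class ratio almost exactly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)

# Overall fraction of the positive (benign) class: 357 / 569 ~ 0.627.
overall = y.mean()
print(f"overall positive fraction: {overall:.3f}")

# Each stratified test fold should match that fraction to within a sample or two.
for i, (_, test_idx) in enumerate(StratifiedKFold(n_splits=10).split(X, y)):
    print(f"fold {i}: n={len(test_idx)}, positive fraction={y[test_idx].mean():.3f}")
```

With a plain KFold on unshuffled data, fold-to-fold class ratios can drift much further from the overall 0.627.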
from sklearn.model_selection import StratifiedKFold
# fit a model with stratified k-fold cross-validation
classifier_kfold_LR = LogisticRegression(solver='newton-cg')

k = 10
cv = StratifiedKFold(n_splits=k)

stats = list()

X = pd.DataFrame(arr_X)
y = pd.DataFrame(arr_y)

for i, (train_index, test_index) in enumerate(cv.split(X, y)):

    # convert the index arrays into row references
    Xtrain, Xtest = X.iloc[train_index], X.iloc[test_index]
    ytrain, ytest = y.iloc[train_index], y.iloc[test_index]

    print("Running CV Fold-" + str(i))

    # fit the model on the training data (Xtrain) and labels (ytrain)
    classifier_kfold_LR.fit(Xtrain, ytrain.values.ravel())

    # predict labels for the held-out fold (Xtest) and score against ytest
    y_pred = classifier_kfold_LR.predict(Xtest)
    accuracy_fold = accuracy_score(ytest, y_pred)

    stats.append(accuracy_fold)

    print("Accuracy: " + str(accuracy_fold))
    print("-----")
mean_score = np.mean(stats)
print("\n\nAverage Accuracy Across All Folds: " + str("{:.4f}".format(mean_score)))
std_dev_score = np.std(stats)
print("\n\nSTD DEV: " + str(std_dev_score))
standard_error_score = (1/np.sqrt(k)) * std_dev_score
print("\n\nStandard Error (Accuracy) Across All Folds: ( " + str("{:.4f}".format(standard_error_score)) + ")")
# https://en.wikipedia.org/wiki/Standard_error
# https://en.wikipedia.org/wiki/1.96
# ~95% of a Gaussian lies within +/- 1.96 standard deviations of the mean
ci_95 = 1.96 * standard_error_score
print("CI Ranges 95%:")
low_end_range = mean_score - ci_95
high_end_range = mean_score + ci_95
print("High: " + str(high_end_range))
print("Low : " + str(low_end_range))
Running CV Fold-0
Accuracy: 0.9824561403508771
-----
Running CV Fold-1
/usr/local/lib/python3.7/dist-packages/scipy/optimize/linesearch.py:314: LineSearchWarning: The line search algorithm did not converge
  warn('The line search algorithm did not converge', LineSearchWarning)
/usr/local/lib/python3.7/dist-packages/sklearn/utils/optimize.py:204: UserWarning: Line Search failed
  warnings.warn('Line Search failed')
Accuracy: 0.9122807017543859
-----
Running CV Fold-2
Accuracy: 0.9298245614035088
-----
Running CV Fold-3
Accuracy: 0.9473684210526315
-----
Running CV Fold-4
Accuracy: 0.9824561403508771
-----
Running CV Fold-5
Accuracy: 0.9824561403508771
-----
Running CV Fold-6
Accuracy: 0.9298245614035088
-----
Running CV Fold-7
Accuracy: 0.9473684210526315
-----
Running CV Fold-8
Accuracy: 0.9649122807017544
-----
Running CV Fold-9
Accuracy: 0.9642857142857143
-----

Average Accuracy Across All Folds: 0.9543

STD DEV: 0.023770661464399972

Standard Error (Accuracy) Across All Folds: ( 0.0075)
CI Ranges 95%:
High: 0.969056516887071
Low : 0.9395900996542824
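For reference, the whole loop above can be condensed with cross_val_score. This is a sketch, not the notebook's code: the max_iter bump is an assumption added to quiet the line-search warnings seen above. It also highlights a subtle point: np.std defaults to ddof=0 (as in the loop above), while scipy.stats.sem uses ddof=1, so the two standard-error estimates differ slightly.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# max_iter=500 is an assumption to reduce convergence warnings
clf = LogisticRegression(solver='newton-cg', max_iter=500)
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=10),
                         scoring='accuracy')

k = len(scores)
mean = scores.mean()
se_biased = scores.std() / np.sqrt(k)   # np.std uses ddof=0, as in the loop above
se_sample = stats.sem(scores)           # scipy uses ddof=1, a slightly larger value

# A t-based 95% interval uses t_{0.975, k-1} ~ 2.262 instead of 1.96 for k=10
t_crit = stats.t.ppf(0.975, df=k - 1)
print(f"mean={mean:.4f}  se={se_biased:.4f}  "
      f"95% CI: [{mean - t_crit * se_sample:.4f}, {mean + t_crit * se_sample:.4f}]")
```

The t-based interval is wider than the 1.96-based one above, which is the honest choice for only 10 fold scores.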
For reference, these results are comparable to the .632 bootstrap results on the same dataset / classifier combination.
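The bootstrap comparison can be sketched with the resample helper imported earlier. This is a minimal out-of-bag bootstrap plus the simple .632 weighting, not the full .632+ estimator from the Efron/Tibshirani paper; the 20 rounds and max_iter value are assumptions for a quick illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = load_breast_cancer(return_X_y=True)
n = len(y)
clf = LogisticRegression(solver='newton-cg', max_iter=500)

oob_scores, train_scores = [], []
for b in range(20):                                # 20 rounds; the paper uses more
    idx = resample(np.arange(n), random_state=b)   # draw n rows with replacement
    oob = np.setdiff1d(np.arange(n), idx)          # ~36.8% of rows are left out
    clf.fit(X[idx], y[idx])
    train_scores.append(clf.score(X[idx], y[idx])) # resubstitution accuracy
    oob_scores.append(clf.score(X[oob], y[oob]))   # out-of-bag accuracy

# Simple .632 estimate: weight out-of-bag and resubstitution accuracy
acc_632 = 0.632 * np.mean(oob_scores) + 0.368 * np.mean(train_scores)
print(f"out-of-bag accuracy: {np.mean(oob_scores):.4f}  .632 estimate: {acc_632:.4f}")
```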
Notable Links