#!/usr/bin/env python
# coding: utf-8

# In[1]:

### METHOD - 1 ##### Train/Test Split
# We train our model on the Train set and evaluate on the Test set.
# A typical split is 70:30 for train and test respectively.
# Sklearn provides a method: train_test_split(X, Y, test_size=0.3, random_state=10) for this purpose.
# random_state is required to reproduce the results.
# Very fast.

# In[2]:

### METHOD - 2 ##### K-Fold Cross Validation
# We split our dataset into K folds (typical values are 3, 5, 10).
# The algorithm is trained on K-1 folds; 1 fold is held back and testing happens on that held-back fold.
# After running cross-validation you end up with K different performance scores that you can summarize
# using a mean and a standard deviation.
# Sklearn provides the KFold(n_splits=5, shuffle=True, random_state=10) method for this purpose.
# Note: random_state only takes effect when shuffle=True.

# In[3]:

### METHOD - 3 ##### Leave One Out Cross Validation
# We make N folds of our dataset, each containing exactly 1 of the N datapoints.
# We train our algorithm on N-1 points and predict the left-out point, repeated N times.
# This yields N different performance scores that you can summarize.
# Sklearn provides the LeaveOneOut() method for this purpose.
# Computationally intensive.

# In[4]:

### METHOD - 4 ##### Repeated Random Train-Test Splits
# Inspired by K-fold.
# It is simply METHOD - 1 run N times, each with a different random split of the data.
# Sklearn provides ShuffleSplit(n_splits=5, test_size=0.3, random_state=10) for this purpose.
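
# The four evaluation strategies above can be sketched end-to-end as below.
# A minimal sketch: the synthetic dataset (make_classification) and the
# LogisticRegression estimator are illustrative choices, not prescribed by
# the notes above; only the splitter calls come from the text.

# In[5]:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold, LeaveOneOut, ShuffleSplit, cross_val_score, train_test_split)

# Illustrative data and model (assumptions, not from the notes above)
X, y = make_classification(n_samples=100, n_features=5, random_state=10)
model = LogisticRegression(max_iter=1000)

# METHOD - 1: single 70:30 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10)
acc = model.fit(X_train, y_train).score(X_test, y_test)
print(f"Train/Test split accuracy: {acc:.3f}")

# METHOD - 2: 5-fold cross validation
# (shuffle=True is required when passing random_state to KFold)
kfold = KFold(n_splits=5, shuffle=True, random_state=10)
kf_scores = cross_val_score(model, X, y, cv=kfold)
print(f"K-Fold: {kf_scores.mean():.3f} +/- {kf_scores.std():.3f}")

# METHOD - 3: leave-one-out (fits N models for N samples; expensive)
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(f"LOOCV: {loo_scores.mean():.3f} over {len(loo_scores)} folds")

# METHOD - 4: repeated random 70:30 splits
shuffle_split = ShuffleSplit(n_splits=5, test_size=0.3, random_state=10)
ss_scores = cross_val_score(model, X, y, cv=shuffle_split)
print(f"ShuffleSplit: {ss_scores.mean():.3f} +/- {ss_scores.std():.3f}")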