The predictor generated by Brainome can be run directly from the command line interface (CLI).
This notebook assumes brainome is installed as described in notebook brainome_101_Quick_Start.
The data sets are titanic_train.csv, titanic_validate.csv, and titanic_predict.csv:
!python3 -m pip install brainome --quiet
!brainome -version
import urllib.request as request
response1 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_train.csv', 'titanic_train.csv')
response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_validate.csv', 'titanic_validate.csv')
response3 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')
%ls -lh titanic_train.csv titanic_validate.csv titanic_predict.csv
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
brainome v1.006-18-beta
-rw-r--r-- 1 andys staff 858B Sep 20 16:58 titanic_predict.csv
-rw-r--r-- 1 andys staff 56K Sep 20 16:58 titanic_train.csv
-rw-r--r-- 1 andys staff 5.7K Sep 20 16:58 titanic_validate.csv
!brainome titanic_train.csv -rank -y -o predictor_104.py -modelonly -q
print("The predictor filename is predictor_104.py")
%ls -lh predictor_104.py
# Preview predictor
%pycat predictor_104.py
WARNING: Could not detect a GPU. Neural Network generation will be slow.
The predictor filename is predictor_104.py
-rw-r--r-- 1 andys staff 38K Sep 20 17:00 predictor_104.py
Brainome predictors are really short and sweet. They just validate and classify data.
While the predictor source code is portable, it does require numpy to run and optionally scipy to generate the confusion matrices.
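Before running the predictor on another machine, it can help to confirm that these two packages are importable. The following is a minimal sketch using only the standard library; the helper name `check_predictor_dependencies` is made up for illustration and is not part of Brainome.

```python
import importlib.util


def check_predictor_dependencies():
    """Return a dict mapping each package name to True if it is importable."""
    status = {}
    for name in ("numpy", "scipy"):
        # find_spec returns None when a top-level package cannot be found
        status[name] = importlib.util.find_spec(name) is not None
    return status


status = check_predictor_dependencies()
for name, found in status.items():
    # numpy is needed to run the predictor; scipy only for confusion matrices
    requirement = "required" if name == "numpy" else "optional"
    print(f"{name} ({requirement}): {'found' if found else 'missing'}")
```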
!python3 predictor_104.py --help
usage: predictor_104.py [-h] [-validate] [-headerless] [-json] [-trim] csvfile

Predictor trained on ['titanic_train.csv']

positional arguments:
  csvfile      CSV file containing test set (unlabeled).

optional arguments:
  -h, --help   show this help message and exit
  -validate    Validation mode. csvfile must be labeled. Output is
               classification statistics rather than predictions.
  -headerless  Do not treat the first line of csvfile as a header.
  -json        report measurements as json
  -trim        If true, the prediction will not output ignored columns.
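The flags in the help text can also be assembled from Python when the predictor needs to be launched by a script rather than typed by hand. The sketch below builds the argument list and, optionally, runs it with `subprocess`. The helper names `build_predictor_command` and `run_predictor` are hypothetical, and the sketch assumes the predictor writes its results to standard output, as the shell redirection used in this notebook suggests.

```python
import subprocess
import sys


def build_predictor_command(predictor, csvfile, validate=False,
                            headerless=False, json_output=False, trim=False):
    """Build the argument list for a predictor run (hypothetical helper)."""
    cmd = [sys.executable, predictor]
    # Each flag below is taken from the predictor's --help text
    if validate:
        cmd.append("-validate")
    if headerless:
        cmd.append("-headerless")
    if json_output:
        cmd.append("-json")
    if trim:
        cmd.append("-trim")
    cmd.append(csvfile)
    return cmd


def run_predictor(cmd):
    """Run the command and return whatever it printed to stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Only the command is built here; call run_predictor(cmd) once the
# predictor file and the CSV file exist in the working directory.
cmd = build_predictor_command("predictor_104.py", "titanic_validate.csv",
                              validate=True)
print(" ".join(cmd[1:]))
```

Passing a list of arguments rather than a single shell string avoids quoting problems with file names that contain spaces.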
In validation mode, the predictor takes a labeled CSV data set with the same columns as the training data set and, with the -validate parameter, compares its predictions against the actual outcomes.
!python3 predictor_104.py -validate titanic_validate.csv
Classifier Type:    Random Forest
System Type:        2-way classifier

Accuracy:
    Best-guess accuracy:           61.25%
    Model accuracy:                80.00% (64/80 correct)
    Improvement over best guess:   18.75% (of possible 38.75%)

Model capacity (MEC):      17 bits
Generalization ratio:      3.62 bits/bit

Confusion Matrix:

      Actual | Predicted
    ----------------------------
        died |  45   4
    survived |  12  19

Accuracy by Class:

      target | TP FP TN FN     TPR     TNR     PPV     NPV      F1      TS
    -------- | -- -- -- -- ------- ------- ------- ------- ------- -------
        died | 45 12 19  4  91.84%  61.29%  78.95%  82.61%  84.91%  73.77%
    survived | 19  4 45 12  61.29%  91.84%  82.61%  78.95%  70.37%  54.29%
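The accuracy figures and per-class statistics in this report can be recomputed from the four confusion-matrix counts. The sketch below uses the standard definitions of each metric, which reproduce the reported numbers; it is not Brainome's own code, and the function name `class_metrics` is chosen for illustration.

```python
def class_metrics(tp, fp, tn, fn):
    """Standard per-class statistics from true/false positive/negative counts."""
    return {
        "TPR": tp / (tp + fn),             # true positive rate (recall)
        "TNR": tn / (tn + fp),             # true negative rate
        "PPV": tp / (tp + fp),             # positive predictive value
        "NPV": tn / (tn + fn),             # negative predictive value
        "F1": 2 * tp / (2 * tp + fp + fn), # harmonic mean of PPV and TPR
        "TS": tp / (tp + fp + fn),         # threat score
    }


# Confusion matrix from the validation report (rows: actual, columns: predicted)
died_as_died, died_as_survived = 45, 4
survived_as_died, survived_as_survived = 12, 19

total = died_as_died + died_as_survived + survived_as_died + survived_as_survived
model_accuracy = (died_as_died + survived_as_survived) / total

# Best guess: always predict the most common actual class
actual_died = died_as_died + died_as_survived
actual_survived = survived_as_died + survived_as_survived
best_guess = max(actual_died, actual_survived) / total
improvement = model_accuracy - best_guess

# Treat "died" as the positive class, then "survived"
died = class_metrics(tp=45, fp=12, tn=19, fn=4)
survived = class_metrics(tp=19, fp=4, tn=45, fn=12)

print(f"Best-guess accuracy: {best_guess:.2%}")
print(f"Model accuracy: {model_accuracy:.2%}")
print(f"Improvement over best guess: {improvement:.2%}")
for label, metrics in (("died", died), ("survived", survived)):
    print(label, {name: f"{value:.2%}" for name, value in metrics.items()})
```

Note that the false positives for one class are the false negatives for the other, which is why the TPR of "died" equals the TNR of "survived".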
The predictor can classify a data set that has the same columns as the training/validation data, minus the target column.
It will generate a complete data set with the "Prediction" column appended.
!python3 predictor_104.py titanic_predict.csv > classifications_104.csv
print('Viewing classification predictions.')
%pip install pandas --quiet
import pandas as pd
pd.read_csv('classifications_104.csv')
Viewing classification predictions.
Note: you may need to restart the kernel to use updated packages.
| | PassengerId | Cabin_Class | Name | Sex | Age | Sibling_Spouse | Parent_Children | Ticket_Number | Fare | Cabin_Number | Port_of_Embarkation | Prediction |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 881 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | NaN | S | survived |
1 | 882 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | NaN | S | died |
2 | 883 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S | died |
3 | 884 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S | died |
4 | 885 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S | died |
5 | 886 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q | died |
6 | 887 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S | died |
7 | 888 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S | survived |
8 | 889 | 3 | Johnston, Miss. Catherine Helen Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S | survived |
9 | 890 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C | died |
10 | 891 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q | died |
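A quick way to summarize a classifications file is to tally the "Prediction" column. The sketch below does this with the standard library only, so it runs without pandas. The helper name `count_predictions` is hypothetical, and the inline sample reuses just the PassengerId and Prediction values from the table above rather than the full file.

```python
import csv
import io
from collections import Counter


def count_predictions(handle, column="Prediction"):
    """Tally the values of the prediction column in an open CSV file."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


# PassengerId and Prediction values copied from the table above
sample = io.StringIO(
    "PassengerId,Prediction\n"
    "881,survived\n882,died\n883,died\n884,died\n885,died\n886,died\n"
    "887,died\n888,survived\n889,survived\n890,died\n891,died\n"
)
counts = count_predictions(sample)
print(dict(counts))

# With the real file, open it instead of using the inline sample:
# with open("classifications_104.csv", newline="") as handle:
#     counts = count_predictions(handle)
```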
During feature engineering, it is often useful to view only the features that contributed to the prediction.
With the -trim parameter, the output shows only the features the model deemed important.
!python3 predictor_104.py titanic_predict.csv -trim > trimmed_classifications_104.csv
print('Viewing important features classification predictions.')
# preview uses pandas to read and display csv data
import pandas as pd
pd.read_csv('trimmed_classifications_104.csv')
Viewing important features classification predictions.
| | PassengerId | Cabin_Class | Name | Sex | Age | Sibling_Spouse | Parent_Children | Ticket_Number | Fare | Cabin_Number | Port_of_Embarkation | Prediction |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 881 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | NaN | S | survived |
1 | 882 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | NaN | S | died |
2 | 883 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S | died |
3 | 884 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S | died |
4 | 885 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S | died |
5 | 886 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q | died |
6 | 887 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S | died |
7 | 888 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S | survived |
8 | 889 | 3 | Johnston, Miss. Catherine Helen Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S | survived |
9 | 890 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C | died |
10 | 891 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q | died |
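To see which columns -trim left out, the header of the trimmed file can be compared with the header of the full classifications file. The sketch below is one way to do that with the standard library; the helper names are made up for illustration. The first check uses the column names displayed in the two tables of this notebook, and the second uses a hypothetical trimmed header purely to show what a non-empty result looks like.

```python
import csv
import io


def read_header(handle):
    """Return the first row of an open CSV file as a list of column names."""
    return next(csv.reader(handle))


def dropped_columns(full_header, trimmed_header):
    """List columns present in the full output but absent after trimming."""
    kept = set(trimmed_header)
    return [name for name in full_header if name not in kept]


# Column names as displayed in the tables of this notebook
full = read_header(io.StringIO(
    "PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,"
    "Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation,Prediction\n"
))

# The trimmed table shown above lists the same columns
same = dropped_columns(full, full)

# Hypothetical trimmed header, for illustration only (not real output)
hypothetical_trimmed = [name for name in full if name != "PassengerId"]
example = dropped_columns(full, hypothetical_trimmed)

print(same)
print(example)

# With the real files:
# with open("classifications_104.csv", newline="") as a, \
#      open("trimmed_classifications_104.csv", newline="") as b:
#     print(dropped_columns(read_header(a), read_header(b)))
```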
See notebook "300 Put your model to work" to integrate the predictor into your Python program.