301 Using Predictors in Python

Brainome predictors are Python source files that can be imported directly into your Python application. This notebook covers:

Importing the predictor
Validating a data set
Single instance classification (realtime)
Batch classification (DataFrame)
Large data set classification (streaming)

Prerequisites

This notebook assumes brainome is installed as described in the notebook brainome_101_Quick_Start.

The data sets are:

titanic_train.csv for training data
titanic_validate.csv for validation
titanic_predict.csv for predictions

Predictors require numpy. scipy is optional and is only needed to generate a confusion matrix.

In [ ]:

!python3 -m pip install brainome --quiet
!brainome --version
# download data sets
import urllib.request as request
response1 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_validate.csv', 'titanic_validate.csv')
response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')
%ls -lh titanic_validate.csv titanic_predict.csv
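
Re-running the cell above downloads the files again each time. A minimal sketch of a download-if-missing helper, using only the standard library, is shown here. The helper name ensure_downloaded is hypothetical and not part of brainome; the demonstration uses a placeholder file so that it runs without network access.

```python
import os
import tempfile
import urllib.request


def ensure_downloaded(url, filename):
    """Download url to filename unless the file already exists; return the path."""
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    return filename


# Demonstration without network access: the file already exists,
# so ensure_downloaded returns immediately and nothing is fetched.
with tempfile.TemporaryDirectory() as tmp_dir:
    cached_path = os.path.join(tmp_dir, "titanic_validate.csv")
    with open(cached_path, "w") as handle:
        handle.write("placeholder\n")
    result_path = ensure_downloaded(
        "https://download.brainome.ai/data/public/titanic_validate.csv",
        cached_path,
    )
    was_reused = os.path.exists(result_path)

print(was_reused)
```

In a notebook, the same helper could be called once per data set with the real file names in the working directory.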

Generate a predictor

The command below trains on titanic_train.csv and writes the predictor to predictor_301.py.

In [ ]:

!brainome https://download.brainome.ai/data/public/titanic_train.csv -rank -y -o predictor_301.py -modelonly -q
print('\nCreated predictor_301.py')
!ls -lh predictor_301.py

1. Importing the predictor

Start by importing the predictor_301.py source file into your program. The predictor itself requires numpy. Calling help(predictor) displays its pydoc.

In [ ]:

import numpy as np      # predictors require numpy
import predictor_301 as predictor
help(predictor)
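
A plain import works when the predictor file sits on the import path. When the file lives elsewhere, or its name is only known at run time, the standard library's importlib can load it by path. The sketch below illustrates that approach under stated assumptions: it writes a tiny stand-in module (stub_predictor.py, hypothetical, with a toy predict function) so that it runs without brainome. In practice the path would point at predictor_301.py.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, path):
    """Load a Python source file as a module without modifying sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Hypothetical stand-in for a generated predictor file (not brainome output).
STUB_SOURCE = '''
def predict(rows):
    """Toy rule: label each instance by whether its first field is non-empty."""
    return ["yes" if row[0] else "no" for row in rows]
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    stub_path = os.path.join(tmp_dir, "stub_predictor.py")
    with open(stub_path, "w") as handle:
        handle.write(STUB_SOURCE)
    stub = load_module_from_path("stub_predictor", stub_path)
    loaded_predictions = stub.predict([["a"], [""]])

print(loaded_predictions)
```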

2. Validating a data set

Given a test data set that includes the expected outcomes, the predictor compares its predictions against them.

For this exercise, we read the data set into a pandas data frame. Your method of loading data may differ.

In [ ]:

%pip install pandas --quiet
import pandas as pd

validate_data = pd.read_csv('titanic_validate.csv', na_values=[], na_filter=False)
validate_values = validate_data.values
clean_values = predictor.__preprocess_and_clean_in_memory(validate_values)
count, correct_count, num_TP, num_TN, num_FP, num_FN, num_class_1, num_class_0, preds = predictor.validate(clean_values)
print(' Test Predictions '.center(80, '-'))
print(preds)
true_labels = clean_values[:, -1]
mtrx, stats = predictor.__confusion_matrix(np.array(true_labels).reshape(-1), np.array(preds).reshape(-1), True)
print(' Confusion Matrix '.center(80, '-'))
print(mtrx)
print(' Statistics '.center(80, '-'))
print(stats)
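
The counts returned by validate (true and false positives and negatives) are enough to derive the usual summary metrics by hand. The following is a minimal sketch of that arithmetic, not the predictor's own implementation. The function name summarize_binary_counts is hypothetical, the counts are made up for illustration, and the matrix layout (rows are true classes, columns are predicted classes) is a convention chosen here that may differ from the predictor's output.

```python
def summarize_binary_counts(num_tp, num_tn, num_fp, num_fn):
    """Derive accuracy, precision, recall and a 2x2 matrix from raw counts."""
    total = num_tp + num_tn + num_fp + num_fn

    def ratio(numerator, denominator):
        # Avoid ZeroDivisionError when a class never occurs.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(num_tp + num_tn, total),
        "precision": ratio(num_tp, num_tp + num_fp),
        "recall": ratio(num_tp, num_tp + num_fn),
        # Rows: true class 0, true class 1. Columns: predicted 0, predicted 1.
        "confusion_matrix": [[num_tn, num_fp], [num_fn, num_tp]],
    }


# Illustrative counts only; these are not results from the Titanic data.
summary = summarize_binary_counts(num_tp=30, num_tn=50, num_fp=10, num_fn=10)
print(summary)
```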

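3. Single Instance Classification

The following cell classifies a single passenger. Because predict takes a list of instances, the passenger is wrapped in a list and the first element of the result is used.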

In [ ]:

passenger = [881,2,"Shelley, Mrs. William (Imanita Parrish Hall)","female",25,0,1,230433,26,"","S"]
prediction = predictor.predict([passenger])[0]
print(passenger[2], prediction)
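
An application often holds a record as a dictionary rather than as a positional list, and a field in the wrong position would silently produce a wrong prediction. The sketch below shows one way to build the ordered row before passing it to predict. The column names are an assumption (the usual Titanic field names, which should be checked against the header of titanic_predict.csv), and record_to_row is a hypothetical helper, not part of the predictor.

```python
# Assumed column order; verify against the CSV header before relying on it.
FEATURE_COLUMNS = [
    "PassengerId", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]


def record_to_row(record, columns=FEATURE_COLUMNS):
    """Return the record's values in column order, failing loudly on gaps."""
    missing = [name for name in columns if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return [record[name] for name in columns]


passenger_record = {
    "Name": "Shelley, Mrs. William (Imanita Parrish Hall)",
    "PassengerId": 881, "Pclass": 2, "Sex": "female", "Age": 25,
    "SibSp": 0, "Parch": 1, "Ticket": 230433, "Fare": 26,
    "Cabin": "", "Embarked": "S",
}

passenger_row = record_to_row(passenger_record)
print(passenger_row)
```

The resulting list has the same shape as the passenger list used above, so it could be passed as predictor.predict([passenger_row])[0].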

4. Batch Classification

Given a batch of records, the predictor returns a classification prediction for each record.

In [ ]:

predict_data = pd.read_csv('titanic_predict.csv', na_values=[], na_filter=False)
predict_values = predict_data.values
predictions_output = predictor.predict(predict_values)
print(' Batch Predictions '.center(80, '-'))
print(predictions_output)

5. Large Data Set Classification

Some data sets are too large to load fully into memory and must instead be streamed one instance at a time.

In [ ]:

import csv
with open("./titanic_predict.csv", "r") as csv_file:
    data_reader = csv.reader(csv_file)
    header = next(data_reader)
    first = True
    for row in data_reader:
        prediction = predictor.predict([row])[0]
        probability = predictor.predict([row], return_probabilities=True)
        if first:
            first = False
            print(header[0], 'Prediction', "\t", probability[0])
        print(row[0], prediction, "\t", probability[1])
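
Calling the predictor once per row keeps memory use minimal but pays the call overhead for every instance. A middle ground is to stream the file in fixed-size batches. The sketch below illustrates the batching pattern only: stub_predict is a hypothetical stand-in for predictor.predict, the CSV content is made up, and iter_batches is a helper introduced here for illustration.

```python
import csv
import io
from itertools import islice


def iter_batches(reader, batch_size):
    """Yield lists of up to batch_size rows from an iterator of rows."""
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def stub_predict(rows):
    # Hypothetical stand-in for predictor.predict: a toy rule on the age field.
    return ["adult" if float(row[1]) >= 18 else "minor" for row in rows]


# Made-up data held in memory so the example runs without any files.
csv_text = "id,age\n1,34\n2,7\n3,51\n4,16\n5,18\n"
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # skip the header row, as in the cell above

results = []
for batch in iter_batches(reader, batch_size=2):
    # Only one batch of rows is held in memory at a time.
    for row, label in zip(batch, stub_predict(batch)):
        results.append((row[0], label))

print(results)
```

With a real predictor, the batch size trades memory use against the number of predict calls.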

Next Steps

Check out 302_Generating_Probabilities
Check out 300 Put your model to work