IsoTree to TreeLite

This is a short example of converting an Isolation Forest model generated through the isotree library to treelite format. Treelite can compile these trees into a standalone runtime library, which is often faster at making predictions.


Getting some medium-size data from scikit-learn to fit a model

In [1]:
import numpy as np
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)
print(X.shape)
(20640, 8)

Fitting an isolation forest model through isotree

Note: only models that use ndim=1 can be exported to treelite format.

In [2]:
from isotree import IsolationForest

iso = IsolationForest(ndim=1, ntrees=100, sample_size=256,
                      missing_action="impute", max_depth=8)
iso.fit(X)

### Now convert
treelite_model = iso.to_treelite()

### OPTIONAL: add annotations for better branch prediction
import treelite, treelite_runtime
annotator = treelite.Annotator()
annotator.annotate_branch(
    model=treelite_model,
    dmat=treelite_runtime.DMatrix(X),
    verbose=False
)
annotator.save(path="iso_branches_annotation.json")

Compiling the treelite model

These models need to be compiled into a shared library in order to be used:

In [3]:
%%capture
import treelite_runtime
import multiprocessing

treelite_model.compile(
    dirpath='.',
    params={
        "parallel_comp":multiprocessing.cpu_count(),
        "annotate_in": "iso_branches_annotation.json"
    }
)
treelite_model.export_lib("clang", ".")
treelite_predictor = treelite_runtime.Predictor("predictor.so")

Now verify that they make the same predictions:

In [4]:
iso.predict(X[:10])
Out[4]:
array([0.47006444, 0.47770081, 0.4910637 , 0.42605826, 0.41548625,
       0.41730139, 0.41699421, 0.43228664, 0.40877799, 0.41800632])
In [5]:
treelite_predictor.predict(treelite_runtime.DMatrix(X[:10]))
Out[5]:
array([0.47006445, 0.47770081, 0.4910637 , 0.42605827, 0.41548626,
       0.41730139, 0.41699421, 0.43228664, 0.40877799, 0.41800632])

Note: some small disagreement between the two is expected due to loss of precision when converting. See the documentation in isotree for more details.
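
The agreement can also be checked programmatically rather than by eye. The sketch below copies the values printed in Out[4] and Out[5] into arrays and compares them with an absolute tolerance. The variable names and the tolerance of 1e-6 are assumptions chosen for illustration, not values taken from the isotree documentation.

```python
import numpy as np

# Values copied from the printed output of iso.predict(X[:10]) above
pred_isotree = np.array([0.47006444, 0.47770081, 0.4910637, 0.42605826, 0.41548625,
                         0.41730139, 0.41699421, 0.43228664, 0.40877799, 0.41800632])

# Values copied from the printed output of treelite_predictor.predict(...) above
pred_treelite = np.array([0.47006445, 0.47770081, 0.4910637, 0.42605827, 0.41548626,
                          0.41730139, 0.41699421, 0.43228664, 0.40877799, 0.41800632])

# Largest absolute disagreement between the two sets of scores
max_abs_diff = np.max(np.abs(pred_isotree - pred_treelite))
print(max_abs_diff)

# The tolerance here is an illustrative assumption
print(np.allclose(pred_isotree, pred_treelite, rtol=0, atol=1e-6))
```

In the notebook itself, the same comparison could be applied to the full arrays returned by iso.predict(X) and treelite_predictor.predict(treelite_runtime.DMatrix(X)).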

Comparing prediction times

In [6]:
%%timeit
import multiprocessing
### see docs for 'IsolationForest.predict' about this part
iso.set_params(nthreads=max(1, multiprocessing.cpu_count() // 2))  # integer thread count, at least 1
iso.predict(X)
31.6 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [7]:
%%timeit
treelite_predictor.predict(treelite_runtime.DMatrix(X))
4.41 ms ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
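
For reference, the speedup implied by the two timings above can be worked out directly. The snippet below uses only the mean times reported by %%timeit in this run, so the figure will differ across machines. Note also that the isotree cell times the import and set_params calls along with the prediction, and the treelite cell times building the DMatrix.

```python
# Mean time per loop in milliseconds, copied from the %%timeit outputs above
isotree_ms = 31.6
treelite_ms = 4.41

# Ratio of the two mean times
speedup = isotree_ms / treelite_ms
print(f"treelite predictor is about {speedup:.1f}x faster in this run")
```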