This is a short example of converting an Isolation Forest model generated with the isotree library to treelite format. Treelite can compile these trees into a standalone shared library, which is often faster at making predictions.
```python
import numpy as np
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)
print(X.shape)
```
Note: only models that use `ndim=1` can be exported to `treelite` format.
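The restriction follows from how splits are made. With `ndim=1`, every split compares one feature against a threshold, which matches the single-feature test nodes that treelite trees are built from. With `ndim>1`, isotree builds the "extended" model. In that model each split uses a linear combination of several features (a hyperplane), and a single-feature test cannot express it. The sketch below contrasts the two rules on one row. The function names are hypothetical, the `<=` goes-left convention is an assumption, and the hyperplane rule is simplified compared with what isotree actually does.

```python
import numpy as np

def axis_aligned_goes_left(row, feature, threshold):
    # ndim=1 style split: only one column is inspected.
    return bool(row[feature] <= threshold)

def hyperplane_goes_left(row, features, coefs, threshold):
    # Simplified ndim>1 style split: several columns are projected
    # onto a direction before comparing against the threshold.
    return bool(np.dot(row[features], coefs) <= threshold)

row = np.array([1.5, -0.2, 3.0])
print(axis_aligned_goes_left(row, feature=0, threshold=2.0))  # True
print(hyperplane_goes_left(row, features=[0, 2],
                           coefs=np.array([0.6, 0.8]), threshold=2.0))  # False
```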
```python
from isotree import IsolationForest

iso = IsolationForest(ndim=1, ntrees=100, sample_size=256,
                      missing_action="impute", max_depth=8)
iso.fit(X)
```
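The scores compared later are derived from how deep each row has to travel before it is isolated. For reference, the original Isolation Forest paper (Liu et al.) normalizes the average depth E[h(x)] by c(n), the expected depth of an unsuccessful search in a binary search tree over n points. This gives the score s = 2^(-E[h(x)]/c(n)). The sketch below assumes isotree's standardized scores follow this formula. isotree may apply further adjustments, so treat it as an approximation. Also, `max_depth=8` matches ceil(log2(256)), the depth limit the paper uses for a subsample of size 256.

```python
import math

EULER_GAMMA = 0.5772156649015329

def c(n):
    # Expected unsuccessful-search depth in a BST over n points,
    # using the harmonic-number approximation H(i) ~ ln(i) + gamma.
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(avg_depth, n):
    # Close to 1: easy to isolate (outlier); well below 0.5: inlier.
    return 2.0 ** (-avg_depth / c(n))

print(round(c(256), 2))              # 10.24
print(anomaly_score(c(256), 256))    # 0.5 when the depth equals c(n)
```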
```python
### Now convert
treelite_model = iso.to_treelite()
```
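```python
### OPTIONAL: add annotations for better branch prediction
import treelite, treelite_runtime

# Record how often each branch is taken when the training data is
# passed through the trees, and save the counts to a JSON file.
annotator = treelite.Annotator()
annotator.annotate_branch(
    model=treelite_model,
    dmat=treelite_runtime.DMatrix(X),
    verbose=False
)
annotator.save(path="iso_branches_annotation.json")
```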
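Branch annotation is essentially visit counting. Each row is sent down every tree, and the number of rows reaching each node is recorded. When generating code, those counts can be used to mark the more frequent side of each split as likely, for example through a compiler hint such as GCC/Clang's `__builtin_expect`. The sketch below shows only the counting idea. It uses a toy tree in a hypothetical dict format, which is not treelite's internal representation or its annotation file format.

```python
import numpy as np

# Hypothetical toy tree: internal nodes send a row left when
# row[feature] <= threshold; leaves hold an arbitrary output value.
TREE = {
    0: {"feature": 0, "threshold": 0.5, "left": 1, "right": 2},
    1: {"leaf": 3.0},
    2: {"feature": 1, "threshold": 0.0, "left": 3, "right": 4},
    3: {"leaf": 5.0},
    4: {"leaf": 6.0},
}

def count_visits(tree, X):
    # Count how many rows pass through each node.
    counts = {node_id: 0 for node_id in tree}
    for row in X:
        node_id = 0
        while True:
            counts[node_id] += 1
            node = tree[node_id]
            if "leaf" in node:
                break
            go_left = row[node["feature"]] <= node["threshold"]
            node_id = node["left"] if go_left else node["right"]
    return counts

def likely_left(tree, counts):
    # For each split, flag whether the left child is the more common path.
    return {
        node_id: counts[node["left"]] >= counts[node["right"]]
        for node_id, node in tree.items()
        if "leaf" not in node
    }

X_toy = np.array([[0.1, 1.0], [0.2, -1.0], [0.9, 2.0], [0.3, 0.5]])
counts = count_visits(TREE, X_toy)
print(counts)                      # {0: 4, 1: 3, 2: 1, 3: 0, 4: 1}
print(likely_left(TREE, counts))   # {0: True, 2: False}
```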
Treelite models need to be compiled into a shared library before they can make predictions:
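```python
%%capture
import multiprocessing
import treelite_runtime

# export_lib generates the C code for the trees and compiles it into a
# shared library in one step. The compilation parameters (parallel code
# generation and the branch annotations saved above) must be passed
# here: a separate compile() call does not carry them over.
treelite_model.export_lib(
    toolchain="clang",
    libpath="./predictor.so",
    params={
        "parallel_comp": multiprocessing.cpu_count(),
        "annotate_in": "iso_branches_annotation.json",
    },
)
treelite_predictor = treelite_runtime.Predictor("./predictor.so")
```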
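Compiling turns each tree into nested C `if`/`else` branches that the C compiler can optimize. This is one reason a compiled predictor can be faster than walking a tree data structure at run time. The sketch below emits such code for a toy tree in a hypothetical dict format. It optionally wraps conditions in macros named `LIKELY`/`UNLIKELY`, assumed to expand to a compiler hint. It illustrates the idea only and is not the code treelite actually generates.

```python
# Hypothetical toy tree: row goes left when data[feature] <= threshold.
TREE = {
    0: {"feature": 0, "threshold": 0.5, "left": 1, "right": 2},
    1: {"leaf": 3.0},
    2: {"feature": 1, "threshold": 0.0, "left": 3, "right": 4},
    3: {"leaf": 5.0},
    4: {"leaf": 6.0},
}

def emit_c(tree, node_id=0, likely=None, indent=1):
    # Recursively render the subtree rooted at node_id as C if/else blocks.
    pad = "  " * indent
    node = tree[node_id]
    if "leaf" in node:
        return f"{pad}result += {node['leaf']};\n"
    cond = f"data[{node['feature']}] <= {node['threshold']}"
    if likely is not None and node_id in likely:
        cond = f"LIKELY({cond})" if likely[node_id] else f"UNLIKELY({cond})"
    return (
        f"{pad}if ({cond}) {{\n"
        + emit_c(tree, node["left"], likely, indent + 1)
        + f"{pad}}} else {{\n"
        + emit_c(tree, node["right"], likely, indent + 1)
        + f"{pad}}}\n"
    )

source = (
    "double predict_tree(const double* data) {\n"
    "  double result = 0.0;\n"
    + emit_c(TREE, likely={0: True, 2: False})
    + "  return result;\n"
    "}\n"
)
print(source)
```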
Now verify that they make the same predictions:
```python
# A notebook cell only displays its last expression, so print both.
print(iso.predict(X[:10]))
print(treelite_predictor.predict(treelite_runtime.DMatrix(X[:10])))
```
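Eyeballing ten rows works in a notebook. To check the whole dataset, a small helper can report the largest absolute difference and whether the two score vectors agree within a tolerance. The helper name and the default tolerance below are assumptions, not part of isotree or treelite. Both inputs are flattened first, since the predictor's output shape may not match isotree's exactly.

```python
import numpy as np

def compare_scores(isotree_scores, treelite_scores, atol=1e-4):
    # Flatten both outputs and compare them in double precision.
    a = np.asarray(isotree_scores, dtype=np.float64).ravel()
    b = np.asarray(treelite_scores, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    max_abs_diff = float(np.max(np.abs(a - b)))
    return max_abs_diff, bool(np.allclose(a, b, rtol=0.0, atol=atol))

# Usage with the objects from this notebook:
# compare_scores(iso.predict(X),
#                treelite_predictor.predict(treelite_runtime.DMatrix(X)))
print(compare_scores([0.41, 0.52, 0.38],
                     np.array([[0.41], [0.52001], [0.38]], dtype=np.float32)))
```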
Note: some small disagreement between the two is expected due to loss of precision when converting. See the isotree documentation for more details.
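Here is one way a loss of precision can change a prediction. Suppose some part of the pipeline works in single precision (float32) while isotree uses double precision. Then a value very close to a split threshold can land on the other side of the split after rounding. The sketch below shows this for a single comparison. The `<=` rule and the numbers are illustrative, and this is not a claim about where exactly the converted model loses precision.

```python
import numpy as np

def goes_left(x, threshold, dtype):
    # Toy split rule: go left when x <= threshold, with both values
    # first rounded to the given floating-point type.
    return bool(dtype(x) <= dtype(threshold))

x, threshold = 0.1000000001, 0.1
print(goes_left(x, threshold, np.float64))  # False: x is just above 0.1
print(goes_left(x, threshold, np.float32))  # True: both round to the same float32
```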
```python
%%timeit
import multiprocessing
### see docs for 'IsolationForest.predict' about this part
# nthreads must be a positive integer, so use floor division.
iso.set_params(nthreads=max(1, multiprocessing.cpu_count() // 2))
iso.predict(X)
```
```python
%%timeit
treelite_predictor.predict(treelite_runtime.DMatrix(X))
```
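`%%timeit` is an IPython cell magic and only works inside IPython or Jupyter. Outside a notebook, the standard-library `timeit` module can give a comparable measurement. The helper below (a hypothetical name) runs the callable several times per trial and keeps the fastest trial's per-call time, a common way to reduce noise from other processes. As in the cells above, the treelite timing would include building the `DMatrix`.

```python
import timeit

def best_time_per_call(fn, number=10, repeat=5):
    # Run fn `number` times per trial for `repeat` trials and return
    # the fastest trial's total time divided by `number`.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number

# Usage with the objects from this notebook:
# t_iso = best_time_per_call(lambda: iso.predict(X))
# t_tl = best_time_per_call(
#     lambda: treelite_predictor.predict(treelite_runtime.DMatrix(X)))
# print(f"speed-up: {t_iso / t_tl:.1f}x")

print(best_time_per_call(lambda: sum(range(1000))))
```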