import pandas as pd

# Kaggle Titanic training set, served via Data School's shortened URL.
df = pd.read_csv('http://bit.ly/kaggletrain')

# Stick to all-numeric columns so the models need no preprocessing.
cols = ['Pclass', 'Parch', 'SibSp', 'Fare']
X = df.loc[:, cols]   # feature matrix
y = df['Survived']    # binary target: 1 = survived
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score

# Baseline 1: logistic regression. liblinear suits small datasets;
# random_state pins any solver-internal randomness.
lr = LogisticRegression(solver='liblinear', random_state=1)
# The original transcript discarded these scores (and left the pasted
# REPL outputs as dead bare-expression statements); print them so the
# comparison is visible when run as a script.
print(cross_val_score(lr, X, y).mean())  # observed: ~0.6836

# Baseline 2: random forest. max_features=None examines every feature at
# each split; random_state makes the forest reproducible.
rf = RandomForestClassifier(max_features=None, random_state=1)
print(cross_val_score(rf, X, y).mean())  # observed: ~0.6948

# Soft-voting ensemble: averages the two models' predicted class
# probabilities, which beats either model alone on this data.
vc = VotingClassifier([('clf1', lr), ('clf2', rf)], voting='soft')
print(cross_val_score(vc, X, y).mean())  # observed: ~0.7251
# © 2020 Data School. All rights reserved.