In this Case Study, we'll look at how a Decision Tree classifier handles different shapes of data and the kinds of decision regions it produces in 2-D.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
Let's look at the blobs data. It is linearly separable, meaning the two classes can be separated with just a line!
angle_train = pd.read_csv("angle_train.csv")
test_data = pd.read_csv("test_data.csv")
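The CSV files above aren't bundled with this case study. If you want to follow along without them, a comparable linearly separable two-class dataset can be generated with scikit-learn's `make_blobs` (a sketch; the cluster centers and sizes here are assumptions, not the actual data):

```python
import pandas as pd
from sklearn.datasets import make_blobs

# Generate two well-separated Gaussian blobs as a stand-in dataset.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=1)
blobs = pd.DataFrame(X, columns=["feat1", "feat2"])
blobs["label"] = y
print(blobs["label"].value_counts())
```

Because the blobs are drawn far apart relative to their spread, a single line (and hence a very shallow tree) can separate the classes.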
# plot the train data.
angle_train.plot.scatter(x="feat1",y="feat2",c="label", cmap="viridis", colorbar=False, figsize=(12,8));
# train the classifier
mx_depth = 1
clf = DecisionTreeClassifier(random_state=1, max_depth=mx_depth)
clf.fit(angle_train[["feat1", "feat2"]], angle_train["label"])
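With `max_depth=1`, the fitted tree is a single threshold test on one feature (a "decision stump"). One way to see this is `sklearn.tree.export_text`. Since the training CSV isn't available here, this sketch fits the same kind of stump on a tiny hand-made dataset (the points below are assumptions for illustration only):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny synthetic stand-in for the training data.
X = np.array([[0.0, 0.0], [1.0, 0.2], [3.0, 3.1], [4.0, 2.9]])
y = np.array([0, 0, 1, 1])

clf = DecisionTreeClassifier(random_state=1, max_depth=1).fit(X, y)

# The printed tree has exactly one split: feat <= threshold.
print(export_text(clf, feature_names=["feat1", "feat2"]))
```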
# predict the label on test data
test_data["pred"] = clf.predict(test_data[["feat1", "feat2"]])
# Plot the training and test data on the same axes.
# Test points are drawn semi-transparent (alpha=0.4 sets 40% opacity),
# so the fully opaque (dark) points are the training points.
ax1 = angle_train.plot.scatter(x="feat1",y="feat2",c="label", cmap="viridis", colorbar=False, figsize=(12,8));
test_data.plot.scatter(x="feat1",y="feat2",c="pred", cmap="viridis", colorbar=False, figsize=(12,8), ax=ax1, alpha=0.4);
print("Using max_depth =",mx_depth)
Using max_depth = 1
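To see the decision regions themselves rather than just point predictions, a common approach is to evaluate the classifier on a dense grid and shade the plane by predicted class. This sketch does that on synthetic stand-in data (the cluster parameters and the output filename are assumptions, since the original CSVs aren't included):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, (50, 2)),
               rng.normal([3, 3], 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
clf = DecisionTreeClassifier(random_state=1, max_depth=1).fit(X, y)

# Predict on every point of a 200x200 grid covering the data.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Shade the regions, then overlay the training points.
plt.contourf(xx, yy, Z, alpha=0.3, cmap="viridis")
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis")
plt.xlabel("feat1")
plt.ylabel("feat2")
plt.savefig("regions.png")
```

Because the depth-1 tree makes only one axis-aligned split, the shaded plot shows the plane cut into exactly two rectangular half-planes, which is why a stump suffices for linearly separable blobs but struggles on more complex shapes.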