ESF project of the University of West Bohemia in Pilsen (Západočeská univerzita v Plzni), reg. no. CZ.02.2.69/0.0/0.0/16 015/0002287

Image Classification - classical approaches

Basic types of classifiers:

  • K-means
  • k-Nearest Neighbour
  • Bayesian classifier
  • Support Vector Machine

Types of learning:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning
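Of the methods listed, K-means is the odd one out: it is an unsupervised clustering method that never sees the labels, while k-NN, the Bayesian classifier and SVM are trained on labelled data. A minimal sketch on the Iris data used below, assuming scikit-learn's `KMeans` with `n_clusters=3`, `n_init=10` and `random_state=0` (all three values are our choice for illustration):

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()

# Unsupervised: fit_predict sees only the feature matrix, never iris.target
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(iris.data)

# Cluster ids are arbitrary (0, 1, 2 in any order), so compare them with the
# true species through a contingency table instead of direct equality
table = np.zeros((3, 3), dtype=int)
for species, cluster in zip(iris.target, cluster_ids):
    table[species, cluster] += 1
print(table)  # rows: true species, columns: clusters
```

Setosa typically lands in a cluster of its own, while versicolor and virginica are partly mixed, which is exactly the information a supervised classifier gets from the labels.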
In [1]:
# scikit-learn
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
import sklearn.model_selection

Iris dataset

  • Loading the training data. The dataset describes the iris flower and three of its species: Iris setosa, Iris versicolor and Iris virginica. Four features are measured for each flower: sepal length, sepal width, petal length and petal width.
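The object returned by `load_iris()` also carries the column and class names, which map the four measurements (all in centimetres) and the integer targets back to species. A short sketch using its `feature_names` and `target_names` attributes:

```python
from sklearn import datasets

iris = datasets.load_iris()

# Column order of iris.data
for column, name in enumerate(iris.feature_names):
    print(column, name)

# Integer targets 0, 1, 2 index into target_names
print(list(iris.target_names))
```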
In [2]:
iris = datasets.load_iris()
# feature matrix: dimensions and the last ten samples
print("data ", iris.data.shape)
print(iris.data[-10:,:])

print("")
# target classes: dimensions, distinct values and the last ten labels
print("target", iris.target.shape)
print(np.unique(iris.target))
print(iris.target[-10:])
data  (150, 4)
[[6.7 3.1 5.6 2.4]
 [6.9 3.1 5.1 2.3]
 [5.8 2.7 5.1 1.9]
 [6.8 3.2 5.9 2.3]
 [6.7 3.3 5.7 2.5]
 [6.7 3.  5.2 2.3]
 [6.3 2.5 5.  1.9]
 [6.5 3.  5.2 2. ]
 [6.2 3.4 5.4 2.3]
 [5.9 3.  5.1 1.8]]

target (150,)
[0 1 2]
[2 2 2 2 2 2 2 2 2 2]
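The last ten targets are all `2`, a hint that the samples are stored sorted by class. Counting the classes shows the dataset is balanced (50 samples per species) but ordered, so taking the first 100 rows for training without shuffling would leave virginica out of the training set entirely. A quick check:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Samples per class: a balanced dataset
counts = np.bincount(iris.target)
print(counts)

# Without shuffling, a 100/50 split puts only classes 0 and 1 into training
naive_train = np.unique(iris.target[:100])
naive_test = np.unique(iris.target[100:])
print(naive_train, naive_test)
```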

k-Nearest Neighbour classifier

In [3]:
from sklearn import neighbors
knn = neighbors.KNeighborsClassifier()  # default: k = 5 neighbours
knn.fit(iris.data, iris.target)
# classify one new sample given by its four feature values
predikce = knn.predict([[0.1, 0.2, 0.3, 0.4]])
print(predikce)
[0]
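With default settings, `KNeighborsClassifier()` predicts by a majority vote among the 5 training samples closest in Euclidean distance. The idea fits in a few lines of NumPy. The sketch below is our own illustration (the function `knn_predict` is hypothetical), not scikit-learn's implementation, which adds tree-based neighbour search and its own tie handling:

```python
import numpy as np
from sklearn import datasets

def knn_predict(train_x, train_y, query_x, k=5):
    """Hypothetical helper: majority vote among the k nearest training samples."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    # Indices of the k closest training samples for each query
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote; bincount + argmax picks the smallest label on a tie
    return np.array([np.bincount(train_y[idx]).argmax() for idx in nearest])

iris = datasets.load_iris()
print(knn_predict(iris.data, iris.target, np.array([[0.1, 0.2, 0.3, 0.4]])))
```

The tiny query sample sits closest to the setosa flowers (small petals), so the vote returns class 0, matching the scikit-learn result above.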
In [4]:
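# shuffle the samples, since the dataset is stored ordered by class
perm = np.random.permutation(iris.target.size)
iris.data = iris.data[perm]
iris.target = iris.target[perm]

# first 100 shuffled samples for training, the remaining 50 for testing
train_data = iris.data[:100]
train_target = iris.target[:100]

test_data = iris.data[100:]
test_target = iris.target[100:]

knn.fit(train_data, train_target)

# fraction of correctly classified test samples
knn.score(test_data, test_target)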
Out[4]:
0.94
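The manual shuffle above works, but its score changes from run to run because no random seed is set. The `sklearn.model_selection` module offers `train_test_split`, which can fix the seed and keep the class proportions (stratify). A sketch assuming the same 100/50 split sizes; `random_state=0` is an arbitrary choice:

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import train_test_split, cross_val_score

iris = datasets.load_iris()

# Reproducible, class-stratified split: 100 training and 50 test samples
train_data, test_data, train_target, test_target = train_test_split(
    iris.data, iris.target, test_size=50, stratify=iris.target, random_state=0)

knn = neighbors.KNeighborsClassifier()
knn.fit(train_data, train_target)
print(knn.score(test_data, test_target))

# 5-fold cross-validation gives an estimate that depends less on one split
scores = cross_val_score(neighbors.KNeighborsClassifier(), iris.data, iris.target, cv=5)
print(scores.mean())
```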

Bayesian classifier

In [5]:
import sklearn.naive_bayes
# Gaussian naive Bayes: each feature modelled by a normal distribution per class
gnb = sklearn.naive_bayes.GaussianNB()
gnb.fit(train_data, train_target)
y_pred = gnb.predict(test_data)
print("Number of mislabeled points : %d" % (test_target != y_pred).sum())
Number of mislabeled points : 5
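`GaussianNB` assumes the features are independent given the class and normally distributed within each class, so it picks the class c that maximises log P(c) + Σ_j log N(x_j; μ_cj, σ²_cj). A from-scratch sketch of that rule (the helpers `gaussian_nb_fit` and `gaussian_nb_predict` are hypothetical; the variance floor imitates the library's default `var_smoothing=1e-9`):

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

def gaussian_nb_fit(x, y):
    """Hypothetical helper: per-class priors, means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([x[y == c].mean(axis=0) for c in classes])
    # Small variance floor, mimicking GaussianNB's var_smoothing default
    eps = 1e-9 * x.var(axis=0).max()
    variances = np.array([x[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances

def gaussian_nb_predict(model, x):
    classes, priors, means, variances = model
    # Gaussian log-likelihood summed over features, shape (n_samples, n_classes)
    diff2 = (x[:, None, :] - means[None, :, :]) ** 2
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff2 / variances[None, :, :]).sum(axis=2)
    # Add the log prior and take the most probable class
    return classes[np.argmax(np.log(priors)[None, :] + log_lik, axis=1)]

iris = datasets.load_iris()
model = gaussian_nb_fit(iris.data, iris.target)
ours = gaussian_nb_predict(model, iris.data)
ref = GaussianNB().fit(iris.data, iris.target).predict(iris.data)
print("agreement with GaussianNB:", np.mean(ours == ref))
```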

SVM classifier

In [6]:
from sklearn import svm
# support vector classifier with default settings (RBF kernel, C = 1.0)
svc = svm.SVC()
svc.fit(train_data, train_target)
y_pred = svc.predict(test_data)
print("Number of mislabeled points : %d" % (test_target != y_pred).sum())
Number of mislabeled points : 3
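`svm.SVC()` uses an RBF kernel by default. A sketch comparing a few kernels with 5-fold cross-validation; the kernel list is our choice for illustration, and the ranking on other data may well differ:

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

results = {}
for kernel in ("linear", "rbf", "poly"):
    # Same default C=1.0 for every kernel; only the kernel function changes
    clf = svm.SVC(kernel=kernel)
    results[kernel] = cross_val_score(clf, iris.data, iris.target, cv=5).mean()
    print(f"{kernel:>6}: {results[kernel]:.3f}")
```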

Training data

Testing data

In [8]:
import scipy
import urllib
import skimage
import skimage.color
import skimage.measure
import skimage.io
from sklearn import svm


# URL = "http://uc452cam01-kky.fav.zcu.cz/snapshot.jpg"
URL = "https://raw.githubusercontent.com/mjirik/ZDO/master/objekty/ctverce_hvezdy_kolecka.jpg"
# load the image of squares, stars and circles as a grayscale array
img = skimage.io.imread(URL, as_gray=True)
plt.imshow(img, cmap="gray")
# recommended classifier ...
# watch out for labeling and the "+1 problem"
Out[8]:
<matplotlib.image.AxesImage at 0x208dcbaa208>
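The "+1 problem" mentioned in the cell above comes from connected-component labelling: the background gets label 0 and objects are numbered from 1, while Python lists are indexed from 0. `skimage.measure.label` and `skimage.measure.regionprops` follow this convention (`regionprops(...)[i].label == i + 1`). The sketch below shows the same pitfall with `scipy.ndimage`, whose `find_objects` list entry i belongs to label i + 1. The synthetic image (two squares and a disc) and the fill-ratio feature are our own assumptions, meant only to illustrate how per-object features for a classifier such as the SVM above might be extracted:

```python
import numpy as np
import scipy.ndimage as ndi

# Hypothetical synthetic scene: two squares and one disc on a 100x100 background
img = np.zeros((100, 100), dtype=bool)
img[10:30, 10:30] = True                                # square, 20x20 pixels
img[60:85, 15:40] = True                                # square, 25x25 pixels
yy, xx = np.mgrid[:100, :100]
img[(yy - 50) ** 2 + (xx - 70) ** 2 <= 15 ** 2] = True  # disc, radius 15

# Connected components: background stays 0, objects get labels 1..n
labels, n = ndi.label(img)

# find_objects returns a 0-based list whose entry i belongs to label i + 1
slices = ndi.find_objects(labels)

features = []
for i, sl in enumerate(slices):
    lab = i + 1                    # the "+1": list index -> label value
    obj = labels[sl] == lab        # this object's mask inside its bounding box
    area = int(obj.sum())
    fill = area / obj.size         # fraction of the bounding box covered
    features.append((lab, area, fill))

for lab, area, fill in features:
    print(f"label {lab}: area = {area} px, fill ratio = {fill:.2f}")
```

Squares fill their bounding box completely (ratio 1.0), whereas a disc covers roughly π/4 of it, so even this single feature separates the two shapes here; a real solution would likely combine several such features per object.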

Titanic