The k-means algorithm partitions the data into a predetermined number of clusters, k, assigning each point to its nearest cluster center.
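To make the idea concrete, here is a minimal NumPy sketch of the usual iterative procedure (often called Lloyd's algorithm): assign each point to its nearest center, move each center to the mean of its points, and repeat. This is not what `sklearn.cluster.KMeans` runs internally. The function name `kmeans_sketch`, its parameters, and the toy data are assumptions made up for illustration.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Toy Lloyd-style k-means; name and defaults are illustrative only."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return labels, centers

# Two well-separated toy blobs (made-up data, not the customers file).
toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
toy_labels, toy_centers = kmeans_sketch(toy, k=2)
```

The number of clusters `k` has to be supplied up front, which is why the elbow plot further down is needed to choose it.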
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
df = pd.read_csv('my_machine-learning/customers.csv')
X = df.iloc[:, [3, 4]].values
df.head()
| | CustomerID | Genre | Age | Annual Income (k$) | Spending Score (1-100) |
|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15 | 39 |
| 1 | 2 | Male | 21 | 15 | 81 |
| 2 | 3 | Female | 20 | 16 | 6 |
| 3 | 4 | Female | 23 | 16 | 77 |
| 4 | 5 | Female | 31 | 17 | 40 |
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.show()
WCSS = []
for i in range(1, 21):
    clf = KMeans(n_clusters=i)
    clf.fit(X)
    WCSS.append(clf.inertia_)  # inertia_ is the within-cluster sum of squares (WCSS)
plt.plot(range(1, 21), WCSS)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters (k)')
plt.ylabel('WCSS')
plt.grid()
plt.show()
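For reference, WCSS is the sum of squared distances from each point to the center of the cluster it is assigned to. A small sketch of that computation follows; the helper name `wcss` and the toy points are assumptions for illustration, not part of the notebook's data.

```python
import numpy as np

def wcss(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    diffs = X - centers[labels]  # centers[labels] picks each point's own center
    return float(np.sum(diffs ** 2))

# Four made-up points on a line, forming two obvious pairs.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])

# k = 1: a single center at the overall mean.
one_cluster = wcss(pts, np.zeros(4, dtype=int), pts.mean(axis=0, keepdims=True))

# k = 2: one center per pair.
two_clusters = wcss(pts, np.array([0, 0, 1, 1]),
                    np.array([[1.0, 0.0], [11.0, 0.0]]))
```

Going from one cluster to two cuts the value sharply on this toy data, which is the kind of drop the elbow plot looks for. For a fitted model, `wcss(X, clf.labels_, clf.cluster_centers_)` should agree closely with `clf.inertia_`.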
The elbow is at k = 5: beyond that point, adding more clusters reduces WCSS only slightly, so we use 5 clusters.
kmean = KMeans(n_clusters=5)
y_kmeans = kmean.fit_predict(X)
fig = plt.figure(figsize=(12, 9))
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmean.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=150, alpha=0.5)
plt.title('Clusters using KMeans')
plt.xlabel('Annual Income (k$)')  # X[:, 0] is column 3 of df
plt.ylabel('Spending Score (1-100)')  # X[:, 1] is column 4 of df
plt.show()