This notebook illustrates the recommender system example used in the lecture notes for Text Analytics.
import pandas as pd
import numpy as np
def rand_array():
    # Each of the 10 entries is 1.0 ("likes") or 0.0 ("dislikes") with equal probability.
    return list(np.round(np.random.random(10)))
rand_array()
[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
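A note on reproducibility: `rand_array` draws fresh random numbers on every call, so each run of this notebook produces a different table (outputs only match within a single run). A minimal sketch of seeding, using NumPy's `Generator` API rather than the legacy `np.random.random` used above:

```python
import numpy as np

# Seeding makes the draws repeatable across runs; the seed value (0) is arbitrary.
rng = np.random.default_rng(0)

def rand_array_seeded():
    # Same idea as rand_array above, but driven by the seeded generator.
    return list(np.round(rng.random(10)))

first = rand_array_seeded()
print(first)  # identical on every run with the same seed
```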
data = {}
data['Items'] = ['Apple','Banana','Pear','Chicken','Beef','Lamb','Pizza','Pasta','Rice','Cake']
data['Alice']= rand_array()
data['Bob']= rand_array()
data['Charlie']= rand_array()
data['Daisy']= rand_array()
data['Edward']= rand_array()
data['Faye']= rand_array()
data['George']= rand_array()
data['Harriet']= rand_array()
data['Imogen']= rand_array()
data['John']= rand_array()
df = pd.DataFrame(data)
df
| | Items | Alice | Bob | Charlie | Daisy | Edward | Faye | George | Harriet | Imogen | John |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 1 | Banana | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | Pear | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | Chicken | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | Beef | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| 5 | Lamb | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 6 | Pizza | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 |
| 7 | Pasta | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 8 | Rice | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 9 | Cake | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
Above we have a table showing whether each person likes each item (1.0 = likes it, 0.0 = does not).
df['Kyle'] = df.values[:,1:11].sum(axis=1) / 10  # each item's popularity: mean "like" rate over the 10 known users
df
| | Items | Alice | Bob | Charlie | Daisy | Edward | Faye | George | Harriet | Imogen | John | Kyle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.7 |
| 1 | Banana | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.4 |
| 2 | Pear | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.2 |
| 3 | Chicken | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.4 |
| 4 | Beef | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.4 |
| 5 | Lamb | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.4 |
| 6 | Pizza | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.5 |
| 7 | Pasta | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.6 |
| 8 | Rice | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.5 |
| 9 | Cake | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.8 |
Above we introduce a new person, Kyle. At this stage we know nothing about Kyle, so each entry in his column is simply the item's overall popularity: the fraction of the ten known users who like that item.
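The cold-start column is just each item's popularity, i.e. the row-wise mean over the known users. A small self-contained sketch of the same idea (toy data, not the notebook's table):

```python
import pandas as pd

# Two items, two known users; the new user's cold profile is the row-wise mean.
likes = pd.DataFrame({
    'Items': ['Apple', 'Beef'],
    'Alice': [1.0, 1.0],
    'Bob':   [1.0, 0.0],
})
likes['NewUser'] = likes[['Alice', 'Bob']].mean(axis=1)
print(likes['NewUser'].tolist())  # [1.0, 0.5]
```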
df = df.drop(['Kyle'], axis=1)
df['Kyle'] = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # Kyle's only known like: Beef (row 4)
df
| | Items | Alice | Bob | Charlie | Daisy | Edward | Faye | George | Harriet | Imogen | John | Kyle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0 |
| 1 | Banana | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 |
| 2 | Pear | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 |
| 3 | Chicken | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 |
| 4 | Beef | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1 |
| 5 | Lamb | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 |
| 6 | Pizza | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0 |
| 7 | Pasta | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0 |
| 8 | Rice | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0 |
| 9 | Cake | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0 |
Suppose we now know that Kyle likes Beef. We could initialise his profile with this information.
df['Kyle'].argmax()  # positional index of the one item Kyle likes (4, i.e. Beef)
p = df.iloc[df['Kyle'].argmax(), :]  # the Beef row across all users
p = p[p == 1]                        # users who like Beef (Kyle included)
v = len(p)                           # how many such users, counting Kyle
p = df[p.index]                      # restrict the table to just those users
p
| | Alice | Daisy | Harriet | Imogen | Kyle |
|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 0.0 | 1.0 | 0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
| 2 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 4 | 1.0 | 1.0 | 1.0 | 1.0 | 1 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 6 | 1.0 | 0.0 | 1.0 | 0.0 | 0 |
| 7 | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
| 8 | 1.0 | 0.0 | 1.0 | 0.0 | 0 |
| 9 | 1.0 | 0.0 | 1.0 | 1.0 | 0 |
The table above shows everyone else who also likes Beef, allowing us to see which users are most similar to Kyle.
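A more general way to quantify "who is similar to Kyle", rather than filtering on a single shared item, is a set-overlap measure such as Jaccard similarity between the binary like-vectors. A sketch (the vectors below are illustrative, not taken from a particular run):

```python
import numpy as np

def jaccard(u, v):
    # |items both like| / |items either likes|
    u, v = np.asarray(u), np.asarray(v)
    inter = np.logical_and(u == 1, v == 1).sum()
    union = np.logical_or(u == 1, v == 1).sum()
    return inter / union if union else 0.0

kyle  = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # hypothetical like-vectors
alice = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
daisy = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
print(jaccard(kyle, alice))  # 2/7: two shared likes, seven items liked overall
print(jaccard(kyle, daisy))  # 2/3
```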
p = p.drop(['Kyle'], axis=1)                      # drop Kyle before averaging
p['Kyle'] = p.values[:,:].sum(axis=1) / (v - 1)   # each item's popularity among the Beef-likers
p['Items'] = df['Items']                          # restore the item labels
p
| | Alice | Daisy | Harriet | Imogen | Kyle | Items |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.75 | Apple |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.25 | Banana |
| 2 | 0.0 | 1.0 | 0.0 | 1.0 | 0.50 | Pear |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | Chicken |
| 4 | 1.0 | 1.0 | 1.0 | 1.0 | 1.00 | Beef |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | Lamb |
| 6 | 1.0 | 0.0 | 1.0 | 0.0 | 0.50 | Pizza |
| 7 | 1.0 | 0.0 | 0.0 | 0.0 | 0.25 | Pasta |
| 8 | 1.0 | 0.0 | 1.0 | 0.0 | 0.50 | Rice |
| 9 | 1.0 | 0.0 | 1.0 | 1.0 | 0.75 | Cake |
Since we know for sure that Kyle likes Beef, we can use this subset of users to get a more precise initialisation of Kyle's preferences: each item's estimate is now its popularity among the Beef-likers rather than among everyone. As a simple observation, with the "cold" initialisation our estimate that Kyle liked Pear was only 0.2, but half of the users who like Beef also like Pear, so that estimate rises to 0.5. Estimates can fall as well: Pasta drops from 0.6 to 0.25, since only one of the four Beef-likers likes Pasta.
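The natural last step, not shown in the notebook, is to turn the estimated probabilities into recommendations: rank the items Kyle has not yet rated. A sketch with illustrative probabilities (hypothetical numbers, since the notebook's values change with each random run):

```python
import pandas as pd

# Hypothetical like-probabilities for Kyle over the ten items.
est = pd.Series(
    [0.7, 0.25, 0.5, 0.0, 1.0, 0.1, 0.45, 0.3, 0.55, 0.8],
    index=['Apple', 'Banana', 'Pear', 'Chicken', 'Beef',
           'Lamb', 'Pizza', 'Pasta', 'Rice', 'Cake'])
known = ['Beef']                                    # items Kyle has already rated
recs = est.drop(known).sort_values(ascending=False) # best candidates first
print(recs.index[:3].tolist())  # ['Cake', 'Apple', 'Rice']
```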