This notebook illustrates the recommender system example used in the lecture notes for Text Analytics.
import pandas as pd
import numpy as np
def rand_array():
    # Each of the 10 entries is 1.0 ("likes") or 0.0 ("dislikes") with equal probability.
    return list(np.round(np.random.random(10)))
rand_array()
[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
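A note on reproducibility: `rand_array` draws fresh random numbers on every call, so each run of this notebook produces a different table (outputs only match within a single run). A minimal sketch of seeding, using NumPy's `Generator` API rather than the legacy `np.random.random` used above:

```python
import numpy as np

# Seeding makes the draws repeatable across runs; the seed value (0) is arbitrary.
rng = np.random.default_rng(0)

def rand_array_seeded():
    # Same idea as rand_array above, but driven by the seeded generator.
    return list(np.round(rng.random(10)))

first = rand_array_seeded()
print(first)  # identical on every run with the same seed
```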
data = {}
data['Items'] = ['Apple','Banana','Pear','Chicken','Beef','Lamb','Pizza','Pasta','Rice','Cake']
data['Alice']= rand_array()
data['Bob']= rand_array()
data['Charlie']= rand_array()
data['Daisy']= rand_array()
data['Edward']= rand_array()
data['Faye']= rand_array()
data['George']= rand_array()
data['Harriet']= rand_array()
data['Imogen']= rand_array()
data['John']= rand_array()
df = pd.DataFrame(data)
df
| | Items | Alice | Bob | Charlie | Daisy | Edward | Faye | George | Harriet | Imogen | John |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 1 | Banana | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | Pear | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | Chicken | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | Beef | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| 5 | Lamb | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 6 | Pizza | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 |
| 7 | Pasta | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 8 | Rice | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 9 | Cake | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
Above we have a table showing whether each person likes each item (1.0 = likes it, 0.0 = does not).
df['Kyle'] = df.values[:,1:11].sum(axis=1) / 10  # each item's popularity: mean "like" rate over the 10 known users
df
| | Items | Alice | Bob | Charlie | Daisy | Edward | Faye | George | Harriet | Imogen | John | Kyle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.7 |
| 1 | Banana | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.4 |
| 2 | Pear | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.2 |
| 3 | Chicken | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.4 |
| 4 | Beef | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.4 |
| 5 | Lamb | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.4 |
| 6 | Pizza | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.5 |
| 7 | Pasta | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.6 |
| 8 | Rice | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.5 |
| 9 | Cake | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.8 |
Above we introduce a new person, Kyle. At this stage we know nothing about Kyle, so each entry in his column is simply the item's overall popularity: the fraction of the ten known users who like that item.
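The cold-start column is just each item's popularity, i.e. the row-wise mean over the known users. A small self-contained sketch of the same idea (toy data, not the notebook's table):

```python
import pandas as pd

# Two items, two known users; the new user's cold profile is the row-wise mean.
likes = pd.DataFrame({
    'Items': ['Apple', 'Beef'],
    'Alice': [1.0, 1.0],
    'Bob':   [1.0, 0.0],
})
likes['NewUser'] = likes[['Alice', 'Bob']].mean(axis=1)
print(likes['NewUser'].tolist())  # [1.0, 0.5]
```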
df = df.drop(['Kyle'], axis=1)
df['Kyle'] = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # Kyle's only known like: Beef (row 4)
df
| | Items | Alice | Bob | Charlie | Daisy | Edward | Faye | George | Harriet | Imogen | John | Kyle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0 |
| 1 | Banana | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 |
| 2 | Pear | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 |
| 3 | Chicken | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 |
| 4 | Beef | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1 |
| 5 | Lamb | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 |
| 6 | Pizza | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0 |
| 7 | Pasta | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0 |
| 8 | Rice | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0 |
| 9 | Cake | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0 |
Suppose we now know that Kyle likes Beef. We could initialise his profile with this information.
df['Kyle'].argmax()  # positional index of the one item Kyle likes (4, i.e. Beef)
p = df.iloc[df['Kyle'].argmax(), :]  # the Beef row across all users
p = p[p == 1]                        # users who like Beef (Kyle included)
v = len(p)                           # how many such users, counting Kyle
p = df[p.index]                      # restrict the table to just those users
p
| | Alice | Daisy | Harriet | Imogen | Kyle |
|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 0.0 | 1.0 | 0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
| 2 | 0.0 | 1.0 | 0.0 | 1.0 | 0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 4 | 1.0 | 1.0 | 1.0 | 1.0 | 1 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 6 | 1.0 | 0.0 | 1.0 | 0.0 | 0 |
| 7 | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
| 8 | 1.0 | 0.0 | 1.0 | 0.0 | 0 |
| 9 | 1.0 | 0.0 | 1.0 | 1.0 | 0 |
The table above shows everyone else who also likes Beef, allowing us to see which users are most similar to Kyle.
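A more general way to quantify "who is similar to Kyle", rather than filtering on a single shared item, is a set-overlap measure such as Jaccard similarity between the binary like-vectors. A sketch (the vectors below are illustrative, not taken from a particular run):

```python
import numpy as np

def jaccard(u, v):
    # |items both like| / |items either likes|
    u, v = np.asarray(u), np.asarray(v)
    inter = np.logical_and(u == 1, v == 1).sum()
    union = np.logical_or(u == 1, v == 1).sum()
    return inter / union if union else 0.0

kyle  = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # hypothetical like-vectors
alice = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
daisy = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
print(jaccard(kyle, alice))  # 2/7: two shared likes, seven items liked overall
print(jaccard(kyle, daisy))  # 2/3
```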
p = p.drop(['Kyle'], axis=1)                      # drop Kyle before averaging
p['Kyle'] = p.values[:,:].sum(axis=1) / (v - 1)   # each item's popularity among the Beef-likers
p['Items'] = df['Items']                          # restore the item labels
p
| | Alice | Daisy | Harriet | Imogen | Kyle | Items |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.75 | Apple |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.25 | Banana |
| 2 | 0.0 | 1.0 | 0.0 | 1.0 | 0.50 | Pear |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | Chicken |
| 4 | 1.0 | 1.0 | 1.0 | 1.0 | 1.00 | Beef |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | Lamb |
| 6 | 1.0 | 0.0 | 1.0 | 0.0 | 0.50 | Pizza |
| 7 | 1.0 | 0.0 | 0.0 | 0.0 | 0.25 | Pasta |
| 8 | 1.0 | 0.0 | 1.0 | 0.0 | 0.50 | Rice |
| 9 | 1.0 | 0.0 | 1.0 | 1.0 | 0.75 | Cake |
Since we know for sure that Kyle likes Beef, we can use this subset of users to get a more precise initialisation of Kyle's preferences: each item's estimate is now its popularity among the Beef-likers rather than among everyone. As a simple observation, with the "cold" initialisation our estimate that Kyle liked Pear was only 0.2, but half of the users who like Beef also like Pear, so that estimate rises to 0.5. Estimates can fall as well: Pasta drops from 0.6 to 0.25, since only one of the four Beef-likers likes Pasta.
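The natural last step, not shown in the notebook, is to turn the estimated probabilities into recommendations: rank the items Kyle has not yet rated. A sketch with illustrative probabilities (hypothetical numbers, since the notebook's values change with each random run):

```python
import pandas as pd

# Hypothetical like-probabilities for Kyle over the ten items.
est = pd.Series(
    [0.7, 0.25, 0.5, 0.0, 1.0, 0.1, 0.45, 0.3, 0.55, 0.8],
    index=['Apple', 'Banana', 'Pear', 'Chicken', 'Beef',
           'Lamb', 'Pizza', 'Pasta', 'Rice', 'Cake'])
known = ['Beef']                                    # items Kyle has already rated
recs = est.drop(known).sort_values(ascending=False) # best candidates first
print(recs.index[:3].tolist())  # ['Cake', 'Apple', 'Rice']
```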