This IPython notebook illustrates the usage of the cmfrec Python package for collective matrix factorization using the MovieLens-100k data, consisting of ratings from users about movies + user demographic information + movie genres.
Collective matrix factorization is a technique for collaborative filtering with additional information about the users and items, based on low-rank joint factorization of different matrices with shared factors. For more details, see the paper: Singh, A. P., & Gordon, G. J. (2008, August). Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 650-658). ACM.
1. Basic model - only movie ratings
2. Adding movie genres
3. Adding movie genres and user demographic info
As a starting point, I'll first try the basic low-rank factorization model using ratings data alone - that is, trying to minimize the following function:
$$ Loss\:(U, V) = \lVert X - UV^T \rVert^2\: + \:\lambda\: (\lVert U \rVert^2 + \lVert V \rVert^2) $$
where $X$ is the user-item ratings matrix, and $U$ and $V$ are lower-dimensional matrices mapping users and items into a latent space. This is the classic model popularized by Funk. The predicted rating from this model for a given user $i$ and movie $j$ can be calculated as $U[i,:] \cdot V[j,:]^T$.
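As a minimal NumPy sketch of the formula above (not the cmfrec implementation), the loss and the predicted rating could be computed as follows. The function names are hypothetical, and encoding missing ratings as NaN so that they are skipped is an assumption of this sketch:

```python
import numpy as np

def predict_rating(U, V, i, j):
    """Predicted rating for user i and item j: dot product of their factor rows."""
    return float(U[i, :] @ V[j, :])

def regularized_loss(X, U, V, lam):
    """Squared reconstruction error plus L2 penalty on the factor matrices.

    Assumption of this sketch: missing ratings are NaN in X and are ignored.
    """
    observed = ~np.isnan(X)
    residual = np.where(observed, X - U @ V.T, 0.0)
    penalty = lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return float(np.sum(residual ** 2) + penalty)

# Toy example: 4 users, 5 items, 2 latent factors
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 2))
V = rng.normal(size=(5, 2))
X = U @ V.T          # ratings that the factors reproduce exactly
X[0, 1] = np.nan     # one unobserved rating

print(predict_rating(U, V, 2, 3))
print(regularized_loss(X, U, V, lam=0.1))
```

Because `X` is built from the factors themselves, the reconstruction term is zero here and only the regularization term contributes to the loss.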
Here I'll load the MovieLens-100k ratings data, which can be downloaded from the link presented at the beginning:
import pandas as pd, time
from datetime import datetime
ratings=pd.read_table('D:\\Downloads\\movielens\\ml-100k\\ml-100k\\u.data',sep='\t',engine='python',names=['UserId','ItemId','Rating','Timestamp'])
ratings['Timestamp']=ratings.Timestamp.map(lambda x: datetime(*time.localtime(x)[:6])).map(lambda x: pd.to_datetime(x))
ratings=ratings.sort_values(['UserId','ItemId']).reset_index(drop=True)
ratings.head()
| | UserId | ItemId | Rating | Timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 5 | 1997-09-23 01:02:38 |
| 1 | 1 | 2 | 3 | 1997-10-15 08:26:11 |
| 2 | 1 | 3 | 4 | 1997-11-03 09:42:40 |
| 3 | 1 | 4 | 3 | 1997-10-15 08:25:19 |
| 4 | 1 | 5 | 3 | 1998-03-13 03:15:12 |
In order to evaluate the model, I'll create a train and test set split to use throughout the whole notebook. As this kind of model can only recommend items that were in the training set to users who were also in the training set, I'll make the test set contain only users and items that were present in the train set.
To make this more realistic, I'll make it a temporal split, i.e. splitting the ratings into those that were submitted before and after a certain time cutoff.
time_cutoff='1998-01-01'
train=ratings.loc[ratings.Timestamp<=time_cutoff]
test=ratings.loc[ratings.Timestamp>time_cutoff]
users_train=set(list(train.UserId))
items_train=set(list(train.ItemId))
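# keep only test rows whose user and item both appear in the training set
# (.copy() avoids a SettingWithCopyWarning when adding the 'Predicted' column later)
test=test.loc[test.UserId.isin(users_train) & test.ItemId.isin(items_train)].copy()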
print(train.shape)
print(test.shape)
(52884, 4)
(5835, 4)
Note that this is a very small sample; in a typical setting you would have 3 or 4 orders of magnitude more ratings. Nevertheless, this smallish data is enough to see a difference between models.
print(len(users_train))
print(len(items_train))
529
1493
Traditionally, recommendations have been evaluated by their cross-validated RMSE (root mean squared error), but this is not really a good metric, and lower values might not translate into better-liked recommendations. There are many additional metrics that can be used, but to keep this example simple, I’ll evaluate the rating that users would have given to the Top-5 recommendations from this model and compare this to recommendations of the best-rated movies and to the average rating that users give.
from cmfrec import CMF
import numpy as np
# Number of latent factors
k=40
# Regularization parameter
reg=10
# Fitting the model
rec=CMF(k=k, reg_param=reg)
rec.fit(train, random_seed=12345)
# Making predictions
test['Predicted']=test.apply(lambda x: rec.predict(x['UserId'],x['ItemId']),axis=1)
Evaluating hold-out RMSE (the hyperparameters had already been somewhat tuned by cross-validation):
np.sqrt(np.mean((test.Predicted-test.Rating)**2))
1.2647716220817762
Basic evaluation of this model:
avg_ratings=train.groupby('ItemId')['Rating'].mean().to_frame().rename(columns={"Rating":"AvgRating"})
test2=pd.merge(test,avg_ratings,left_on='ItemId',right_index=True,how='left')
print('Average movie rating:',test2.groupby('UserId')['Rating'].mean().mean())
print('Average rating for top-5 rated by each user:',test2.sort_values(['UserId','Rating'],ascending=False).groupby('UserId')['Rating'].head(5).mean())
print('Average rating for bottom-5 rated by each user:',test2.sort_values(['UserId','Rating'],ascending=True).groupby('UserId')['Rating'].head(5).mean())
print('Average rating for top-5 recommendations of best-rated movies:',test2.sort_values(['UserId','AvgRating'],ascending=False).groupby('UserId')['Rating'].head(5).mean())
print('----------------------')
print('Average rating for top-5 recommendations from this model:',test2.sort_values(['UserId','Predicted'],ascending=False).groupby('UserId')['Rating'].head(5).mean())
print('Average rating for bottom-5 (non-)recommendations from this model:',test2.sort_values(['UserId','Predicted'],ascending=True).groupby('UserId')['Rating'].head(5).mean())
Average movie rating: 3.5602718818211856
Average rating for top-5 rated by each user: 4.5298621745788665
Average rating for bottom-5 rated by each user: 2.246554364471669
Average rating for top-5 recommendations of best-rated movies: 4.029096477794793
----------------------
Average rating for top-5 recommendations from this model: 4.016845329249617
Average rating for bottom-5 (non-)recommendations from this model: 3.116385911179173
The recommendations from this model are not bad, but the average rating of its Top-5 (4.017) doesn't manage to beat a non-personalized recommendation of the best-rated movies (4.029)! This is not surprising given the small size of the ratings data, though.
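The Top-5 evaluation used above can also be wrapped in a small helper, which makes the metric easier to read and reuse. This is only a sketch: the function name `mean_rating_of_top_n` is hypothetical, and the toy data frame stands in for the `test2` frame built in the notebook.

```python
import pandas as pd

def mean_rating_of_top_n(df, score_col, n=5, ascending=False):
    """Average actual rating of each user's n highest-scored test items
    (or lowest-scored, if ascending=True)."""
    ranked = df.sort_values(['UserId', score_col], ascending=[True, ascending])
    return ranked.groupby('UserId')['Rating'].head(n).mean()

# Toy stand-in for the test set with model predictions
toy = pd.DataFrame({
    'UserId':    [1, 1, 1, 2, 2, 2],
    'Rating':    [5, 3, 1, 4, 2, 5],
    'Predicted': [4.5, 2.0, 3.0, 1.0, 3.5, 4.0],
})

top1 = mean_rating_of_top_n(toy, 'Predicted', n=1)
bottom1 = mean_rating_of_top_n(toy, 'Predicted', n=1, ascending=True)
print(top1, bottom1)
```

Passing `'AvgRating'` or `'Rating'` as `score_col` would give the best-rated baseline and the per-user upper bound, respectively.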
The previous model can be extended by adding some additional information about the movies - this can be done by also factorizing the movie-genre matrix and sharing the item-factor matrix in the factorization of the user-item ratings. Now the model becomes:
$$ Loss\:(U, V, Z) = \lVert X - UV^T \rVert^2\: + \:\lVert M - VZ^T \rVert^2\: + \:\lambda\: (\lVert U \rVert^2 + \lVert V \rVert^2 + \lVert Z \rVert^2) $$
where $M$ is the movie-genre matrix, and $U$, $V$ and $Z$ are lower-dimensional matrices mapping users, items and genres into a latent space. The predicted rating from this model for a given user $i$ and movie $j$ is still calculated the same as before: $U[i,:] \cdot V[j,:]^T$. However, we can intuitively think that an item-factor matrix that also has to represent genres might be better than one that does not, and less likely to overfit, as its factors are more constrained.
The matrix $V$, however, doesn't need to be exactly the same in both terms - we can also add some additional factors that appear in only one factorization, giving the following formula:
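$$ Loss\:(U, V, Z) = \lVert X - UV_{main}^T \rVert^2\: + \:\lVert M - V_{sec}Z^T \rVert^2\: + \:\lambda\: (\lVert U \rVert^2 + \lVert V \rVert^2 + \lVert Z \rVert^2) $$
where $V_{main} = V_{[\cdot,\:1\:to\:k_{main} + k_{shared}]}$ and $V_{sec} = V_{[\cdot,\:k_{main} + 1\:to\:k_{main} + k_{shared} + k_{sec}]}$ are column subsets of $V$: the first $k_{main}$ factors are used only for the ratings, the next $k_{shared}$ are used in both factorizations, and the last $k_{sec}$ only for the genres.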
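To make the factor sharing concrete, here is a small NumPy sketch of the loss with partially shared item factors. It treats rows of $V$ as items and columns as latent factors (consistent with the prediction formula $U[i,:] \cdot V[j,:]^T$). The function names are hypothetical, missing ratings encoded as NaN is an assumption, and this is not how cmfrec is implemented internally:

```python
import numpy as np

def split_item_factors(V, k_main, k_shared, k_sec):
    """Return (V_main, V_sec): column slices of V that overlap on the shared block."""
    assert V.shape[1] == k_main + k_shared + k_sec
    V_main = V[:, :k_main + k_shared]   # ratings-only + shared factors
    V_sec = V[:, k_main:]               # shared + genre-only factors
    return V_main, V_sec

def collective_loss(X, M, U, V, Z, lam, k_main, k_shared, k_sec):
    """Ratings error + genre error + L2 penalty, with partially shared factors."""
    V_main, V_sec = split_item_factors(V, k_main, k_shared, k_sec)
    observed = ~np.isnan(X)             # assumption: NaN marks a missing rating
    res_x = np.where(observed, X - U @ V_main.T, 0.0)
    res_m = M - V_sec @ Z.T
    penalty = lam * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(Z ** 2))
    return float(np.sum(res_x ** 2) + np.sum(res_m ** 2) + penalty)

# Toy dimensions: 4 users, 6 items, 3 genres
k_main, k_shared, k_sec = 2, 3, 1
rng = np.random.default_rng(1)
U = rng.normal(size=(4, k_main + k_shared))
V = rng.normal(size=(6, k_main + k_shared + k_sec))
Z = rng.normal(size=(3, k_shared + k_sec))

V_main, V_sec = split_item_factors(V, k_main, k_shared, k_sec)
X = U @ V_main.T
M = V_sec @ Z.T
print(V_main.shape, V_sec.shape)
print(collective_loss(X, M, U, V, Z, 0.0, k_main, k_shared, k_sec))
```

Note that $U$ then needs $k_{main} + k_{shared}$ columns and $Z$ needs $k_{shared} + k_{sec}$ columns for the products to be defined.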
The MovieLens-100k data also comes with a file containing movie information that we can use to enhance the model - note that the package requires the item side information to have a column named ItemId when you pass it to the API. If your data doesn't require any reindexing, you can also pass it as a numpy array and set the option reindex to False.
colnames=['ItemId','Title','ReleaseDate','Sep','Link']+['genre'+str(i) for i in range(19)]
genres=pd.read_table('D:\\Downloads\\movielens\\ml-100k\\ml-100k\\u.item',sep="|",engine='python',names=colnames)
# will save the movie titles for later
movie_id_to_title={i.ItemId:i.Title for i in genres.itertuples()}
genres=genres[['ItemId']+['genre'+str(i) for i in range(19)]]
genres.head()
| | ItemId | genre0 | genre1 | genre2 | genre3 | genre4 | genre5 | genre6 | genre7 | genre8 | genre9 | genre10 | genre11 | genre12 | genre13 | genre14 | genre15 | genre16 | genre17 | genre18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 4 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
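The reindexing mentioned above means mapping arbitrary `UserId` / `ItemId` values to contiguous row and column positions, and aligning the side information to those same positions. A general illustration with `pd.factorize` on toy data is sketched below; the variable names are hypothetical and this is not necessarily how the package does it internally:

```python
import pandas as pd

ratings_toy = pd.DataFrame({
    'UserId': [10, 10, 42, 7],
    'ItemId': [500, 300, 500, 900],
    'Rating': [4, 3, 5, 2],
})
genres_toy = pd.DataFrame({
    'ItemId': [300, 500, 900],
    'genre0': [1, 0, 0],
    'genre1': [0, 1, 1],
})

# Map raw IDs to 0..n-1 in order of first appearance
user_idx, user_ids = pd.factorize(ratings_toy['UserId'])
item_idx, item_ids = pd.factorize(ratings_toy['ItemId'])

# Reorder the side information so that row p describes the item at position p
side_matrix = genres_toy.set_index('ItemId').loc[item_ids].to_numpy()

print(list(user_idx), list(item_idx))
print(side_matrix)
```

If the IDs are already contiguous positions and the side information rows are already in that order, none of this is needed, which is the situation in which plain arrays can be used directly.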
These hyperparameters (number of factors and regularization) were also somewhat tuned beforehand:
# Number of latent factors
k=30
k_main=10
k_sec=10
# Regularization parameter
reg=10
# Fitting the model
rec2=CMF(k=k, k_main=k_main, k_item=k_sec, reg_param=reg)
rec2.fit(train, genres, random_seed=10000)
# Making predictions
test['Predicted']=test.apply(lambda x: rec2.predict(x['UserId'],x['ItemId']),axis=1)
RMSE now:
np.sqrt(np.mean((test.Predicted-test.Rating)**2))
1.2610262136540786
Same evaluation as before:
test2=pd.merge(test,avg_ratings,left_on='ItemId',right_index=True,how='left')
print('Average movie rating:',test2.groupby('UserId')['Rating'].mean().mean())
print('Average rating for top-5 rated by each user:',test2.sort_values(['UserId','Rating'],ascending=False).groupby('UserId')['Rating'].head(5).mean())
print('Average rating for bottom-5 rated by each user:',test2.sort_values(['UserId','Rating'],ascending=True).groupby('UserId')['Rating'].head(5).mean())
print('Average rating for top-5 recommendations of best-rated movies:',test2.sort_values(['UserId','AvgRating'],ascending=False).groupby('UserId')['Rating'].head(5).mean())
print('----------------------')
print('Average rating for top-5 recommendations from this model:',test2.sort_values(['UserId','Predicted'],ascending=False).groupby('UserId')['Rating'].head(5).mean())
print('Average rating for bottom-5 (non-)recommendations from this model:',test2.sort_values(['UserId','Predicted'],ascending=True).groupby('UserId')['Rating'].head(5).mean())
Average movie rating: 3.5602718818211856
Average rating for top-5 rated by each user: 4.5298621745788665
Average rating for bottom-5 rated by each user: 2.246554364471669
Average rating for top-5 recommendations of best-rated movies: 4.029096477794793
----------------------
Average rating for top-5 recommendations from this model: 4.03062787136294
Average rating for bottom-5 (non-)recommendations from this model: 3.113323124042879
Now we see a bit of an improvement - it's not too large, but it's nevertheless an improvement, and this time the personalized recommendations get slightly higher ratings overall (4.031) than the non-personalized best-rated recommendations (4.029), with as little as 50k ratings.
Knowing these generic genres shouldn't be a complete game changer, so a small gain is expected.
The previous model can be extended to incorporate user information in the same way as it added movie genres:
$$ Loss\:(U, V, Z, P) = \lVert X - UV^T \rVert^2\: + \:\lVert M - VZ^T \rVert^2\: + \:\lVert Q - UP^T \rVert^2\: + \:\lambda\: (\lVert U \rVert^2 + \lVert V \rVert^2 + \lVert Z \rVert^2 + \lVert P \rVert^2) $$
where $Q$ is the user attribute matrix and $P$ is the new attribute-factor matrix. Same as before, some of the factors can be shared and some can be specific to one factorization.
Intuitively, since in a typical setting there are usually more users than items (not in this particular example though), and each user has on average fewer rated movies than movies have users rating them, it would be logical to assume that detailed user information should be more valuable than detailed item information.
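The claim about ratings per user versus ratings per item is easy to check on any ratings table. The sketch below uses a hypothetical helper name and a toy data frame; the same function could be applied to the `train` frame from this notebook:

```python
import pandas as pd

def mean_ratings_per_entity(df):
    """Return (average ratings per user, average ratings per item)."""
    per_user = df.groupby('UserId').size().mean()
    per_item = df.groupby('ItemId').size().mean()
    return per_user, per_item

# Toy data: 5 users, 2 items, 6 ratings
toy = pd.DataFrame({
    'UserId': [1, 1, 2, 3, 4, 5],
    'ItemId': [1, 2, 1, 1, 2, 2],
})

per_user, per_item = mean_ratings_per_entity(toy)
print(per_user, per_item)
```

With the counts printed earlier for the training set (52,884 ratings, 529 users, 1,493 items), this works out to roughly 100 ratings per user and roughly 35 per item, which is why this particular dataset is the exception noted above.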
The MovieLens-100k data also comes with user demographic information - same as before, the data frame passed to the package API should have a column named UserId:
user_info=pd.read_table('D:\\Downloads\\movielens\\ml-100k\\ml-100k\\u.user',sep="|",engine='python',
names=['UserId','Age','Gender','Occupation','Zipcode'])
user_info.head()
| | UserId | Age | Gender | Occupation | Zipcode |
|---|---|---|---|---|---|
| 0 | 1 | 24 | M | technician | 85711 |
| 1 | 2 | 53 | F | other | 94043 |
| 2 | 3 | 23 | M | writer | 32067 |
| 3 | 4 | 24 | M | technician | 43537 |
| 4 | 5 | 33 | F | other | 15213 |
This time, unfortunately, not all the information can be used as it is in the file. The zip code can still provide valuable information if we can link it to a broader geographical area. As these are mostly US users, I'll try to link it to US regions here.
In order to do so, I’m using a publicly available table mapping zip codes to states, another one mapping state names to their abbreviations, and finally classifying the states into regions according to usual definitions.
import re
zipcode_abbs=pd.read_csv("D:\\Downloads\\movielens\\zips\\states.csv")
zipcode_abbs_dct={z.State:z.Abbreviation for z in zipcode_abbs.itertuples()}
us_regs_table=[
('New England', 'Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont'),
('Middle Atlantic', 'Delaware, Maryland, New Jersey, New York, Pennsylvania'),
('South', 'Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, Missouri, North Carolina, South Carolina, Tennessee, Virginia, West Virginia'),
('Midwest', 'Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin'),
('Southwest', 'Arizona, New Mexico, Oklahoma, Texas'),
('West', 'Alaska, California, Colorado, Hawaii, Idaho, Montana, Nevada, Oregon, Utah, Washington, Wyoming')
]
us_regs_table=[(x[0],[i.strip() for i in x[1].split(",")]) for x in us_regs_table]
us_regs_dct=dict()
# map each state abbreviation to its region
for r in us_regs_table:
    for s in r[1]:
        us_regs_dct[zipcode_abbs_dct[s]]=r[0]
zipcode_info=pd.read_csv("D:\\Downloads\\movielens\\free-zipcode-database.csv")
zipcode_info=zipcode_info.groupby('Zipcode').first().reset_index()
# use .loc on the frame itself (chained assignment may silently fail in recent pandas)
non_us=zipcode_info.Country!="US"
zipcode_info.loc[non_us,'State']='UnknownOrNonUS'
zipcode_info['Region']=zipcode_info['State'].copy()
zipcode_info.loc[~non_us,'Region']=zipcode_info.loc[~non_us,'State'].map(lambda x: us_regs_dct.get(x,'UsOther'))
zipcode_info=zipcode_info[['Zipcode', 'Region']]
zipcode_info.head()
def process_zip(zp):
    # non-numeric zip codes become NaN (np.int was removed from recent NumPy versions)
    try:
        return int(zp)
    except (ValueError, TypeError):
        return np.nan
user_info["Zipcode"]=user_info.Zipcode.map(process_zip)
user_info=pd.merge(user_info,zipcode_info,on='Zipcode',how='left')
user_info['Region']=user_info.Region.fillna('UnknownOrNonUS')
user_info=pd.get_dummies(user_info[['UserId','Age','Gender','Occupation','Region']])
users_w_side_info=set(list(user_info.UserId))
ratings=ratings.loc[ratings.UserId.map(lambda x: x in users_w_side_info)]
user_info.head()
| | UserId | Age | Gender_F | Gender_M | Occupation_administrator | Occupation_artist | Occupation_doctor | Occupation_educator | Occupation_engineer | Occupation_entertainment | ... | Occupation_technician | Occupation_writer | Region_Middle Atlantic | Region_Midwest | Region_New England | Region_South | Region_Southwest | Region_UnknownOrNonUS | Region_UsOther | Region_West |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 24 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 53 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 3 | 23 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | 4 | 24 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 33 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 33 columns
Adding explicit information gives the latent factors a more solid base, so fewer of them are needed the more side info there is available.
# Number of latent factors
k=30
k_main=5
k_genre=5
k_demo=5
# Regularization parameter
reg=50
# This time I'll weight the ratings matrix higher
w_main=4
# Fitting the model
rec3=CMF(k=k, k_main=k_main, k_item=k_genre, k_user=k_demo, w_main=w_main, reg_param=reg)
rec3.fit(train, genres, user_info, random_seed=32545)
# Making predictions
test['Predicted']=test.apply(lambda x: rec3.predict(x['UserId'],x['ItemId']),axis=1)
Same metrics as before:
np.sqrt(np.mean((test.Predicted-test.Rating)**2))
1.2433900285807755
test2=pd.merge(test,avg_ratings,left_on='ItemId',right_index=True,how='left')
print('Average movie rating:',test2.groupby('UserId')['Rating'].mean().mean())
print('Average rating for top-5 rated by each user:',test2.sort_values(['UserId','Rating'],ascending=False).groupby('UserId')['Rating'].head(5).mean())
print('Average rating for bottom-5 rated by each user:',test2.sort_values(['UserId','Rating'],ascending=True).groupby('UserId')['Rating'].head(5).mean())
print('Average rating for top-5 recommendations of best-rated movies:',test2.sort_values(['UserId','AvgRating'],ascending=False).groupby('UserId')['Rating'].head(5).mean())
print('----------------------')
print('Average rating for top-5 recommendations from this model:',test2.sort_values(['UserId','Predicted'],ascending=False).groupby('UserId')['Rating'].head(5).mean())
print('Average rating for bottom-5 (non-)recommendations from this model:',test2.sort_values(['UserId','Predicted'],ascending=True).groupby('UserId')['Rating'].head(5).mean())
Average movie rating: 3.5602718818211856
Average rating for top-5 rated by each user: 4.5298621745788665
Average rating for bottom-5 rated by each user: 2.246554364471669
Average rating for top-5 recommendations of best-rated movies: 4.029096477794793
----------------------
Average rating for top-5 recommendations from this model: 4.062787136294028
Average rating for bottom-5 (non-)recommendations from this model: 3.120980091883614
This time the improvement is bigger: the hold-out RMSE drops to 1.243 and the Top-5 recommendations now average 4.063, compared to 4.029 for the best-rated movies - just from adding the most basic demographic information!
Now let's see what each of these models recommends to some randomly picked users, along with the overall item popularity:
# aggregate statistics
avg_movie_rating=train.groupby('ItemId')['Rating'].mean()
num_ratings_per_movie=train.groupby('ItemId')['Rating'].size()
# function to print recommended lists more nicely
def print_reclist(reclist):
    # one line per movie: rank, title, average rating and number of ratings in the train set
    list_w_info=[str(m+1)+") - "+movie_id_to_title[reclist[m]]+\
        " - Average Rating: "+str(np.round(avg_movie_rating[reclist[m]],2))+\
        " - Number of ratings: "+str(num_ratings_per_movie[reclist[m]]) for m in range(len(reclist))]
    print("\n".join(list_w_info))
# user 1
reclist1=rec.top_n(UserId=1, n=20)
reclist2=rec2.top_n(UserId=1, n=20)
reclist3=rec3.top_n(UserId=1, n=20)
print('Recommendations from ratings-only model:')
print_reclist(reclist1)
print("------")
print('Recommendations from ratings + genre model:')
print_reclist(reclist2)
print("------")
print('Recommendations from ratings + genre + demographics model:')
print_reclist(reclist3)
Recommendations from ratings-only model:
1) - Fargo (1996) - Average Rating: 4.23 - Number of ratings: 301
2) - Star Wars (1977) - Average Rating: 4.34 - Number of ratings: 335
3) - Toy Story (1995) - Average Rating: 3.92 - Number of ratings: 267
4) - Graduate, The (1967) - Average Rating: 4.19 - Number of ratings: 134
5) - Big Night (1996) - Average Rating: 3.86 - Number of ratings: 103
6) - Wrong Trousers, The (1993) - Average Rating: 4.59 - Number of ratings: 68
7) - Princess Bride, The (1987) - Average Rating: 4.23 - Number of ratings: 184
8) - 12 Angry Men (1957) - Average Rating: 4.3 - Number of ratings: 71
9) - Antonia's Line (1995) - Average Rating: 4.07 - Number of ratings: 43
10) - Shawshank Redemption, The (1994) - Average Rating: 4.56 - Number of ratings: 174
11) - Dead Man Walking (1995) - Average Rating: 3.94 - Number of ratings: 185
12) - Chasing Amy (1997) - Average Rating: 3.77 - Number of ratings: 119
13) - Raiders of the Lost Ark (1981) - Average Rating: 4.32 - Number of ratings: 238
14) - Full Monty, The (1997) - Average Rating: 4.06 - Number of ratings: 135
15) - Godfather, The (1972) - Average Rating: 4.28 - Number of ratings: 237
16) - Empire Strikes Back, The (1980) - Average Rating: 4.24 - Number of ratings: 201
17) - Swingers (1996) - Average Rating: 3.93 - Number of ratings: 91
18) - Chasing Amy (1997) - Average Rating: 4.03 - Number of ratings: 68
19) - Blade Runner (1982) - Average Rating: 4.11 - Number of ratings: 151
20) - Nikita (La Femme Nikita) (1990) - Average Rating: 4.13 - Number of ratings: 61
------
Recommendations from ratings + genre model:
1) - Toy Story (1995) - Average Rating: 3.92 - Number of ratings: 267
2) - Fargo (1996) - Average Rating: 4.23 - Number of ratings: 301
3) - Star Wars (1977) - Average Rating: 4.34 - Number of ratings: 335
4) - Shawshank Redemption, The (1994) - Average Rating: 4.56 - Number of ratings: 174
5) - Princess Bride, The (1987) - Average Rating: 4.23 - Number of ratings: 184
6) - Big Night (1996) - Average Rating: 3.86 - Number of ratings: 103
7) - Dead Man Walking (1995) - Average Rating: 3.94 - Number of ratings: 185
8) - Raiders of the Lost Ark (1981) - Average Rating: 4.32 - Number of ratings: 238
9) - Graduate, The (1967) - Average Rating: 4.19 - Number of ratings: 134
10) - Wrong Trousers, The (1993) - Average Rating: 4.59 - Number of ratings: 68
11) - Full Monty, The (1997) - Average Rating: 4.06 - Number of ratings: 135
12) - Godfather, The (1972) - Average Rating: 4.28 - Number of ratings: 237
13) - Chasing Amy (1997) - Average Rating: 3.77 - Number of ratings: 119
14) - 12 Angry Men (1957) - Average Rating: 4.3 - Number of ratings: 71
15) - Antonia's Line (1995) - Average Rating: 4.07 - Number of ratings: 43
16) - Chasing Amy (1997) - Average Rating: 4.03 - Number of ratings: 68
17) - Empire Strikes Back, The (1980) - Average Rating: 4.24 - Number of ratings: 201
18) - Monty Python and the Holy Grail (1974) - Average Rating: 4.14 - Number of ratings: 183
19) - Return of the Jedi (1983) - Average Rating: 3.99 - Number of ratings: 300
20) - Welcome to the Dollhouse (1995) - Average Rating: 3.86 - Number of ratings: 69
------
Recommendations from ratings + genre + demographics model:
1) - Fargo (1996) - Average Rating: 4.23 - Number of ratings: 301
2) - Star Wars (1977) - Average Rating: 4.34 - Number of ratings: 335
3) - Shawshank Redemption, The (1994) - Average Rating: 4.56 - Number of ratings: 174
4) - Toy Story (1995) - Average Rating: 3.92 - Number of ratings: 267
5) - Wrong Trousers, The (1993) - Average Rating: 4.59 - Number of ratings: 68
6) - Chasing Amy (1997) - Average Rating: 3.77 - Number of ratings: 119
7) - Raiders of the Lost Ark (1981) - Average Rating: 4.32 - Number of ratings: 238
8) - Princess Bride, The (1987) - Average Rating: 4.23 - Number of ratings: 184
9) - Godfather, The (1972) - Average Rating: 4.28 - Number of ratings: 237
10) - Swingers (1996) - Average Rating: 3.93 - Number of ratings: 91
11) - Empire Strikes Back, The (1980) - Average Rating: 4.24 - Number of ratings: 201
12) - Big Night (1996) - Average Rating: 3.86 - Number of ratings: 103
13) - Full Monty, The (1997) - Average Rating: 4.06 - Number of ratings: 135
14) - 12 Angry Men (1957) - Average Rating: 4.3 - Number of ratings: 71
15) - Monty Python and the Holy Grail (1974) - Average Rating: 4.14 - Number of ratings: 183
16) - Pulp Fiction (1994) - Average Rating: 4.16 - Number of ratings: 225
17) - Chasing Amy (1997) - Average Rating: 4.03 - Number of ratings: 68
18) - Dead Man Walking (1995) - Average Rating: 3.94 - Number of ratings: 185
19) - Return of the Jedi (1983) - Average Rating: 3.99 - Number of ratings: 300
20) - Nikita (La Femme Nikita) (1990) - Average Rating: 4.13 - Number of ratings: 61
# user 943
reclist1=rec.top_n(UserId=943, n=20)
reclist2=rec2.top_n(UserId=943, n=20)
reclist3=rec3.top_n(UserId=943, n=20)
print('Recommendations from ratings-only model:')
print_reclist(reclist1)
print("------")
print('Recommendations from ratings + genre model:')
print_reclist(reclist2)
print("------")
print('Recommendations from ratings + genre + demographics model:')
print_reclist(reclist3)
Recommendations from ratings-only model:
1) - Godfather, The (1972) - Average Rating: 4.28 - Number of ratings: 237
2) - Fargo (1996) - Average Rating: 4.23 - Number of ratings: 301
3) - Shawshank Redemption, The (1994) - Average Rating: 4.56 - Number of ratings: 174
4) - Star Wars (1977) - Average Rating: 4.34 - Number of ratings: 335
5) - Courage Under Fire (1996) - Average Rating: 3.58 - Number of ratings: 137
6) - Rock, The (1996) - Average Rating: 3.72 - Number of ratings: 227
7) - Raiders of the Lost Ark (1981) - Average Rating: 4.32 - Number of ratings: 238
8) - Time to Kill, A (1996) - Average Rating: 3.67 - Number of ratings: 138
9) - Return of the Jedi (1983) - Average Rating: 3.99 - Number of ratings: 300
10) - People vs. Larry Flynt, The (1996) - Average Rating: 3.69 - Number of ratings: 123
11) - Trainspotting (1996) - Average Rating: 3.93 - Number of ratings: 164
12) - Mission: Impossible (1996) - Average Rating: 3.37 - Number of ratings: 209
13) - Apollo 13 (1995) - Average Rating: 3.92 - Number of ratings: 160
14) - Dead Man Walking (1995) - Average Rating: 3.94 - Number of ratings: 185
15) - Independence Day (ID4) (1996) - Average Rating: 3.47 - Number of ratings: 258
16) - Lone Star (1996) - Average Rating: 3.98 - Number of ratings: 119
17) - Rumble in the Bronx (1995) - Average Rating: 3.45 - Number of ratings: 116
18) - River Wild, The (1994) - Average Rating: 3.23 - Number of ratings: 84
19) - Truth About Cats & Dogs, The (1996) - Average Rating: 3.51 - Number of ratings: 170
20) - Broken Arrow (1996) - Average Rating: 3.04 - Number of ratings: 158
------
Recommendations from ratings + genre model:
1) - Fargo (1996) - Average Rating: 4.23 - Number of ratings: 301
2) - Godfather, The (1972) - Average Rating: 4.28 - Number of ratings: 237
3) - Shawshank Redemption, The (1994) - Average Rating: 4.56 - Number of ratings: 174
4) - Star Wars (1977) - Average Rating: 4.34 - Number of ratings: 335
5) - Courage Under Fire (1996) - Average Rating: 3.58 - Number of ratings: 137
6) - Raiders of the Lost Ark (1981) - Average Rating: 4.32 - Number of ratings: 238
7) - Rock, The (1996) - Average Rating: 3.72 - Number of ratings: 227
8) - Time to Kill, A (1996) - Average Rating: 3.67 - Number of ratings: 138
9) - Return of the Jedi (1983) - Average Rating: 3.99 - Number of ratings: 300
10) - Mission: Impossible (1996) - Average Rating: 3.37 - Number of ratings: 209
11) - People vs. Larry Flynt, The (1996) - Average Rating: 3.69 - Number of ratings: 123
12) - Trainspotting (1996) - Average Rating: 3.93 - Number of ratings: 164
13) - Apollo 13 (1995) - Average Rating: 3.92 - Number of ratings: 160
14) - Dead Man Walking (1995) - Average Rating: 3.94 - Number of ratings: 185
15) - Lone Star (1996) - Average Rating: 3.98 - Number of ratings: 119
16) - Independence Day (ID4) (1996) - Average Rating: 3.47 - Number of ratings: 258
17) - Rumble in the Bronx (1995) - Average Rating: 3.45 - Number of ratings: 116
18) - River Wild, The (1994) - Average Rating: 3.23 - Number of ratings: 84
19) - Broken Arrow (1996) - Average Rating: 3.04 - Number of ratings: 158
20) - Truth About Cats & Dogs, The (1996) - Average Rating: 3.51 - Number of ratings: 170
------
Recommendations from ratings + genre + demographics model:
1) - Godfather, The (1972) - Average Rating: 4.28 - Number of ratings: 237
2) - Star Wars (1977) - Average Rating: 4.34 - Number of ratings: 335
3) - Fargo (1996) - Average Rating: 4.23 - Number of ratings: 301
4) - Shawshank Redemption, The (1994) - Average Rating: 4.56 - Number of ratings: 174
5) - Rock, The (1996) - Average Rating: 3.72 - Number of ratings: 227
6) - Raiders of the Lost Ark (1981) - Average Rating: 4.32 - Number of ratings: 238
7) - Courage Under Fire (1996) - Average Rating: 3.58 - Number of ratings: 137
8) - Return of the Jedi (1983) - Average Rating: 3.99 - Number of ratings: 300
9) - Time to Kill, A (1996) - Average Rating: 3.67 - Number of ratings: 138
10) - Mission: Impossible (1996) - Average Rating: 3.37 - Number of ratings: 209
11) - Trainspotting (1996) - Average Rating: 3.93 - Number of ratings: 164
12) - Apollo 13 (1995) - Average Rating: 3.92 - Number of ratings: 160
13) - People vs. Larry Flynt, The (1996) - Average Rating: 3.69 - Number of ratings: 123
14) - Dead Man Walking (1995) - Average Rating: 3.94 - Number of ratings: 185
15) - Independence Day (ID4) (1996) - Average Rating: 3.47 - Number of ratings: 258
16) - Rumble in the Bronx (1995) - Average Rating: 3.45 - Number of ratings: 116
17) - Lone Star (1996) - Average Rating: 3.98 - Number of ratings: 119
18) - Truth About Cats & Dogs, The (1996) - Average Rating: 3.51 - Number of ratings: 170
19) - Broken Arrow (1996) - Average Rating: 3.04 - Number of ratings: 158
20) - Happy Gilmore (1996) - Average Rating: 3.24 - Number of ratings: 93
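As a rough illustration of how a top-N list like the ones above can be derived from a factorization in general, the sketch below scores every item for one user with the dot product of the factor rows and keeps the highest-scoring ones. The function name and the optional `exclude` argument are hypothetical, and this is not a description of what the package's `top_n` method does internally:

```python
import numpy as np

def top_n_from_factors(U, V, user_index, n, exclude=()):
    """Positions of the n items with the highest predicted score for one user.

    `exclude` lists item positions to skip (e.g. already-rated items);
    it is an assumption of this sketch.
    """
    scores = (V @ U[user_index, :]).astype(float)
    scores[np.asarray(list(exclude), dtype=int)] = -np.inf
    return np.argsort(-scores)[:n]

# Toy factors: 2 users, 4 items, 2 latent factors
U = np.array([[1.0, 0.0],
              [0.0, 1.0]])
V = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5],
              [1.0, 0.0]])

print(top_n_from_factors(U, V, user_index=0, n=2))
print(top_n_from_factors(U, V, user_index=0, n=2, exclude=[3]))
print(top_n_from_factors(U, V, user_index=1, n=2))
```

Since every user is scored against the same item factors, widely liked movies tend to score well for many users, which is consistent with the overlap between the two users' lists above.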