Collaborative filtering¶

In [1]:

from fastai.gen_doc.nbdoc import *

This package contains all the necessary functions to quickly train a model for a collaborative filtering task. Let's start by importing all we'll need.

In [2]:

from fastai import *
from fastai.collab import * 

Overview¶

Collaborative filtering is the task of predicting how much a user is going to like a certain item. The fastai library contains a CollabFilteringDataset class that will help you create datasets suitable for training, and a function get_collab_learner to build a simple model directly from a ratings table. Let's first see how we can get started before delving into the documentation.

For our example, we'll use a small subset of the MovieLens dataset, in which we have to predict the rating a user gave a given movie (from 0 to 5). It comes as a csv file where each line is the rating of a movie by a given person.

In [ ]:

path = untar_data(URLs.ML_SAMPLE)
ratings = pd.read_csv(path/'ratings.csv')
ratings.head()

Out[ ]:

	userId	movieId	rating	timestamp
0	73	1097	4.0	1255504951
1	561	924	3.5	1172695223
2	157	260	3.5	1291598691
3	358	1210	5.0	957481884
4	130	316	2.0	1138999234

We'll first turn the userId and movieId columns into categories, so that we can replace them with their category codes when it's time to feed them to an Embedding layer. This step would be even more important if our csv contained names of users or names of items.

In [ ]:

series2cat(ratings, 'userId','movieId')
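
To make the idea of category codes concrete, here is a minimal pure-Python sketch of the mapping. It is not the series2cat implementation: the helper name to_category_codes is hypothetical, and assigning codes in sorted order of the distinct values is an assumption made for illustration.

```python
def to_category_codes(values):
    """Map each distinct value to an integer code (hypothetical helper).

    Assumption: codes follow the sorted order of the distinct values.
    """
    categories = sorted(set(values))
    code_of = {value: code for code, value in enumerate(categories)}
    return [code_of[value] for value in values], categories

# userId values from the first rows of the ratings table shown above
user_ids = [73, 561, 157, 358, 130]
codes, categories = to_category_codes(user_ids)
print(codes)       # integer codes, usable as row indices into an embedding table
print(categories)  # the distinct raw ids, in code order
```

The codes are contiguous integers starting at 0, which is what lets them index rows of an embedding matrix directly.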

Now that this step is done, we can directly create a Learner object:

In [ ]:

learn = get_collab_learner(ratings, n_factors=50, pct_val=0.2, min_score=0., max_score=5.)

And then immediately begin training:

In [ ]:

learn.fit_one_cycle(5, 5e-3, wd=0.1)

Total time: 00:04
epoch  train loss  valid loss
1      2.368736    1.849535    (00:00)
2      1.080932    0.691473    (00:00)
3      0.740156    0.669135    (00:00)
4      0.629487    0.658641    (00:00)
5      0.599293    0.654870    (00:00)

In [3]:

show_doc(CollabFilteringDataset, doc_string=False)

`class` `CollabFilteringDataset`[source]

CollabFilteringDataset(user:Series, item:Series, ratings:ndarray) :: DatasetBase

This is the basic class to build a Dataset suitable for collaborative filtering. user and item should be categorical series, which will be replaced with their codes internally, and ratings holds the corresponding ratings. One of the factory methods will prepare the data in this format.

In [4]:

show_doc(CollabFilteringDataset.from_df, doc_string=False)

`from_df`[source]

from_df(rating_df:DataFrame, pct_val:float=0.2, user_name:Optional[str]=None, item_name:Optional[str]=None, rating_name:Optional[str]=None) → Tuple[ColabFilteringDataset, ColabFilteringDataset]

Takes a rating_df and splits it randomly into training and validation sets following pct_val (unless it's None). user_name, item_name and rating_name give the names of the corresponding columns (defaulting to the first, second and third columns).
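
The random split controlled by pct_val can be pictured with the following sketch. It is only an illustration in plain Python: random_split and its seed argument are hypothetical names, and returning an empty second set when pct_val is None is an assumption about what "no split" means here.

```python
import random

def random_split(rows, pct_val=0.2, seed=None):
    """Shuffle rows and hold out a pct_val fraction of them (hypothetical helper)."""
    if pct_val is None:
        # Assumption: no split is made, so everything is kept for training.
        return list(rows), []
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * pct_val)
    # The first n_val shuffled rows are held out, the rest are used for training.
    return shuffled[n_val:], shuffled[:n_val]

# Ten toy (user, item, rating) rows
ratings_rows = [(user, item, 3.0) for user in range(5) for item in range(2)]
train_rows, valid_rows = random_split(ratings_rows, pct_val=0.2, seed=42)
print(len(train_rows), len(valid_rows))
```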

In [5]:

show_doc(CollabFilteringDataset.from_csv, doc_string=False)

`from_csv`[source]

from_csv(csv_name:str, kwargs) → Tuple[ColabFilteringDataset, ColabFilteringDataset]

Opens the file in csv_name as a DataFrame and feeds it to CollabFilteringDataset.from_df with the kwargs.

Model and `Learner`¶

In [6]:

show_doc(EmbeddingDotBias, doc_string=False, title_level=3)

`class` `EmbeddingDotBias`[source]

EmbeddingDotBias(n_factors:int, n_users:int, n_items:int, min_score:float=None, max_score:float=None) :: Module

Creates a simple model with Embedding weights and biases for n_users and n_items, with n_factors latent factors. Takes the dot product of the embeddings and adds the bias, then feeds the result to a sigmoid rescaled to go from min_score to max_score.
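
For a single (user, item) pair, the computation described above can be sketched in plain Python as follows. This is not the library's code: predict_rating is a hypothetical function, and adding one bias for the user and one for the item is our reading of "adds the bias".

```python
import math

def predict_rating(user_vec, item_vec, user_bias, item_bias, min_score, max_score):
    """Dot product of the two embeddings plus biases, squashed into [min_score, max_score]."""
    dot = sum(u * i for u, i in zip(user_vec, item_vec))
    raw = dot + user_bias + item_bias          # assumption: one bias per user and one per item
    sigmoid = 1.0 / (1.0 + math.exp(-raw))     # value in (0, 1)
    return min_score + (max_score - min_score) * sigmoid

# Two toy embeddings with n_factors=3 (made-up numbers)
score = predict_rating([0.1, -0.2, 0.3], [0.4, 0.1, -0.5],
                       user_bias=0.05, item_bias=-0.1,
                       min_score=0.0, max_score=5.0)
print(score)
```

Rescaling the sigmoid keeps every prediction strictly inside the rating range, so the model never has to learn the range on its own.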

In [7]:

show_doc(get_collab_learner, doc_string=False)

`get_collab_learner`[source]

get_collab_learner(ratings:DataFrame, n_factors:int, pct_val:float=0.2, user_name:Optional[str]=None, item_name:Optional[str]=None, rating_name:Optional[str]=None, test:DataFrame=None, metrics=None, min_score:float=None, max_score:float=None, kwargs) → Learner

Creates a Learner object built from the data in ratings. pct_val, user_name, item_name and rating_name are passed on to CollabFilteringDataset to create the datasets. Optionally, creates another CollabFilteringDataset for test. kwargs are fed to DataBunch.create with these datasets. The model is given by EmbeddingDotBias with n_factors, min_score and max_score (the numbers of users and items will be inferred from the data).
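
As an illustration of what inferring the numbers of users and items from the data amounts to, here is a small sketch that counts the distinct codes and sizes the parameter tables accordingly. The function init_tables, the dictionary keys and the Gaussian initialisation are all hypothetical choices, not what the library does internally.

```python
import random

def init_tables(user_codes, item_codes, n_factors, seed=0):
    """Size embedding and bias tables from the codes seen in the data (hypothetical helper)."""
    rng = random.Random(seed)
    n_users, n_items = len(set(user_codes)), len(set(item_codes))

    def table(n_rows, n_cols):
        # Assumption: small random values; the real initialisation may differ.
        return [[rng.gauss(0.0, 0.01) for _ in range(n_cols)] for _ in range(n_rows)]

    return {
        "user_factors": table(n_users, n_factors),
        "item_factors": table(n_items, n_factors),
        "user_bias": [0.0] * n_users,
        "item_bias": [0.0] * n_items,
    }

# Toy category codes for six ratings
tables = init_tables([0, 4, 2, 3, 1, 0], [3, 1, 0, 2, 3, 1], n_factors=50)
print(len(tables["user_factors"]), len(tables["item_factors"]))
```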

Undocumented Methods - Methods moved below this line will intentionally be hidden¶

In [8]:

show_doc(EmbeddingDotBias.forward)

`forward`[source]

forward(users:LongTensor, items:LongTensor) → Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
