Collaborative filtering¶

In [ ]:

from fastai.gen_doc.nbdoc import *

This package contains all the necessary functions to quickly train a model for a collaborative filtering task. Let's start by importing all we'll need.

In [ ]:

from fastai.collab import * 

Overview¶

Collaborative filtering is the task of predicting how much a user will like a given item. The fastai library provides a CollabDataBunch class that helps you create datasets suitable for training, and a collab_learner function that builds a simple model directly from a ratings table. Let's first see how to get started before delving into the documentation.

For this example, we'll use a small subset of the MovieLens dataset to predict the rating a user would give a particular movie (from 0 to 5). The dataset comes in the form of a csv file where each line is a rating of a movie by a given person.

In [ ]:

path = untar_data(URLs.ML_SAMPLE)
ratings = pd.read_csv(path/'ratings.csv')
ratings.head()

Out[ ]:

	userId	movieId	rating	timestamp
0	73	1097	4.0	1255504951
1	561	924	3.5	1172695223
2	157	260	3.5	1291598691
3	358	1210	5.0	957481884
4	130	316	2.0	1138999234

We'll first turn the userId and movieId columns into category codes, so that we can replace the raw values with their codes when it's time to feed them to an Embedding layer. This step would be even more important if our csv contained names of users or names of items. To do it, we simply have to call a CollabDataBunch factory method.

In [ ]:

data = CollabDataBunch.from_df(ratings)
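To make the idea of category codes concrete, here is a minimal sketch in plain Python. It is a toy illustration, not fastai's implementation: the helper name `to_codes` is hypothetical, and details such as missing-value handling are left out.

```python
# Toy illustration of category codes; `to_codes` is a hypothetical helper,
# not part of fastai.
def to_codes(values):
    """Map each distinct value to an integer code (sorted order of the values)."""
    classes = sorted(set(values))
    code_of = {v: i for i, v in enumerate(classes)}
    return [code_of[v] for v in values], classes

user_ids = [73, 561, 157, 358, 130]   # userId column from the sample shown earlier
codes, classes = to_codes(user_ids)
print(codes)    # [0, 4, 2, 3, 1]
print(classes)  # [73, 130, 157, 358, 561]
```

The codes are contiguous integers starting at 0, which is what an embedding table needs as row indices; `classes` lets you map a code back to the original id.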

Now that this step is done, we can directly create a Learner object:

In [ ]:

learn = collab_learner(data, n_factors=50, y_range=(0.,5.))

And then immediately begin training:

In [ ]:

learn.fit_one_cycle(5, 5e-3, wd=0.1)

Total time: 00:09

epoch	train_loss	valid_loss
1	2.427430	1.999472
2	1.116335	0.663345
3	0.736155	0.636640
4	0.612827	0.626773
5	0.565003	0.626336

In [ ]:

show_doc(CollabDataBunch)

`class` `CollabDataBunch`[source]

CollabDataBunch(train_dl:DataLoader, valid_dl:DataLoader, fix_dl:DataLoader=None, test_dl:Optional[DataLoader]=None, device:device=None, dl_tfms:Optional[Collection[Callable]]=None, path:PathOrStr='.', collate_fn:Callable='data_collate', no_check:bool=False) :: DataBunch

Base DataBunch for collaborative filtering.

The init function shouldn't be called directly, as it is simply that of a basic DataBunch. Instead, you'll want to use the following factory method.

In [ ]:

show_doc(CollabDataBunch.from_df)

`from_df`[source]

from_df(ratings:DataFrame, pct_val:float=0.2, user_name:Optional[str]=None, item_name:Optional[str]=None, rating_name:Optional[str]=None, test:DataFrame=None, seed:int=None, path:PathOrStr='.', bs:int=64, val_bs:int=None, num_workers:int=4, dl_tfms:Optional[Collection[Callable]]=None, device:device=None, collate_fn:Callable='data_collate', no_check:bool=False) → CollabDataBunch

Create a DataBunch suitable for collaborative filtering from ratings.

Takes a ratings dataframe and splits it randomly into training and validation sets following pct_val (unless it's None). user_name, item_name and rating_name give the names of the corresponding columns (defaulting to the first, second and third columns). Optionally, a test dataframe can be passed, as well as a seed for the split between training and validation sets. The kwargs will be passed to DataBunch.create.
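The random split controlled by pct_val and seed can be pictured with the sketch below. This is only an illustration under simple assumptions (a shuffled index list, a hypothetical helper named `split_indices`), not the code fastai runs internally.

```python
import random

# Hypothetical helper illustrating a seeded random train/validation split.
def split_indices(n_rows, pct_val=0.2, seed=None):
    """Return (train_idx, valid_idx), holding out about pct_val of the rows."""
    rng = random.Random(seed)          # a fixed seed makes the split reproducible
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_valid = int(n_rows * pct_val)
    return sorted(idx[n_valid:]), sorted(idx[:n_valid])

train_idx, valid_idx = split_indices(10, pct_val=0.2, seed=42)
print(len(train_idx), len(valid_idx))  # 8 2
```

Passing the same seed twice yields the same split, which is useful when you want to compare models on an identical validation set.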

Model and `Learner`¶

In [ ]:

show_doc(CollabLearner, title_level=3)

`class` `CollabLearner`[source]

CollabLearner(data:DataBunch, model:Module, opt_func:Callable='Adam', loss_func:Callable=None, metrics:Collection[Callable]=None, true_wd:bool=True, bn_wd:bool=True, wd:Floats=0.01, train_bn:bool=True, path:str=None, model_dir:str='models', callback_fns:Collection[Callable]=None, callbacks:Collection[Callback]=<factory>, layer_groups:ModuleList=None) :: Learner

Learner suitable for collaborative filtering.

This is a subclass of Learner that just introduces helper functions to analyze results; the initialization is the same as for a regular Learner.

In [ ]:

show_doc(CollabLearner.bias)

`bias`[source]

bias(arr:Collection[T_co], is_item:bool=True)

Bias for item or user (based on is_item) for all in arr. (Set model to cpu and no grad.)

In [ ]:

show_doc(CollabLearner.get_idx)

`get_idx`[source]

get_idx(arr:Collection[T_co], is_item:bool=True)

Fetch item or user (based on is_item) for all in arr. (Set model to cpu and no grad.)

In [ ]:

show_doc(CollabLearner.weight)

`weight`[source]

weight(arr:Collection[T_co], is_item:bool=True)

Weight for item or user (based on is_item) for all in arr. (Set model to cpu and no grad.)
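Conceptually, these three helpers translate raw item or user values into embedding rows and then read the learned parameters stored there. The sketch below illustrates that lookup with plain lists. All names and numbers are made up for illustration and are not fastai attributes or real training results.

```python
# Toy stand-ins (hypothetical names, made-up values), not fastai objects.
item_classes = [260, 316, 924, 1097, 1210]    # distinct movieId values, in code order
item_bias = [0.31, -0.42, 0.05, 0.18, 0.27]   # one bias per code
item_weight = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1], [-0.2, 0.5], [0.3, 0.3]]

def get_idx(arr, classes):
    """Return the row index of each raw value in arr."""
    index_of = {value: i for i, value in enumerate(classes)}
    return [index_of[v] for v in arr]

def lookup(arr, classes, table):
    """Return the rows of table (biases or weights) for the raw values in arr."""
    return [table[i] for i in get_idx(arr, classes)]

print(get_idx([1097, 260], item_classes))              # [3, 0]
print(lookup([1097, 260], item_classes, item_bias))    # [0.18, 0.31]
print(lookup([1097, 260], item_classes, item_weight))  # [[-0.2, 0.5], [0.1, 0.2]]
```

In this picture, is_item simply selects whether the item tables or the user tables are consulted.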

In [ ]:

show_doc(EmbeddingDotBias, title_level=3)

`class` `EmbeddingDotBias`[source]

EmbeddingDotBias(n_factors:int, n_users:int, n_items:int, y_range:Point=None) :: Module

Base dot model for collaborative filtering.

Creates a simple model with Embedding weights and biases for n_users and n_items, with n_factors latent factors. Takes the dot product of the user and item embeddings and adds the two biases; then, if y_range is specified, feeds the result to a sigmoid rescaled to go from y_range[0] to y_range[1].
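The computation described above can be written out for a single (user, item) pair as follows. This is a minimal plain-Python sketch with hypothetical function names and hand-picked numbers. The actual model works on batches of tensors.

```python
import math

def scaled_sigmoid(x, y_range):
    """Sigmoid rescaled so its output lies between y_range[0] and y_range[1]."""
    lo, hi = y_range
    return lo + (hi - lo) / (1.0 + math.exp(-x))

def dot_bias_predict(user_vec, item_vec, user_bias, item_bias, y_range=None):
    """Dot product of the two embeddings plus both biases, optionally squashed."""
    raw = sum(u * i for u, i in zip(user_vec, item_vec)) + user_bias + item_bias
    return raw if y_range is None else scaled_sigmoid(raw, y_range)

# Hand-picked values with n_factors=3 (illustrative only).
user_vec, item_vec = [0.5, -1.0, 0.25], [1.0, 0.5, 2.0]
print(dot_bias_predict(user_vec, item_vec, 0.25, -0.75))                   # 0.0
print(dot_bias_predict(user_vec, item_vec, 0.25, -0.75, y_range=(0., 5.)))  # 2.5
```

A raw score of 0 lands in the middle of the range, and very large positive or negative scores approach the upper and lower bounds without ever leaving the range.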

In [ ]:

show_doc(EmbeddingNN, title_level=3)

`class` `EmbeddingNN`[source]

EmbeddingNN(emb_szs:ListSizes, layers:Collection[int]=None, ps:Collection[float]=None, emb_drop:float=0.0, y_range:OptRange=None, use_bn:bool=True, bn_final:bool=False) :: TabularModel

Subclass TabularModel to create a NN suitable for collaborative filtering.

emb_szs will overwrite the default and kwargs are passed to TabularModel.

In [ ]:

show_doc(collab_learner)

`collab_learner`[source]

collab_learner(data, n_factors:int=None, use_nn:bool=False, emb_szs:Dict[str, int]=None, layers:Collection[int]=None, ps:Collection[float]=None, emb_drop:float=0.0, y_range:OptRange=None, use_bn:bool=True, bn_final:bool=False, **learn_kwargs) → Learner

Create a Learner for collaborative filtering on data.

More specifically, binds data with a model that is either an EmbeddingDotBias with n_factors if use_nn=False, or an EmbeddingNN with emb_szs otherwise. In both cases the numbers of users and items are inferred from the data. y_range can be specified as a keyword argument, and you can pass metrics or wd through to the Learner constructor.

Links with the Data Block API¶

In [ ]:

show_doc(CollabLine, doc_string=False, title_level=3)

`class` `CollabLine`[source]

CollabLine(cats, conts, classes, names) :: TabularLine

Subclass of TabularLine for collaborative filtering.

In [ ]:

show_doc(CollabList, title_level=3, doc_string=False)

`class` `CollabList`[source]

CollabList(items:Iterator[T_co], cat_names:OptStrList=None, cont_names:OptStrList=None, procs=None, **kwargs) → TabularList :: TabularList

Subclass of TabularList for collaborative filtering.

Undocumented Methods - Methods moved below this line will intentionally be hidden¶

In [ ]:

show_doc(EmbeddingDotBias.forward)

`forward`[source]

forward(users:LongTensor, items:LongTensor) → Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the :class:Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

In [ ]:

show_doc(CollabList.reconstruct)

`reconstruct`[source]

reconstruct(t:Tensor)

Reconstruct one of the underlying items from its data t.

In [ ]:

show_doc(EmbeddingNN.forward)

`forward`[source]

forward(users:LongTensor, items:LongTensor) → Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the :class:Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
