from fastai.gen_doc.nbdoc import *
This package contains all the necessary functions to quickly train a model for a collaborative filtering task. Let's start by importing all we'll need.
from fastai.collab import *
Collaborative filtering is the task of predicting how much a user is going to like a certain item. The fastai library contains a CollabDataBunch class that will help you create datasets suitable for training, and a function collab_learner to build a simple model directly from a ratings table. Let's first see how we can get started before delving into the documentation.
For this example, we'll use a small subset of the MovieLens dataset to predict the rating a user would give a particular movie (from 0 to 5). The dataset comes in the form of a csv file where each line is a rating of a movie by a given person.
path = untar_data(URLs.ML_SAMPLE)
ratings = pd.read_csv(path/'ratings.csv')
ratings.head()
| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 73 | 1097 | 4.0 | 1255504951 |
| 1 | 561 | 924 | 3.5 | 1172695223 |
| 2 | 157 | 260 | 3.5 | 1291598691 |
| 3 | 358 | 1210 | 5.0 | 957481884 |
| 4 | 130 | 316 | 2.0 | 1138999234 |
We'll first turn the userId and movieId columns into category codes, so that we can replace them with their codes when it's time to feed them to an Embedding layer. This step would be even more important if our csv contained names of users or names of items. To do it, we simply have to call a CollabDataBunch factory method.
data = CollabDataBunch.from_df(ratings)
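To make the idea of category codes concrete, here is a minimal, library-free sketch. The helper `to_category_codes` and the sample IDs are hypothetical and only illustrate the mapping from raw IDs to contiguous integers that an Embedding layer can index; the exact codes fastai assigns internally may differ.

```python
# Hypothetical raw user IDs, as they might appear in a ratings table.
user_ids = [73, 561, 157, 73, 130]

def to_category_codes(values):
    """Map each distinct value to a contiguous integer code (in sorted order).

    Returns the list of codes and the list of classes, so that
    classes[code] recovers the original value.
    """
    classes = sorted(set(values))
    code_of = {value: code for code, value in enumerate(classes)}
    return [code_of[v] for v in values], classes

codes, classes = to_category_codes(user_ids)
print(codes)    # [0, 3, 2, 0, 1]
print(classes)  # [73, 130, 157, 561]
```

Because the codes are contiguous, an embedding table only needs one row per distinct user or item, regardless of how large or sparse the original IDs are.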
Now that this step is done, we can directly create a Learner object:
learn = collab_learner(data, n_factors=50, y_range=(0.,5.))
And then immediately begin training:
learn.fit_one_cycle(5, 5e-3, wd=0.1)
epoch | train_loss | valid_loss |
---|---|---|
1 | 2.427430 | 1.999472 |
2 | 1.116335 | 0.663345 |
3 | 0.736155 | 0.636640 |
4 | 0.612827 | 0.626773 |
5 | 0.565003 | 0.626336 |
show_doc(CollabDataBunch)
class CollabDataBunch
CollabDataBunch(train_dl:DataLoader, valid_dl:DataLoader, fix_dl:DataLoader=None, test_dl:Optional[DataLoader]=None, device:device=None, dl_tfms:Optional[Collection[Callable]]=None, path:PathOrStr='.', collate_fn:Callable='data_collate', no_check:bool=False) :: DataBunch
Base DataBunch for collaborative filtering.
The init function shouldn't be called directly (it is the one of a basic DataBunch). Instead, you'll want to use the following factory method.
show_doc(CollabDataBunch.from_df)
from_df
from_df(ratings:DataFrame, valid_pct:float=0.2, user_name:Optional[str]=None, item_name:Optional[str]=None, rating_name:Optional[str]=None, test:DataFrame=None, seed:int=None, path:PathOrStr='.', bs:int=64, val_bs:int=None, num_workers:int=4, dl_tfms:Optional[Collection[Callable]]=None, device:device=None, collate_fn:Callable='data_collate', no_check:bool=False) → CollabDataBunch
Create a DataBunch suitable for collaborative filtering from ratings.
Takes a ratings dataframe and splits it randomly into training and validation sets according to valid_pct (unless it's None). user_name, item_name and rating_name give the names of the corresponding columns (defaulting to the first, second and third columns). Optionally, a test dataframe can be passed, as well as a seed for the split between training and validation sets. The kwargs will be passed to DataBunch.create.
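The random split described above can be sketched in plain Python. This is an illustration only, not fastai's implementation: the helper `random_split` is hypothetical, and it assumes the validation set is simply a `valid_pct` fraction of the shuffled row indices.

```python
import random

def random_split(n_rows, valid_pct=0.2, seed=None):
    """Return (train_idx, valid_idx), a random partition of range(n_rows).

    A fixed seed makes the partition reproducible across calls.
    """
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_valid = int(n_rows * valid_pct)
    # The first n_valid shuffled indices go to validation, the rest to training.
    return idx[n_valid:], idx[:n_valid]

train_idx, valid_idx = random_split(10, valid_pct=0.2, seed=42)
print(len(train_idx), len(valid_idx))  # 8 2
```

Passing the same seed twice yields the same partition, which is the reason to set `seed` when results need to be compared across runs.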
show_doc(CollabLearner, title_level=3)
class CollabLearner
CollabLearner(data:DataBunch, model:Module, opt_func:Callable='Adam', loss_func:Callable=None, metrics:Collection[Callable]=None, true_wd:bool=True, bn_wd:bool=True, wd:Floats=0.01, train_bn:bool=True, path:str=None, model_dir:str='models', callback_fns:Collection[Callable]=None, callbacks:Collection[Callback]=<factory>, layer_groups:ModuleList=None, add_time:bool=True) :: Learner
Learner suitable for collaborative filtering.
show_doc(CollabLearner.bias)
bias
bias(arr:Collection[T_co], is_item:bool=True)
Bias for item or user (based on is_item) for all in arr. (Set model to cpu and no grad.)
show_doc(CollabLearner.get_idx)
get_idx
get_idx(arr:Collection[T_co], is_item:bool=True)
Fetch item or user (based on is_item) for all in arr. (Set model to cpu and no grad.)
show_doc(CollabLearner.weight)
weight
weight(arr:Collection[T_co], is_item:bool=True)
Weight for item or user (based on is_item) for all in arr. (Set model to cpu and no grad.)
show_doc(EmbeddingDotBias, title_level=3)
Creates a simple model with Embedding weights and biases for n_users and n_items, with n_factors latent factors. Takes the dot product of the embeddings and adds the bias, then, if y_range is specified, feeds the result to a sigmoid rescaled to go from y_range[0] to y_range[1].
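The computation described above can be written out for a single user and item pair. This is a minimal sketch in plain Python, not the library's code: the function names `sigmoid_range` and `dot_bias_score` and the example vectors are hypothetical, and it assumes the bias term is the sum of one user bias and one item bias.

```python
import math

def sigmoid_range(x, low, high):
    """Sigmoid rescaled so its output lies between low and high."""
    return low + (high - low) / (1.0 + math.exp(-x))

def dot_bias_score(user_vec, item_vec, user_bias, item_bias, y_range=None):
    """Dot product of the two latent-factor vectors plus both biases.

    If y_range is given, the raw score is squashed into that range.
    """
    raw = sum(u * i for u, i in zip(user_vec, item_vec)) + user_bias + item_bias
    if y_range is None:
        return raw
    return sigmoid_range(raw, y_range[0], y_range[1])

# Two latent factors; the raw score is 0.0, which maps to the middle of the range.
score = dot_bias_score([0.5, -1.0], [1.0, 0.5], 0.1, -0.1, y_range=(0.0, 5.0))
print(score)  # 2.5
```

Squashing through a rescaled sigmoid keeps every prediction inside the valid rating range, so the model never has to learn that ratings below `y_range[0]` or above `y_range[1]` are impossible.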
show_doc(EmbeddingNN, title_level=3)
class EmbeddingNN
EmbeddingNN(emb_szs:ListSizes, layers:Collection[int]=None, ps:Collection[float]=None, emb_drop:float=0.0, y_range:OptRange=None, use_bn:bool=True, bn_final:bool=False) :: TabularModel
Subclass TabularModel to create a NN suitable for collaborative filtering.
emb_szs will overwrite the default and kwargs are passed to TabularModel.
show_doc(collab_learner)
collab_learner
collab_learner(data, n_factors:int=None, use_nn:bool=False, emb_szs:Dict[str,int]=None, layers:Collection[int]=None, ps:Collection[float]=None, emb_drop:float=0.0, y_range:OptRange=None, use_bn:bool=True, bn_final:bool=False, **learn_kwargs) → Learner
Create a Learner for collaborative filtering on data.
More specifically, binds data with a model that is either an EmbeddingDotBias with n_factors if use_nn=False, or an EmbeddingNN with emb_szs otherwise. In both cases the numbers of users and items will be inferred from the data, y_range can be specified, and you can pass metrics or wd to the Learner constructor through learn_kwargs.
show_doc(CollabLine, doc_string=False, title_level=3)
class CollabLine
CollabLine(cats, conts, classes, names) :: TabularLine
Subclass of TabularLine for collaborative filtering.
show_doc(CollabList, title_level=3, doc_string=False)
class CollabList
CollabList(items:Iterator[T_co], cat_names:OptStrList=None, cont_names:OptStrList=None, procs=None, **kwargs) → TabularList :: TabularList
Subclass of TabularList for collaborative filtering.
show_doc(EmbeddingDotBias.forward)
forward
forward(users:LongTensor, items:LongTensor) → Tensor
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
show_doc(CollabList.reconstruct)
show_doc(EmbeddingNN.forward)
forward
forward(users:LongTensor, items:LongTensor) → Tensor
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.