RePlay Tutorial¶

This notebook introduces the RePlay library and walks through:

data preprocessing
data splitting
model training and inference
model optimization
model saving and loading
model comparison

In [1]:

%load_ext autoreload
%autoreload 2

In [2]:

%config Completer.use_jedi = False

In [3]:

import warnings
from optuna.exceptions import ExperimentalWarning
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=ExperimentalWarning)

In [4]:

import pandas as pd
from pyspark.sql.functions import rand

from replay.data_preparator import DataPreparator
from replay.experiment import Experiment
from replay.metrics import Coverage, HitRate, NDCG, MAP
from replay.model_handler import save, load
from replay.models import ALSWrap, KNN, SLIM
from replay.session_handler import State
from replay.splitters import UserSplitter
from replay.utils import convert2spark

In [5]:

K = 5
SEED = 1234

0. Data preprocessing¶

We will use the MovieLens 1M dataset as an example.

In [6]:

df = pd.read_csv("data/ml1m_ratings.dat", sep="\t", names=["user_id", "item_id", "relevance", "timestamp"])
users = pd.read_csv("data/ml1m_users.dat", sep="\t", names=["user_id", "gender", "age", "occupation", "zip_code"])

0.1. DataPreparator¶

RePlay uses Spark DataFrames as its internal data format. You can pass either a Spark or a pandas DataFrame as input. The interaction matrix (log) requires the columns user_id and item_id; relevance and timestamp are optional.

The DataPreparator class converts DataFrames to Spark format and preprocesses the data: it renames or creates the required and optional interaction columns, checks for nulls, and parses dates.

To convert a pandas DataFrame to Spark as is, use the convert2spark function from replay.utils.

In [7]:

preparator = DataPreparator()
log, _, _ = preparator(df)

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/u19893556/miniconda3/envs/replay/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/02/27 23:04:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/02/27 23:04:23 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
22/02/27 23:04:23 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/02/27 23:04:23 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
22/02/27 23:04:23 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.

In [8]:

log.show(3)

+---------+---------+--------+--------+
|relevance|timestamp|user_idx|item_idx|
+---------+---------+--------+--------+
|        5|978300760|    4131|      43|
|        3|978302109|    4131|     585|
|        3|978301968|    4131|     461|
+---------+---------+--------+--------+
only showing top 3 rows
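
As a rough illustration of the preprocessing described above (column renaming, a null check, and mapping raw IDs to contiguous user_idx/item_idx values like those in the output), here is a minimal pure-Python sketch. It is not RePlay's implementation: the helper prepare_log and its columns_mapping argument are hypothetical, and indices are assigned in order of first appearance, which is an assumption and may not match how RePlay orders them.

```python
from typing import Dict, List, Tuple


def prepare_log(
    rows: List[dict], columns_mapping: Dict[str, str]
) -> Tuple[List[dict], Dict, Dict]:
    """Rename columns, reject nulls, and index user/item IDs.

    `columns_mapping` maps a canonical name (user_id, item_id, ...) to the
    column name used in the raw data. Hypothetical helper, not RePlay's API.
    """
    user_index: Dict = {}
    item_index: Dict = {}
    prepared = []
    for row in rows:
        # Keep only the mapped columns, under their canonical names.
        renamed = {canon: row.get(raw) for canon, raw in columns_mapping.items()}
        # Null check on the two required columns.
        if renamed["user_id"] is None or renamed["item_id"] is None:
            raise ValueError(f"null user_id or item_id in row: {row}")
        # Assign contiguous integer indices in order of first appearance.
        user_idx = user_index.setdefault(renamed.pop("user_id"), len(user_index))
        item_idx = item_index.setdefault(renamed.pop("item_id"), len(item_index))
        prepared.append({**renamed, "user_idx": user_idx, "item_idx": item_idx})
    return prepared, user_index, item_index


# Toy data with made-up column names.
raw_rows = [
    {"uid": "a", "iid": "x", "rating": 5},
    {"uid": "b", "iid": "x", "rating": 3},
    {"uid": "a", "iid": "y", "rating": 4},
]
mapping = {"user_id": "uid", "item_id": "iid", "relevance": "rating"}
log_rows, user_index, item_index = prepare_log(raw_rows, mapping)
```

Keeping the two index dictionaries makes it possible to map recommendations back to the original IDs later.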

In [9]:

users = convert2spark(users)
users.show(3)

+-------+------+---+----------+--------+
|user_id|gender|age|occupation|zip_code|
+-------+------+---+----------+--------+
|      1|     F|  1|        10|   48067|
|      2|     M| 56|        16|   70072|
|      3|     M| 25|        15|   55117|
+-------+------+---+----------+--------+
only showing top 3 rows

0.2. Split¶

RePlay provides data splitters that reproduce validation schemes widely used in recommender systems.

UserSplitter moves item_test_size items from each of user_test_size users into the test set. With the settings below, 500 users × 5 items gives the 2,500 test interactions shown in the output.

In [10]:

splitter = UserSplitter(
    drop_cold_items=True,
    drop_cold_users=True,
    item_test_size=K,
    user_test_size=500,
    seed=SEED,
    shuffle=True
)
train, test = splitter.split(log)
print(train.count(), test.count())

22/02/27 23:04:37 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/02/27 23:04:38 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.

997709 2500
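
The idea behind this kind of per-user hold-out can be sketched in plain Python. This is a simplified illustration, not UserSplitter's code: the function user_holdout_split is hypothetical, held-out items are always chosen at random, and cold-user/cold-item dropping is not modelled.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Interaction = Tuple[int, int]  # (user_idx, item_idx)


def user_holdout_split(
    log: List[Interaction], item_test_size: int, user_test_size: int, seed: int
) -> Tuple[List[Interaction], List[Interaction]]:
    """Move `item_test_size` random items of `user_test_size` random users to test."""
    rng = random.Random(seed)
    by_user: Dict[int, List[int]] = defaultdict(list)
    for user, item in log:
        by_user[user].append(item)

    # Only users with more than item_test_size interactions are eligible,
    # so every test user keeps at least one interaction in train.
    eligible = sorted(u for u, items in by_user.items() if len(items) > item_test_size)
    test_users = set(rng.sample(eligible, user_test_size))

    train, test = [], []
    for user, items in by_user.items():
        # Positions (not item values) are sampled, so repeated items are safe.
        held_out = (
            set(rng.sample(range(len(items)), item_test_size))
            if user in test_users
            else set()
        )
        for pos, item in enumerate(items):
            (test if pos in held_out else train).append((user, item))
    return train, test


# Toy log: 4 users, 10 items each.
toy_log = [(user, item) for user in range(4) for item in range(10)]
train_toy, test_toy = user_holdout_split(
    toy_log, item_test_size=3, user_test_size=2, seed=1234
)
```

In this toy case the test set holds 2 × 3 = 6 interactions and the remaining 34 stay in train, mirroring the 500 × 5 = 2,500 arithmetic above.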

1. Model training¶

SLIM¶

In [11]:

slim = SLIM(seed=SEED)

In [12]:

%%time

slim.fit(log=train)

CPU times: user 1.53 s, sys: 129 ms, total: 1.66 s
Wall time: 5.9 s

In [13]:

%%time

recs = slim.predict(
    k=K,
    users=test.select('user_idx').distinct(),
    log=train,
    filter_seen_items=True
)

27-Feb-22 23:04:55, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:04:55, replay, WARNING: This model can't predict cold items, they will be ignored

CPU times: user 23.1 ms, sys: 16.4 ms, total: 39.4 ms
Wall time: 1.77 s

In [14]:

recs.show(2)

+--------+--------+------------------+
|user_idx|item_idx|         relevance|
+--------+--------+------------------+
|      38|      73| 1.235672623556484|
|      38|     361|1.1715979128347436|
+--------+--------+------------------+
only showing top 2 rows

2. Model evaluation¶

RePlay implements several popular recommendation quality metrics. You can call a metric directly, or use the Experiment class to calculate a chosen set of metrics and compare models.

In [15]:

metrics = Experiment(
    test,
    {
        NDCG(): K,
        MAP(): K,
        HitRate(): [1, K],
        Coverage(train): K,
    },
)

In [16]:

%%time
metrics.add_result("SLIM", recs)
metrics.results

CPU times: user 360 ms, sys: 75.5 ms, total: 436 ms
Wall time: 47.5 s

Out[16]:

	Coverage@5	HitRate@1	HitRate@5	MAP@5	NDCG@5
SLIM	0.16055	0.242	0.558	0.09372	0.165643
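
The metric names above can be made concrete with a small pure-Python sketch. It assumes binary relevance (an item is relevant if it appears in the user's test interactions) and one common set of normalizations, for example dividing average precision by k. RePlay's own implementations may differ in such details, so treat these hypothetical functions as illustrative only.

```python
import math
from typing import Dict, List, Set


def hit_rate_at_k(recs: List[int], truth: Set[int], k: int) -> float:
    """1.0 if at least one of the top-k recommendations is relevant."""
    return float(any(item in truth for item in recs[:k]))


def ndcg_at_k(recs: List[int], truth: Set[int], k: int) -> float:
    """DCG of the top-k list divided by the DCG of an ideal ordering."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(recs[:k])
        if item in truth
    )
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(truth))))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision_at_k(recs: List[int], truth: Set[int], k: int) -> float:
    """Sum of precision@j over relevant positions j, divided by k (assumed)."""
    hits, score = 0, 0.0
    for pos, item in enumerate(recs[:k]):
        if item in truth:
            hits += 1
            score += hits / (pos + 1)
    return score / k


def coverage_at_k(all_recs: Dict[int, List[int]], catalog: Set[int], k: int) -> float:
    """Share of catalog items that appear in at least one top-k list."""
    recommended = {item for recs in all_recs.values() for item in recs[:k]}
    return len(recommended & catalog) / len(catalog)


# Toy example: two users, an 8-item catalog.
toy_recs = {1: [10, 20, 30], 2: [40, 10, 50]}
toy_truth = {1: {20, 30}, 2: {60}}
toy_catalog = {10, 20, 30, 40, 50, 60, 70, 80}

mean_ndcg = sum(ndcg_at_k(toy_recs[u], toy_truth[u], 3) for u in toy_recs) / len(toy_recs)
mean_ap = sum(
    average_precision_at_k(toy_recs[u], toy_truth[u], 3) for u in toy_recs
) / len(toy_recs)
```

MAP@K and NDCG@K are the per-user values averaged over users, which is why a model can score a fairly high HitRate@5 while its MAP@5 and NDCG@5 stay much lower: a single hit anywhere in the list counts fully for HitRate but only partially for the rank-aware metrics.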

3. Hyperparameter optimization¶

3.1. Search¶

In [17]:

# Split the training data again so that hyperparameters are tuned without touching the test set
train_opt, val_opt = splitter.split(train)

22/02/27 23:06:17 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/02/27 23:06:17 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.

In [18]:

%%time
best_params = slim.optimize(train_opt, val_opt, criterion=NDCG(), k=K, budget=15)

[I 2022-02-27 23:06:17,681] A new study created in memory with name: no-name-b0d54335-8d37-401f-a916-3ba55ed9c932
27-Feb-22 23:06:22, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:06:22, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:06:51,535] Trial 0 finished with value: 0.18130037719542139 and parameters: {'beta': 0.01, 'lambda_': 0.01}. Best is trial 0 with value: 0.18130037719542139.
22/02/27 23:06:51 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:06:51 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:06:54, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:06:54, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:07:31,090] Trial 1 finished with value: 0.18197356840108678 and parameters: {'beta': 0.003401392505408624, 'lambda_': 0.002240239840999655}. Best is trial 1 with value: 0.18197356840108678.
22/02/27 23:07:31 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:07:31 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:07:33, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:07:33, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:07:56,590] Trial 2 finished with value: 0.10199049759426765 and parameters: {'beta': 1.9301997111553214e-05, 'lambda_': 1.1554917603144903}. Best is trial 1 with value: 0.18197356840108678.
22/02/27 23:07:56 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:07:56 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:07:58, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:07:59, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:08:36,960] Trial 3 finished with value: 0.18040798348695616 and parameters: {'beta': 1.153628706350771e-05, 'lambda_': 2.9530757569977826e-05}. Best is trial 1 with value: 0.18197356840108678.
22/02/27 23:08:36 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:08:36 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:08:39, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:08:39, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:08:54,180] Trial 4 finished with value: 0.1184952197160257 and parameters: {'beta': 0.0007214008774259759, 'lambda_': 0.7632771957535475}. Best is trial 1 with value: 0.18197356840108678.
22/02/27 23:08:54 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:08:54 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:08:56, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:08:56, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:09:11,341] Trial 5 finished with value: 0.10801852484092478 and parameters: {'beta': 0.003501448697693071, 'lambda_': 0.9936237326658697}. Best is trial 1 with value: 0.18197356840108678.
22/02/27 23:09:11 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:09:11 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:09:13, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:09:13, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:09:37,837] Trial 6 finished with value: 0.17982896295330483 and parameters: {'beta': 0.00099662876434958, 'lambda_': 0.03255064745931469}. Best is trial 1 with value: 0.18197356840108678.
22/02/27 23:09:37 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:09:37 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:09:40, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:09:40, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:10:08,671] Trial 7 finished with value: 0.1794415531942852 and parameters: {'beta': 0.0002421671516396994, 'lambda_': 6.591737385850111e-05}. Best is trial 1 with value: 0.18197356840108678.
22/02/27 23:10:08 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:10:08 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:10:11, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:10:11, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:10:32,795] Trial 8 finished with value: 0.17158421706899432 and parameters: {'beta': 2.982305555199248, 'lambda_': 0.047378561915999574}. Best is trial 1 with value: 0.18197356840108678.
22/02/27 23:10:32 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:10:32 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:10:35, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:10:35, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:10:50,730] Trial 9 finished with value: 0.11605432663219907 and parameters: {'beta': 0.18611456836257362, 'lambda_': 0.8088532397607969}. Best is trial 1 with value: 0.18197356840108678.
22/02/27 23:10:50 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:10:50 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:10:53, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:10:53, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:11:18,939] Trial 10 finished with value: 0.184208319077201 and parameters: {'beta': 0.1399219194028095, 'lambda_': 1.0774584742955482e-06}. Best is trial 10 with value: 0.184208319077201.
22/02/27 23:11:18 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:11:18 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:11:21, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:11:21, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:11:46,639] Trial 11 finished with value: 0.1847990350329721 and parameters: {'beta': 0.11351011099824757, 'lambda_': 2.678667716748947e-06}. Best is trial 11 with value: 0.1847990350329721.
22/02/27 23:11:46 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:11:46 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:11:49, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:11:49, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:12:14,589] Trial 12 finished with value: 0.18428506912745 and parameters: {'beta': 0.14719744446335933, 'lambda_': 1.5136391124700838e-06}. Best is trial 11 with value: 0.1847990350329721.
22/02/27 23:12:14 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:12:14 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:12:16, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:12:16, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:12:44,060] Trial 13 finished with value: 0.1847990350329721 and parameters: {'beta': 0.11317072609268032, 'lambda_': 1.963819303556553e-06}. Best is trial 11 with value: 0.1847990350329721.
22/02/27 23:12:44 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:12:44 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:12:46, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:12:46, replay, WARNING: This model can't predict cold items, they will be ignored
[I 2022-02-27 23:13:10,407] Trial 14 finished with value: 0.17283814674301312 and parameters: {'beta': 4.697217749389299, 'lambda_': 2.3961336617398848e-05}. Best is trial 11 with value: 0.1847990350329721.

CPU times: user 24.9 s, sys: 2.73 s, total: 27.6 s
Wall time: 6min 52s

In [19]:

best_params

Out[19]:

{'beta': 0.11351011099824757, 'lambda_': 2.678667716748947e-06}
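
The log above is produced by Optuna, which drives the search. As a simplified stand-in for what a budgeted search does, the sketch below runs plain random search with log-uniform sampling over two parameters. This is not Optuna's algorithm and not RePlay's code: the objective is a made-up function with a known optimum, and the search bounds are assumptions chosen to resemble the range of values seen in the trials.

```python
import math
import random
from typing import Callable, Dict, Tuple


def log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Sample so that every order of magnitude in [low, high] is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(
    objective: Callable[[Dict[str, float]], float],
    bounds: Dict[str, Tuple[float, float]],
    budget: int,
    seed: int,
) -> Tuple[Dict[str, float], float]:
    """Evaluate `budget` random configurations and keep the best (maximizing)."""
    rng = random.Random(seed)
    best_params, best_value = None, -math.inf
    for _ in range(budget):
        params = {name: log_uniform(rng, lo, hi) for name, (lo, hi) in bounds.items()}
        value = objective(params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params: Dict[str, float]) -> float:
    """Made-up score peaking at beta = 0.1, lambda_ = 1e-5 (not a real metric)."""
    return -(
        (math.log10(params["beta"]) + 1) ** 2
        + (math.log10(params["lambda_"]) + 5) ** 2
    )


toy_bounds = {"beta": (1e-6, 5.0), "lambda_": (1e-6, 2.0)}  # assumed bounds
toy_best_params, toy_best_value = random_search(
    toy_objective, toy_bounds, budget=15, seed=1234
)
```

In the real run, each objective evaluation is a full fit on train_opt followed by NDCG@K on val_opt, which is why 15 trials take several minutes.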

3.2. Compare with previous¶

In [20]:

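def fit_predict_evaluate(model, experiment, name):
    """Fit on train, recommend top-K items for the test users, and record metrics under `name`."""
    model.fit(log=train)

    # Recommend K unseen items for every user that appears in the test set.
    recs = model.predict(
        k=K,
        users=test.select('user_idx').distinct(),
        log=train,
        filter_seen_items=True
    )

    experiment.add_result(name, recs)
    return recs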

In [21]:

%%time
recs = fit_predict_evaluate(SLIM(**best_params, seed=SEED), metrics, 'SLIM_optimized')
metrics.results.sort_values('NDCG@5', ascending=False)

22/02/27 23:13:15 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:13:15 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:13:18, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:13:18, replay, WARNING: This model can't predict cold items, they will be ignored

CPU times: user 1.9 s, sys: 284 ms, total: 2.18 s
Wall time: 52.9 s

Out[21]:

	Coverage@5	HitRate@1	HitRate@5	MAP@5	NDCG@5
SLIM_optimized	0.147598	0.240	0.570	0.095547	0.168684
SLIM	0.160550	0.242	0.558	0.093720	0.165643

Convert to pandas¶

In [22]:

recs_pd = recs.toPandas()
recs_pd.head(2)

Out[22]:

	user_idx	item_idx	relevance
0	38	73	1.230351
1	38	361	1.212302

4. Save and load¶

RePlay lets you save and load fitted models with the save and load functions of the model_handler module. A model is saved as a folder containing all the necessary parameters and data.

In [23]:

save(slim, path='./slim_best_params')
slim_loaded = load('./slim_best_params')

In [24]:

%%time
pred_from_loaded = slim_loaded.predict(
    k=K,
    users=test.select('user_idx').distinct(),
    log=train,
    filter_seen_items=True
)
pred_from_loaded.show(2)

27-Feb-22 23:14:31, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:14:31, replay, WARNING: This model can't predict cold items, they will be ignored

+--------+--------+------------------+
|user_idx|item_idx|         relevance|
+--------+--------+------------------+
|      38|      14|1.1936188460415138|
|      38|      73|1.1193345759515603|
+--------+--------+------------------+
only showing top 2 rows

CPU times: user 67 ms, sys: 3.66 ms, total: 70.7 ms
Wall time: 13.1 s

In [25]:

slim_loaded.beta, slim_loaded.lambda_

Out[25]:

(0.11351011099824757, 2.678667716748947e-06)
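
The "folder with parameters and data" pattern can be illustrated with the standard library alone. The sketch below is an assumption about the general idea, not RePlay's on-disk format: ToyModel, save_model, load_model, and the file names params.json and data.json are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ToyModel:
    """Stand-in for a fitted model: two hyperparameters plus learned data."""

    def __init__(self, beta: float = 0.01, lambda_: float = 0.01):
        self.beta = beta
        self.lambda_ = lambda_
        self.similarity = {}  # (item_i, item_j) -> weight, filled by fitting


def save_model(model: ToyModel, path) -> None:
    """Write hyperparameters and learned data as separate files in one folder."""
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)
    params = {"beta": model.beta, "lambda_": model.lambda_}
    (folder / "params.json").write_text(json.dumps(params))
    # JSON object keys must be strings, so store the pairs as a list of triples.
    triples = [[i, j, w] for (i, j), w in model.similarity.items()]
    (folder / "data.json").write_text(json.dumps(triples))


def load_model(path) -> ToyModel:
    """Rebuild the model from the folder written by save_model."""
    folder = Path(path)
    model = ToyModel(**json.loads((folder / "params.json").read_text()))
    triples = json.loads((folder / "data.json").read_text())
    model.similarity = {(i, j): w for i, j, w in triples}
    return model


with tempfile.TemporaryDirectory() as tmp:
    original = ToyModel(beta=0.1135, lambda_=2.68e-06)
    original.similarity = {(0, 1): 0.5, (1, 2): 0.25}
    save_model(original, Path(tmp) / "toy_model")
    restored = load_model(Path(tmp) / "toy_model")
```

Checking the restored hyperparameters, as the cell above does with slim_loaded.beta and slim_loaded.lambda_, is a quick way to confirm that the round trip preserved the model's configuration.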

5. Other RePlay models¶

ALS¶

A commonly used matrix factorization algorithm.

In [26]:

%%time
recs = fit_predict_evaluate(ALSWrap(rank=100, seed=SEED), metrics, 'ALS')
metrics.results.sort_values('NDCG@5', ascending=False)

22/02/27 23:14:46 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:14:46 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:14:50 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
22/02/27 23:14:50 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
22/02/27 23:14:50 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
22/02/27 23:14:50 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
22/02/27 23:15:01 WARN DAGScheduler: Broadcasting large task binary with size 1004.5 KiB
22/02/27 23:15:02 WARN DAGScheduler: Broadcasting large task binary with size 1047.0 KiB
22/02/27 23:15:03 WARN DAGScheduler: Broadcasting large task binary with size 1005.4 KiB
22/02/27 23:15:03 WARN DAGScheduler: Broadcasting large task binary with size 1089.4 KiB
22/02/27 23:15:03 WARN DAGScheduler: Broadcasting large task binary with size 1047.8 KiB
22/02/27 23:15:04 WARN DAGScheduler: Broadcasting large task binary with size 1131.8 KiB
22/02/27 23:15:04 WARN DAGScheduler: Broadcasting large task binary with size 1090.3 KiB
22/02/27 23:15:04 WARN DAGScheduler: Broadcasting large task binary with size 1174.3 KiB
22/02/27 23:15:05 WARN DAGScheduler: Broadcasting large task binary with size 1132.7 KiB
22/02/27 23:15:05 WARN DAGScheduler: Broadcasting large task binary with size 1216.7 KiB
22/02/27 23:15:06 WARN DAGScheduler: Broadcasting large task binary with size 1175.2 KiB
22/02/27 23:15:06 WARN DAGScheduler: Broadcasting large task binary with size 1259.2 KiB
22/02/27 23:15:07 WARN DAGScheduler: Broadcasting large task binary with size 1217.6 KiB
22/02/27 23:15:07 WARN DAGScheduler: Broadcasting large task binary with size 1260.6 KiB
22/02/27 23:15:07 WARN DAGScheduler: Broadcasting large task binary with size 1218.2 KiB
27-Feb-22 23:15:08, replay, WARNING: This model can't predict cold users, they will be ignored
27-Feb-22 23:15:08, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:15:08, replay, WARNING: This model can't predict cold users, they will be ignored
27-Feb-22 23:15:08, replay, WARNING: This model can't predict cold items, they will be ignored
22/02/27 23:15:11 WARN DAGScheduler: Broadcasting large task binary with size 1270.4 KiB
22/02/27 23:15:11 WARN DAGScheduler: Broadcasting large task binary with size 1227.9 KiB
22/02/27 23:15:26 WARN DAGScheduler: Broadcasting large task binary with size 1398.7 KiB
22/02/27 23:15:29 WARN DAGScheduler: Broadcasting large task binary with size 1423.3 KiB
22/02/27 23:15:32 WARN DAGScheduler: Broadcasting large task binary with size 1476.2 KiB
22/02/27 23:15:35 WARN DAGScheduler: Broadcasting large task binary with size 1457.6 KiB
22/02/27 23:15:36 WARN DAGScheduler: Broadcasting large task binary with size 1464.0 KiB
22/02/27 23:15:36 WARN DAGScheduler: Broadcasting large task binary with size 1482.1 KiB
22/02/27 23:15:36 WARN DAGScheduler: Broadcasting large task binary with size 1479.0 KiB
22/02/27 23:15:37 WARN DAGScheduler: Broadcasting large task binary with size 1227.7 KiB
22/02/27 23:15:37 WARN DAGScheduler: Broadcasting large task binary with size 1270.1 KiB
22/02/27 23:15:55 WARN DAGScheduler: Broadcasting large task binary with size 1461.2 KiB
22/02/27 23:15:57 WARN DAGScheduler: Broadcasting large task binary with size 1442.7 KiB
22/02/27 23:15:58 WARN DAGScheduler: Broadcasting large task binary with size 1457.8 KiB
22/02/27 23:15:58 WARN DAGScheduler: Broadcasting large task binary with size 1460.8 KiB
22/02/27 23:16:00 WARN DAGScheduler: Broadcasting large task binary with size 1270.4 KiB
22/02/27 23:16:00 WARN DAGScheduler: Broadcasting large task binary with size 1227.9 KiB
22/02/27 23:16:16 WARN DAGScheduler: Broadcasting large task binary with size 1398.7 KiB
22/02/27 23:16:19 WARN DAGScheduler: Broadcasting large task binary with size 1423.3 KiB
22/02/27 23:16:23 WARN DAGScheduler: Broadcasting large task binary with size 1476.2 KiB
22/02/27 23:16:25 WARN DAGScheduler: Broadcasting large task binary with size 1457.6 KiB
22/02/27 23:16:26 WARN DAGScheduler: Broadcasting large task binary with size 1469.2 KiB
22/02/27 23:16:29 WARN DAGScheduler: Broadcasting large task binary with size 1498.8 KiB
22/02/27 23:16:30 WARN DAGScheduler: Broadcasting large task binary with size 1498.8 KiB
22/02/27 23:16:30 WARN DAGScheduler: Broadcasting large task binary with size 1498.7 KiB
22/02/27 23:16:31 WARN DAGScheduler: Broadcasting large task binary with size 1498.7 KiB

CPU times: user 437 ms, sys: 130 ms, total: 566 ms
Wall time: 1min 44s

Out[26]:

	Coverage@5	HitRate@1	HitRate@5	MAP@5	NDCG@5
SLIM_optimized	0.147598	0.240	0.570	0.095547	0.168684
SLIM	0.160550	0.242	0.558	0.093720	0.165643
ALS	0.195359	0.216	0.540	0.091600	0.160843

KNN¶

A commonly used item-based recommender.

In [27]:

%%time
recs = fit_predict_evaluate(KNN(num_neighbours=100), metrics, 'KNN')
metrics.results.sort_values('NDCG@5', ascending=False)

22/02/27 23:16:31 WARN CacheManager: Asked to cache already cached data.
22/02/27 23:16:31 WARN CacheManager: Asked to cache already cached data.
27-Feb-22 23:16:33, replay, WARNING: This model can't predict cold items, they will be ignored
27-Feb-22 23:16:33, replay, WARNING: This model can't predict cold items, they will be ignored

CPU times: user 283 ms, sys: 90.3 ms, total: 374 ms
Wall time: 1min 7s

Out[27]:

	Coverage@5	HitRate@1	HitRate@5	MAP@5	NDCG@5
SLIM_optimized	0.147598	0.240	0.570	0.095547	0.168684
SLIM	0.160550	0.242	0.558	0.093720	0.165643
ALS	0.195359	0.216	0.540	0.091600	0.160843
KNN	0.052348	0.144	0.384	0.054447	0.101923
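
To make "item-based recommender" concrete, here is a minimal pure-Python sketch of item-to-item nearest neighbours on implicit feedback. It uses plain cosine similarity over the sets of users who interacted with each item, which is an assumption: RePlay's KNN may weight or shrink similarities differently. The function names are hypothetical; num_neighbours and the seen-item filter mirror the parameters used in this notebook.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def item_cosine_similarity(user_items: Dict[int, Set[int]]) -> Dict[int, Dict[int, float]]:
    """Cosine similarity between items, based on the users they share."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims: Dict[int, Dict[int, float]] = defaultdict(dict)
    for a in item_users:
        for b in item_users:
            if a != b:
                common = len(item_users[a] & item_users[b])
                if common:
                    norm = math.sqrt(len(item_users[a]) * len(item_users[b]))
                    sims[a][b] = common / norm
    return sims


def recommend(
    user: int,
    user_items: Dict[int, Set[int]],
    sims: Dict[int, Dict[int, float]],
    k: int,
    num_neighbours: int,
) -> List[int]:
    """Score unseen items by summed similarity to the user's seen items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        # Keep only the most similar neighbours of each seen item.
        neighbours = sorted(sims.get(item, {}).items(), key=lambda p: (-p[1], p[0]))
        for other, sim in neighbours[:num_neighbours]:
            if other not in seen:  # same idea as filter_seen_items=True
                scores[other] += sim
    ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ranked[:k]]


# Toy interactions: user -> set of items.
toy_user_items = {0: {1, 2}, 1: {1, 2, 3}, 2: {2, 3}, 3: {1}}
toy_sims = item_cosine_similarity(toy_user_items)
recs_user_3 = recommend(3, toy_user_items, toy_sims, k=2, num_neighbours=10)
recs_user_0 = recommend(0, toy_user_items, toy_sims, k=2, num_neighbours=10)
```

Because scores come only from items that co-occur with what the user has already seen, such a model tends to concentrate on a narrow part of the catalog, which is consistent with the low Coverage@5 for KNN in the table above.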

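6. Compare RePlay models with others¶

To evaluate recommendations obtained from other sources, read them and pass them to Experiment. In the cell below, recs still holds the KNN recommendations and stands in for an external model, which is why the my_model row matches KNN.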

In [29]:

metrics.add_result("my_model", recs)
metrics.results.sort_values("NDCG@5", ascending=False)

Out[29]:

	Coverage@5	HitRate@1	HitRate@5	MAP@5	NDCG@5
SLIM_optimized	0.147598	0.240	0.570	0.095547	0.168684
SLIM	0.160550	0.242	0.558	0.093720	0.165643
ALS	0.195359	0.216	0.540	0.091600	0.160843
KNN	0.052348	0.144	0.384	0.054447	0.101923
my_model	0.052348	0.144	0.384	0.054447	0.101923
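
For recommendations that really do come from outside RePlay, the first step is to read them into the same three columns that RePlay's own recommendations use (user_idx, item_idx, relevance). The sketch below does this with the standard library for a CSV source; the helper read_recommendations and the sample data are hypothetical, and the external IDs are assumed to already use the same indexing as the test set.

```python
import csv
import io
from typing import List, TextIO, Tuple

REQUIRED_COLUMNS = {"user_idx", "item_idx", "relevance"}


def read_recommendations(handle: TextIO) -> List[Tuple[int, int, float]]:
    """Parse CSV rows into (user_idx, item_idx, relevance) tuples."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [
        (int(row["user_idx"]), int(row["item_idx"]), float(row["relevance"]))
        for row in reader
    ]


# In practice this would be an open file; StringIO keeps the example self-contained.
external_csv = io.StringIO(
    "user_idx,item_idx,relevance\n"
    "38,73,0.9\n"
    "38,361,0.7\n"
)
external_recs = read_recommendations(external_csv)
```

From here the rows can be loaded into a pandas DataFrame with those column names and converted with convert2spark, as was done for users earlier, before being passed to metrics.add_result.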