import graphlab
graphlab.canvas.set_target("ipynb")
rating_sf = graphlab.SFrame('ratings')
users = graphlab.SFrame('users')
items = graphlab.SFrame('items')
2016-05-18 00:58:46,901 [INFO] graphlab.cython.cy_server, 176: GraphLab Create v1.8.5 started. Logging: /tmp/graphlab_server_1463504318.log
rating_sf.show()
dir(graphlab.recommender)
['__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'create', 'factorization_recommender', 'item_similarity_recommender', 'popularity_recommender', 'ranking_factorization_recommender', 'util']
(train, test) = graphlab.recommender.util.random_split_by_user(rating_sf, 'user_id', 'movie_id')
from graphlab import item_similarity_recommender
itemcf = item_similarity_recommender.create(train[train['rating'] > 4], 'user_id', 'movie_id')
Recsys training: model = item_similarity
Warning: Ignoring columns rating, timestamp;
To use one of these as a target column, set target =
and use a method that allows the use of a target.
Preparing data set.
Data has 218621 observations with 6012 users and 3224 items.
Data prepared in: 0.195331s
Computing item similarity statistics:
Computing most similar items for 3224 items:
+-----------------+-----------------+
| Number of items | Elapsed Time |
+-----------------+-----------------+
| 1000 | 1.08126 |
| 2000 | 1.10032 |
| 3000 | 1.12497 |
+-----------------+-----------------+
Finished training in 1.20171s
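The training log above does not say which similarity measure was used. As a rough illustration only, the sketch below assumes Jaccard similarity over the sets of users who interacted with each item. The function name `jaccard_item_similarity` and the toy data are hypothetical and are not part of GraphLab Create.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_item_similarity(interactions):
    """Return {(item_a, item_b): similarity} for item pairs sharing a user.

    `interactions` is an iterable of (user, item) pairs. This is an
    illustrative sketch, not GraphLab's implementation.
    """
    # Collect the set of users seen for each item.
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)

    sims = {}
    for a, b in combinations(sorted(users_by_item), 2):
        shared = users_by_item[a] & users_by_item[b]
        if shared:
            union = users_by_item[a] | users_by_item[b]
            # Jaccard: |intersection| / |union|
            sims[(a, b)] = len(shared) / float(len(union))
    return sims


# Toy data (made up): item A is liked by users 1-3, B by 1-2, C by 3.
toy = [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B'), (3, 'A'), (3, 'C')]
sims = jaccard_item_similarity(toy)
```

On this toy data, items A and B share two of three users, while B and C share none and so get no entry.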
pop = graphlab.popularity_recommender.create(train[train['rating'] > 4], 'user_id', 'movie_id')
Recsys training: model = popularity
Warning: Ignoring columns rating, timestamp;
To use one of these as a target column, set target =
and use a method that allows the use of a target.
Preparing data set.
Data has 218621 observations with 6012 users and 3224 items.
Data prepared in: 0.237904s
218621 observations to process; with 3224 unique items.
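Because the `rating` column is ignored here, a popularity model of this kind can be thought of as ranking items by how often they appear in the training observations. The sketch below shows that idea on made-up data. `popularity_ranking` is a hypothetical helper, and the real model may break ties or score items differently.

```python
from collections import Counter


def popularity_ranking(interactions, k=3):
    """Return the k most frequently observed items, most popular first.

    `interactions` is an iterable of (user, item) pairs. Illustrative
    sketch only, not GraphLab's implementation.
    """
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(k)]


# Toy data (made up): A is seen three times, B twice, C once.
toy = [(1, 'A'), (2, 'A'), (3, 'A'), (1, 'B'), (2, 'B'), (3, 'C')]
top_items = popularity_ranking(toy, k=2)
```

Every user receives the same ranked list under this scheme, which is why a popularity model is a common baseline.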
m = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating')
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3706 items.
Data prepared in: 0.907655s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 16.6667 | Not Viable |
| 1 | 4.16667 | Not Viable |
| 2 | 1.04167 | Not Viable |
| 3 | 0.260417 | 1.80479 |
| 4 | 0.130208 | 1.8322 |
| 5 | 0.0651042 | 1.8873 |
| 6 | 0.0325521 | 1.88706 |
+---------+-------------------+------------------------------------------+
| Final | 0.260417 | 1.80479 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 101us | 2.4462 | 1.11698 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 1.43s | DIVERGED | DIVERGED | 0.260417 |
| RESET | 1.91s | 2.44619 | 1.11697 | |
| 1 | 3.24s | DIVERGED | DIVERGED | 0.130208 |
| RESET | 3.75s | 2.44619 | 1.11697 | |
| 1 | 4.84s | 2.10443 | 1.14093 | 0.0651042 |
| 2 | 5.81s | 1.82027 | 1.04353 | 0.0651042 |
| 3 | 6.89s | 1.75645 | 1.02196 | 0.0651042 |
| 4 | 7.90s | 1.7206 | 1.01294 | 0.0651042 |
| 5 | 8.99s | 1.69207 | 1.00488 | 0.0651042 |
| 6 | 10.03s | 1.66916 | 0.998471 | 0.0651042 |
| 7 | 11.11s | 1.64975 | 0.992687 | 0.0651042 |
| 8 | 12.27s | 1.63331 | 0.987803 | 0.0651042 |
| 9 | 13.53s | 1.6203 | 0.984347 | 0.0651042 |
| 10 | 14.68s | 1.60869 | 0.981751 | 0.0651042 |
| 11 | 15.79s | 1.59758 | 0.977906 | 0.0651042 |
| 12 | 16.82s | 1.58984 | 0.976171 | 0.0651042 |
| 13 | 17.96s | 1.58036 | 0.973489 | 0.0651042 |
| 14 | 19.24s | 1.57243 | 0.971477 | 0.0651042 |
| 15 | 20.37s | 1.56503 | 0.969302 | 0.0651042 |
| 16 | 21.50s | 1.55807 | 0.967444 | 0.0651042 |
| 17 | 22.63s | 1.55118 | 0.965764 | 0.0651042 |
| 18 | 23.79s | 1.54509 | 0.963793 | 0.0651042 |
| 19 | 24.93s | 1.53942 | 0.961991 | 0.0651042 |
| 20 | 26.23s | 1.53433 | 0.960398 | 0.0651042 |
| 21 | 27.37s | 1.52844 | 0.959103 | 0.0651042 |
| 22 | 28.46s | 1.52382 | 0.958025 | 0.0651042 |
| 23 | 29.55s | 1.51829 | 0.956181 | 0.0651042 |
| 24 | 30.81s | 1.51352 | 0.955045 | 0.0651042 |
| 25 | 32.13s | 1.50902 | 0.953533 | 0.0651042 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.53876
Final training RMSE: 0.948071
m
Class : RankingFactorizationRecommender

Schema
------
User ID : user_id
Item ID : movie_id
Target : rating
Additional observation features : 1
Number of user side features : 0
Number of item side features : 0

Statistics
----------
Number of observations : 965508
Number of users : 6040
Number of items : 3706

Training summary
----------------
Training time : 36.9965

Model Parameters
----------------
Model class : RankingFactorizationRecommender
num_factors : 32
binary_target : 0
side_data_factorization : 1
solver : auto
nmf : 0
max_iterations : 25

Regularization Settings
-----------------------
regularization : 0.0
regularization_type : normal
linear_regularization : 0.0
ranking_regularization : 0.25
unobserved_rating_value : -1.79769313486e+308
num_sampled_negative_examples : 4
ials_confidence_scaling_type : auto
ials_confidence_scaling_factor : 1

Optimization Settings
---------------------
init_random_sigma : 0.01
sgd_convergence_interval : 4
sgd_convergence_threshold : 0.0
sgd_max_trial_iterations : 5
sgd_sampling_block_size : 131072
sgd_step_adjustment_interval : 4
sgd_step_size : 0.0
sgd_trial_sample_minimum_size : 10000
sgd_trial_sample_proportion : 0.125
step_size_decrease_rate : 0.75
additional_iterations_if_unhealthy: 5
adagrad_momentum_weighting : 0.9
num_tempering_iterations : 4
tempering_regularization_start_value: 0.0
track_exact_loss : 0
m['coefficients']
{'intercept': 3.5821495005738013,
 'movie_id': Columns:
    movie_id int
    linear_terms float
    factors array
Rows: 3706
Data:
+----------+------------------+-------------------------------+
| movie_id | linear_terms | factors |
+----------+------------------+-------------------------------+
| 1193 | 1.06781125069 | [-0.119829073548, -0.02245... |
| 661 | -0.0261590108275 | [-0.727257788181, 0.016146... |
| 914 | 0.324085891247 | [-0.859803378582, 0.056376... |
| 3408 | 0.565778970718 | [0.334619760513, -0.014206... |
| 2355 | 0.648248255253 | [-0.248598009348, 0.103843... |
| 1197 | 1.12024652958 | [-0.100379563868, 0.085359... |
| 1287 | 0.345532894135 | [-0.247123196721, 0.024613... |
| 2804 | 0.894821941853 | [-0.272583067417, 0.046351... |
| 594 | 0.311594575644 | [-0.974369823933, 0.054282... |
| 919 | 0.97704321146 | [-0.598346889019, 0.085630... |
+----------+------------------+-------------------------------+
[3706 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'side_data': Columns:
    feature str
    index str
    linear_terms float
    factors array
Rows: 1
Data:
+-----------+-------+-----------------+-------------------------------+
| feature | index | linear_terms | factors |
+-----------+-------+-----------------+-------------------------------+
| timestamp | 0 | -0.116745471954 | [-0.564183712006, 1.267165... |
+-----------+-------+-----------------+-------------------------------+
[1 rows x 4 columns],
 'user_id': Columns:
    user_id int
    linear_terms float
    factors array
Rows: 6040
Data:
+---------+------------------+-------------------------------+
| user_id | linear_terms | factors |
+---------+------------------+-------------------------------+
| 1 | -0.027785371989 | [-0.0942558199167, 0.00739... |
| 2 | -0.0234720371664 | [0.015922004357, -0.033992... |
| 3 | -0.0345229320228 | [0.176564618945, -0.050576... |
| 4 | -0.0198582224548 | [-0.0773911848664, -0.0500... |
| 5 | -0.0562275871634 | [-0.0598151274025, -0.0059... |
| 6 | -0.0401206016541 | [0.0565584115684, 0.030123... |
| 7 | -0.0433877147734 | [0.205288589001, -0.060017... |
| 8 | -0.0184100158513 | [0.169030055404, -0.043373... |
| 9 | -0.0512112490833 | [0.163330376148, -0.060946... |
| 10 | -0.0407416447997 | [-0.420519113541, 0.110337... |
+---------+------------------+-------------------------------+
[6040 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
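The coefficients above (a global intercept plus a linear term and a factor vector per user and per item) suggest the usual factorization-model score: intercept + user term + item term + dot product of the two factor vectors. The sketch below shows that arithmetic on made-up numbers. It deliberately ignores the `side_data` (timestamp) terms that the fitted model also carries, and `predict_score` is a hypothetical helper rather than a GraphLab function.

```python
def predict_score(intercept, user_linear, item_linear,
                  user_factors, item_factors):
    """Combine bias terms and latent factors into one predicted score.

    Illustrative sketch: side-feature terms are left out on purpose.
    """
    # Interaction term: dot product of the user and item factor vectors.
    interaction = sum(u * v for u, v in zip(user_factors, item_factors))
    return intercept + user_linear + item_linear + interaction


# Made-up values with two factors instead of the model's 32.
score = predict_score(
    intercept=3.58,
    user_linear=-0.03,
    item_linear=1.07,
    user_factors=[0.1, -0.2],
    item_factors=[0.5, 0.25],
)
```

Here the factor dot product happens to be zero, so the score is just the sum of the three bias terms.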
graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                         [pop, itemcf, m],
                                         user_sample=0.2,
                                         metric='precision_recall')
compare_models: using 183 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+-----------------+
| 1 | 0.109289617486 | 0.0154169412131 |
| 2 | 0.114754098361 | 0.0315571827129 |
| 3 | 0.103825136612 | 0.0393550677194 |
| 4 | 0.0983606557377 | 0.0488860172488 |
| 5 | 0.0983606557377 | 0.057530354299 |
| 6 | 0.0983606557377 | 0.06952814808 |
| 7 | 0.0967993754879 | 0.0776744105871 |
| 8 | 0.0949453551913 | 0.0871933441083 |
| 9 | 0.0910746812386 | 0.0970583805009 |
| 10 | 0.0890710382514 | 0.105731522781 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.218579234973 | 0.0336356996991 |
| 2 | 0.185792349727 | 0.0491081612808 |
| 3 | 0.182149362477 | 0.0721856847862 |
| 4 | 0.172131147541 | 0.086814432767 |
| 5 | 0.165027322404 | 0.0983099175722 |
| 6 | 0.152094717668 | 0.110996252299 |
| 7 | 0.148321623731 | 0.13067735829 |
| 8 | 0.148907103825 | 0.150453968213 |
| 9 | 0.142076502732 | 0.158699171088 |
| 10 | 0.134426229508 | 0.166857542043 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.27868852459 | 0.0355923139267 |
| 2 | 0.226775956284 | 0.0540712203094 |
| 3 | 0.213114754098 | 0.0716753913564 |
| 4 | 0.198087431694 | 0.0898091945474 |
| 5 | 0.183606557377 | 0.100699809919 |
| 6 | 0.182149362477 | 0.11362028645 |
| 7 | 0.185011709602 | 0.137198290932 |
| 8 | 0.177595628415 | 0.147966304582 |
| 9 | 0.169398907104 | 0.156916738229 |
| 10 | 0.160655737705 | 0.171367047623 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
[{'precision_recall_by_user': Columns:
    user_id int
    cutoff int
    precision float
    recall float
    count int
Rows: 3294
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 42 | 1 | 0.0 | 0.0 | 7 |
| 42 | 2 | 0.5 | 0.142857142857 | 7 |
| 42 | 3 | 0.333333333333 | 0.142857142857 | 7 |
| 42 | 4 | 0.25 | 0.142857142857 | 7 |
| 42 | 5 | 0.2 | 0.142857142857 | 7 |
| 42 | 6 | 0.166666666667 | 0.142857142857 | 7 |
| 42 | 7 | 0.142857142857 | 0.142857142857 | 7 |
| 42 | 8 | 0.125 | 0.142857142857 | 7 |
| 42 | 9 | 0.111111111111 | 0.142857142857 | 7 |
| 42 | 10 | 0.1 | 0.142857142857 | 7 |
+---------+--------+----------------+----------------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
    cutoff int
    precision float
    recall float
Rows: 18
Data:
+--------+-----------------+-----------------+
| cutoff | precision | recall |
+--------+-----------------+-----------------+
| 1 | 0.109289617486 | 0.0154169412131 |
| 2 | 0.114754098361 | 0.0315571827129 |
| 3 | 0.103825136612 | 0.0393550677194 |
| 4 | 0.0983606557377 | 0.0488860172488 |
| 5 | 0.0983606557377 | 0.057530354299 |
| 6 | 0.0983606557377 | 0.06952814808 |
| 7 | 0.0967993754879 | 0.0776744105871 |
| 8 | 0.0949453551913 | 0.0871933441083 |
| 9 | 0.0910746812386 | 0.0970583805009 |
| 10 | 0.0890710382514 | 0.105731522781 |
+--------+-----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
    user_id int
    cutoff int
    precision float
    recall float
    count int
Rows: 3294
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 42 | 1 | 1.0 | 0.142857142857 | 7 |
| 42 | 2 | 0.5 | 0.142857142857 | 7 |
| 42 | 3 | 0.333333333333 | 0.142857142857 | 7 |
| 42 | 4 | 0.25 | 0.142857142857 | 7 |
| 42 | 5 | 0.4 | 0.285714285714 | 7 |
| 42 | 6 | 0.333333333333 | 0.285714285714 | 7 |
| 42 | 7 | 0.285714285714 | 0.285714285714 | 7 |
| 42 | 8 | 0.25 | 0.285714285714 | 7 |
| 42 | 9 | 0.222222222222 | 0.285714285714 | 7 |
| 42 | 10 | 0.2 | 0.285714285714 | 7 |
+---------+--------+----------------+----------------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
    cutoff int
    precision float
    recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.218579234973 | 0.0336356996991 |
| 2 | 0.185792349727 | 0.0491081612808 |
| 3 | 0.182149362477 | 0.0721856847862 |
| 4 | 0.172131147541 | 0.086814432767 |
| 5 | 0.165027322404 | 0.0983099175722 |
| 6 | 0.152094717668 | 0.110996252299 |
| 7 | 0.148321623731 | 0.13067735829 |
| 8 | 0.148907103825 | 0.150453968213 |
| 9 | 0.142076502732 | 0.158699171088 |
| 10 | 0.134426229508 | 0.166857542043 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
    user_id int
    cutoff int
    precision float
    recall float
    count int
Rows: 3294
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 42 | 1 | 1.0 | 0.142857142857 | 7 |
| 42 | 2 | 0.5 | 0.142857142857 | 7 |
| 42 | 3 | 0.333333333333 | 0.142857142857 | 7 |
| 42 | 4 | 0.25 | 0.142857142857 | 7 |
| 42 | 5 | 0.2 | 0.142857142857 | 7 |
| 42 | 6 | 0.166666666667 | 0.142857142857 | 7 |
| 42 | 7 | 0.142857142857 | 0.142857142857 | 7 |
| 42 | 8 | 0.125 | 0.142857142857 | 7 |
| 42 | 9 | 0.111111111111 | 0.142857142857 | 7 |
| 42 | 10 | 0.2 | 0.285714285714 | 7 |
+---------+--------+----------------+----------------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
    cutoff int
    precision float
    recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.27868852459 | 0.0355923139267 |
| 2 | 0.226775956284 | 0.0540712203094 |
| 3 | 0.213114754098 | 0.0716753913564 |
| 4 | 0.198087431694 | 0.0898091945474 |
| 5 | 0.183606557377 | 0.100699809919 |
| 6 | 0.182149362477 | 0.11362028645 |
| 7 | 0.185011709602 | 0.137198290932 |
| 8 | 0.177595628415 | 0.147966304582 |
| 9 | 0.169398907104 | 0.156916738229 |
| 10 | 0.160655737705 | 0.171367047623 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
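The per-user rows above are consistent with the standard definitions: precision at cutoff k is hits / k, and recall is hits / (number of relevant items for that user). The sketch below reproduces the first three user 42 rows of the first by-user table (7 relevant items, a miss at rank 1, a hit at rank 2). The item ids are made up, and `precision_recall_at_k` is a hypothetical helper, not a GraphLab call.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision, recall) for the top-k recommended items.

    `recommended` is a ranked list of item ids and `relevant` is the set
    of held-out items the user actually liked.
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall


# Made-up ids: 7 relevant items, only item 20 appears in the ranking.
recommended = [10, 20, 30, 40]
relevant = {20, 51, 52, 53, 54, 55, 56}

p1, r1 = precision_recall_at_k(recommended, relevant, 1)  # miss at rank 1
p2, r2 = precision_recall_at_k(recommended, relevant, 2)  # one hit in top 2
p3, r3 = precision_recall_at_k(recommended, relevant, 3)  # still one hit
```

The `mean_precision` and `mean_recall` columns in the summary tables would then be averages of these per-user values over the sampled users.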
m_rank = graphlab.recommender.ranking_factorization_recommender.create(
    train, 'user_id', 'movie_id', 'rating',
    unobserved_rating_value=3)
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3706 items.
Data prepared in: 0.910656s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| unobserved_rating_value | Ranking Target Rating for Unobserved Interacti...| 3 |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 16.6667 | Not Viable |
| 1 | 4.16667 | Not Viable |
| 2 | 1.04167 | Not Viable |
| 3 | 0.260417 | Not Viable |
| 4 | 0.0651042 | 0.998804 |
| 5 | 0.0325521 | 0.953073 |
| 6 | 0.016276 | 1.00856 |
| 7 | 0.00813802 | 1.0661 |
| 8 | 0.00406901 | 1.23488 |
+---------+-------------------+------------------------------------------+
| Final | 0.0325521 | 0.953073 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 107us | 1.33247 | 1.11699 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 885.253ms | 1.10069 | 1.00012 | 0.0325521 |
| 2 | 1.90s | 1.01103 | 0.956979 | 0.0325521 |
| 3 | 3.00s | 0.974312 | 0.937847 | 0.0325521 |
| 4 | 4.07s | 0.960712 | 0.931004 | 0.0325521 |
| 5 | 5.12s | 0.949761 | 0.9254 | 0.0325521 |
| 6 | 6.07s | 0.942225 | 0.921526 | 0.0325521 |
| 7 | 7.06s | 0.935704 | 0.918205 | 0.0325521 |
| 8 | 8.02s | 0.930567 | 0.915684 | 0.0325521 |
| 9 | 9.05s | 0.925405 | 0.912967 | 0.0325521 |
| 10 | 10.33s | 0.920952 | 0.910727 | 0.0325521 |
| 11 | 11.58s | 0.916647 | 0.908743 | 0.0325521 |
| 12 | 12.62s | 0.913399 | 0.907017 | 0.0325521 |
| 13 | 13.58s | 0.909575 | 0.904969 | 0.0325521 |
| 14 | 14.67s | 0.906824 | 0.90367 | 0.0325521 |
| 15 | 15.77s | 0.904054 | 0.902198 | 0.0325521 |
| 16 | 16.86s | 0.901294 | 0.90096 | 0.0325521 |
| 17 | 17.85s | 0.898579 | 0.899525 | 0.0325521 |
| 18 | 18.86s | 0.896474 | 0.898482 | 0.0325521 |
| 19 | 20.07s | 0.894312 | 0.897331 | 0.0325521 |
| 20 | 21.10s | 0.892068 | 0.896046 | 0.0325521 |
| 21 | 22.12s | 0.88988 | 0.894963 | 0.0325521 |
| 22 | 23.33s | 0.887669 | 0.893956 | 0.0325521 |
| 23 | 24.51s | 0.885674 | 0.892851 | 0.0325521 |
| 24 | 25.58s | 0.884228 | 0.892176 | 0.0325521 |
| 25 | 26.59s | 0.882557 | 0.891299 | 0.0325521 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 0.882406
Final training RMSE: 0.886832
results = graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                                   [pop, itemcf, m, m_rank],
                                                   user_sample=0.2,
                                                   metric='precision_recall')
compare_models: using 183 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+-----------------+
| 1 | 0.103825136612 | 0.0116730652166 |
| 2 | 0.101092896175 | 0.0282821265133 |
| 3 | 0.0819672131148 | 0.0387227556041 |
| 4 | 0.0833333333333 | 0.054260741295 |
| 5 | 0.0786885245902 | 0.0652009814042 |
| 6 | 0.0765027322404 | 0.0758533021658 |
| 7 | 0.0772833723653 | 0.087870157321 |
| 8 | 0.0785519125683 | 0.10363872715 |
| 9 | 0.0740740740741 | 0.108125993351 |
| 10 | 0.0743169398907 | 0.125169969412 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.16393442623 | 0.033918559494 |
| 2 | 0.166666666667 | 0.0559035280067 |
| 3 | 0.158469945355 | 0.0802610557096 |
| 4 | 0.147540983607 | 0.0993935937697 |
| 5 | 0.138797814208 | 0.116006405262 |
| 6 | 0.128415300546 | 0.125824450712 |
| 7 | 0.128805620609 | 0.148368313836 |
| 8 | 0.125 | 0.162294248876 |
| 9 | 0.12204007286 | 0.173015991344 |
| 10 | 0.117486338798 | 0.18606052953 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.24043715847 | 0.032990686752 |
| 2 | 0.196721311475 | 0.0593716586723 |
| 3 | 0.187613843352 | 0.0783908312908 |
| 4 | 0.180327868852 | 0.102234190817 |
| 5 | 0.16393442623 | 0.120610724355 |
| 6 | 0.159380692168 | 0.140119556509 |
| 7 | 0.152224824356 | 0.157806327365 |
| 8 | 0.142759562842 | 0.166809497193 |
| 9 | 0.137826350941 | 0.175280850522 |
| 10 | 0.134972677596 | 0.186140806231 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M3
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.114754098361 | 0.0220448212961 |
| 2 | 0.120218579235 | 0.0351999242985 |
| 3 | 0.111111111111 | 0.0445415535372 |
| 4 | 0.106557377049 | 0.052975921608 |
| 5 | 0.110382513661 | 0.076614241225 |
| 6 | 0.111111111111 | 0.0956858236539 |
| 7 | 0.110850897736 | 0.113159656127 |
| 8 | 0.106557377049 | 0.121421484491 |
| 9 | 0.102610807529 | 0.136444329342 |
| 10 | 0.101092896175 | 0.14669712098 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
results[3]['precision_recall_overall']
cutoff | precision | recall |
---|---|---|
1 | 0.114754098361 | 0.0220448212961 |
2 | 0.120218579235 | 0.0351999242985 |
3 | 0.111111111111 | 0.0445415535372 |
4 | 0.106557377049 | 0.052975921608 |
5 | 0.110382513661 | 0.076614241225 |
6 | 0.111111111111 | 0.0956858236539 |
7 | 0.110850897736 | 0.113159656127 |
8 | 0.106557377049 | 0.121421484491 |
9 | 0.102610807529 | 0.136444329342 |
10 | 0.101092896175 | 0.14669712098 |
user_sf = graphlab.SFrame('users')
item_sf = graphlab.SFrame('items')
m_user = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf)
m_item = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     item_data=item_sf)
m_both = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf, item_data=item_sf)
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3706 items.
Data prepared in: 0.872319s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| side_data_factorization | Assign Factors for Side Data | True |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 7.14286 | Not Viable |
| 1 | 1.78571 | Not Viable |
| 2 | 0.446429 | 1.52387 |
| 3 | 0.223214 | Not Viable |
| 4 | 0.0558036 | 1.79945 |
| 5 | 0.0279018 | 1.78969 |
+---------+-------------------+------------------------------------------+
| Final | 0.446429 | 1.52387 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 63us | 2.44592 | 1.11697 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 2.13s | DIVERGED | DIVERGED | 0.446429 |
| RESET | 2.99s | 2.44647 | 1.11698 | |
| 1 | 5.10s | 1.83529 | 1.10141 | 0.223214 |
| 2 | 6.90s | 1.47109 | 0.95455 | 0.223214 |
| 3 | 8.80s | 1.36577 | 0.917865 | 0.223214 |
| 4 | 10.62s | 1.31007 | 0.897384 | 0.223214 |
| 5 | 12.44s | 1.27125 | 0.883491 | 0.223214 |
| 6 | 14.67s | 1.24635 | 0.873351 | 0.223214 |
| 7 | 16.70s | 1.22622 | 0.865748 | 0.223214 |
| 8 | 18.63s | 1.21028 | 0.859754 | 0.223214 |
| 9 | 20.50s | 1.19796 | 0.854523 | 0.223214 |
| 10 | 22.46s | 1.18689 | 0.850478 | 0.223214 |
| 11 | 24.31s | 1.1783 | 0.846785 | 0.223214 |
| 12 | 26.18s | 1.17084 | 0.84352 | 0.223214 |
| 13 | 28.07s | 1.16355 | 0.840745 | 0.223214 |
| 14 | 29.97s | 1.15711 | 0.838398 | 0.223214 |
| 15 | 32.09s | 1.15247 | 0.836416 | 0.223214 |
| 16 | 34.04s | 1.14785 | 0.83443 | 0.223214 |
| 17 | 36.25s | 1.14331 | 0.832546 | 0.223214 |
| 18 | 38.41s | 1.13848 | 0.830724 | 0.223214 |
| 19 | 40.34s | 1.13683 | 0.82959 | 0.223214 |
| 20 | 42.39s | 1.13266 | 0.828052 | 0.223214 |
| 21 | 44.75s | 1.13008 | 0.827049 | 0.223214 |
| 22 | 47.05s | 1.12695 | 0.82589 | 0.223214 |
| 23 | 49.58s | 1.12374 | 0.824564 | 0.223214 |
| 24 | 51.93s | 1.12202 | 0.823962 | 0.223214 |
| 25 | 54.30s | 1.12015 | 0.822855 | 0.223214 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.12099
Final training RMSE: 0.797943
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3883 items.
Data prepared in: 1.16364s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| side_data_factorization | Assign Factors for Side Data | True |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 10 | Not Viable |
| 1 | 2.5 | Not Viable |
| 2 | 0.625 | Not Viable |
| 3 | 0.15625 | 1.08443 |
| 4 | 0.078125 | 1.72974 |
| 5 | 0.0390625 | 1.84654 |
| 6 | 0.0195312 | 1.74856 |
+---------+-------------------+------------------------------------------+
| Final | 0.15625 | 1.08443 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 70us | 2.44637 | 1.11698 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 2.33s | DIVERGED | DIVERGED | 0.15625 |
| RESET | 3.11s | 2.44644 | 1.11697 | |
| 1 | 4.92s | 1.75622 | 1.06161 | 0.078125 |
| 2 | 6.57s | 1.50614 | 0.966049 | 0.078125 |
| 3 | 8.08s | 1.4167 | 0.933697 | 0.078125 |
| 4 | 9.53s | 1.37077 | 0.917417 | 0.078125 |
| 5 | 10.99s | 1.34379 | 0.908108 | 0.078125 |
| 6 | 12.65s | 1.32151 | 0.899986 | 0.078125 |
| 7 | 14.14s | 1.30427 | 0.89374 | 0.078125 |
| 8 | 15.59s | 1.2895 | 0.888268 | 0.078125 |
| 9 | 17.03s | 1.27727 | 0.884087 | 0.078125 |
| 10 | 18.47s | 1.2662 | 0.879697 | 0.078125 |
| 11 | 19.91s | 1.25785 | 0.876548 | 0.078125 |
| 12 | 21.33s | 1.24893 | 0.873322 | 0.078125 |
| 13 | 22.73s | 1.24222 | 0.870908 | 0.078125 |
| 14 | 24.17s | 1.23693 | 0.868724 | 0.078125 |
| 15 | 25.64s | 1.23104 | 0.866697 | 0.078125 |
| 16 | 27.07s | 1.22657 | 0.865066 | 0.078125 |
| 17 | 28.47s | 1.22185 | 0.86311 | 0.078125 |
| 18 | 29.90s | 1.21603 | 0.860832 | 0.078125 |
| 19 | 31.34s | 1.21214 | 0.859841 | 0.078125 |
| 20 | 32.75s | 1.20866 | 0.858349 | 0.078125 |
| 21 | 34.17s | 1.20588 | 0.857265 | 0.078125 |
| 22 | 35.59s | 1.2013 | 0.855384 | 0.078125 |
| 23 | 36.98s | 1.19868 | 0.854415 | 0.078125 |
| 24 | 38.40s | 1.19618 | 0.8536 | 0.078125 |
| 25 | 39.82s | 1.19373 | 0.852524 | 0.078125 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.22299
Final training RMSE: 0.842974
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3883 items.
Data prepared in: 0.897359s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| side_data_factorization | Assign Factors for Side Data | True |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 5.55556 | Not Viable |
| 1 | 1.38889 | Not Viable |
| 2 | 0.347222 | 1.41163 |
| 3 | 0.173611 | 1.06475 |
| 4 | 0.0868056 | 1.03146 |
| 5 | 0.0434028 | 1.1885 |
| 6 | 0.0217014 | 1.54004 |
| 7 | 0.0108507 | 1.77557 |
+---------+-------------------+------------------------------------------+
| Final | 0.0868056 | 1.03146 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 70us | 2.4467 | 1.117 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 3.02s | DIVERGED | DIVERGED | 0.0868056 |
| RESET | 3.97s | 2.4468 | 1.11698 | |
| 1 | 6.22s | 1.56747 | 1.00211 | 0.0434028 |
| 2 | 8.40s | 1.4123 | 0.936364 | 0.0434028 |
| 3 | 10.61s | 1.34465 | 0.91251 | 0.0434028 |
| 4 | 13.00s | 1.30528 | 0.898029 | 0.0434028 |
| 5 | 15.21s | 1.27777 | 0.887646 | 0.0434028 |
| 6 | 18.09s | 1.25523 | 0.879139 | 0.0434028 |
| 7 | 20.54s | 1.23908 | 0.872961 | 0.0434028 |
| 8 | 22.79s | 1.22569 | 0.867368 | 0.0434028 |
| 9 | 25.25s | 1.21337 | 0.862705 | 0.0434028 |
| 10 | 28.35s | 1.20338 | 0.858906 | 0.0434028 |
| 11 | 31.26s | 1.19501 | 0.855631 | 0.0434028 |
| 12 | 33.70s | 1.18678 | 0.852674 | 0.0434028 |
| 13 | 35.86s | 1.18128 | 0.850198 | 0.0434028 |
| 14 | 38.05s | 1.17569 | 0.847772 | 0.0434028 |
| 15 | 40.25s | 1.16953 | 0.845739 | 0.0434028 |
| 16 | 42.45s | 1.16479 | 0.843867 | 0.0434028 |
| 17 | 44.62s | 1.16132 | 0.842166 | 0.0434028 |
| 18 | 46.80s | 1.15646 | 0.840656 | 0.0434028 |
| 19 | 48.95s | 1.15327 | 0.838994 | 0.0434028 |
| 20 | 51.21s | 1.15068 | 0.837869 | 0.0434028 |
| 21 | 53.40s | 1.14713 | 0.836618 | 0.0434028 |
| 22 | 55.60s | 1.14393 | 0.835341 | 0.0434028 |
| 23 | 57.78s | 1.14105 | 0.834355 | 0.0434028 |
| 24 | 59.92s | 1.13905 | 0.833327 | 0.0434028 |
| 25 | 1m 2s | 1.13617 | 0.832308 | 0.0434028 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.16266
Final training RMSE: 0.822791
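In both training logs above, the first pass is reported as DIVERGED, the state is RESET, and training resumes with half the step size (0.15625 to 0.078125, then 0.0868056 to 0.0434028). The sketch below illustrates that general "halve the step size and restart on divergence" idea on a toy one-dimensional problem. It is not GraphLab Create's actual solver; the function names `gradient_descent` and `fit_with_reset`, the objective, and the divergence threshold are all assumptions made for illustration.

```python
def gradient_descent(step_size, x0=1.0, iterations=25):
    """Minimise the toy objective f(x) = x**2 with a fixed step size.

    Returns the final x, or None if the iterate blows up (treated as divergence).
    """
    x = x0
    for _ in range(iterations):
        x = x - step_size * 2.0 * x  # gradient of x**2 is 2x
        if abs(x) > 1e6 * abs(x0):   # hypothetical divergence check
            return None
    return x


def fit_with_reset(step_size):
    """Restart from the initial point with half the step size after each divergence."""
    while True:
        result = gradient_descent(step_size)
        if result is not None:
            return step_size, result
        step_size /= 2.0             # mirrors the halving seen in the logs


final_step, final_x = fit_with_reset(1.6)
print(final_step, final_x)
```

On this toy problem a step size of 1.6 makes the iterate grow on every update, so one halving is enough before the run settles.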
m_both
Class : RankingFactorizationRecommender

Schema
------
User ID : user_id
Item ID : movie_id
Target : rating
Additional observation features : 1
Number of user side features : 5
Number of item side features : 3

Statistics
----------
Number of observations : 965508
Number of users : 6040
Number of items : 3883

Training summary
----------------
Training time : 74.1749

Model Parameters
----------------
Model class : RankingFactorizationRecommender
num_factors : 32
binary_target : 0
side_data_factorization : 1
solver : auto
nmf : 0
max_iterations : 25

Regularization Settings
-----------------------
regularization : 0.0
regularization_type : normal
linear_regularization : 0.0
ranking_regularization : 0.25
unobserved_rating_value : -1.79769313486e+308
num_sampled_negative_examples : 4
ials_confidence_scaling_type : auto
ials_confidence_scaling_factor : 1

Optimization Settings
---------------------
init_random_sigma : 0.01
sgd_convergence_interval : 4
sgd_convergence_threshold : 0.0
sgd_max_trial_iterations : 5
sgd_sampling_block_size : 131072
sgd_step_adjustment_interval : 4
sgd_step_size : 0.0
sgd_trial_sample_minimum_size : 10000
sgd_trial_sample_proportion : 0.125
step_size_decrease_rate : 0.75
additional_iterations_if_unhealthy: 5
adagrad_momentum_weighting : 0.9
num_tempering_iterations : 4
tempering_regularization_start_value: 0.0
track_exact_loss : 0
results = graphlab.recommender.util.compare_models(test, [m, m_user, m_item, m_both], user_sample=0.2)
compare_models: using 200 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.305 | 0.0104911692762 |
| 2 | 0.3075 | 0.0218598413174 |
| 3 | 0.298333333333 | 0.0312188865716 |
| 4 | 0.28375 | 0.0416981002021 |
| 5 | 0.267 | 0.0490821420076 |
| 6 | 0.255 | 0.0566601620839 |
| 7 | 0.244285714286 | 0.0634873210823 |
| 8 | 0.2325 | 0.0671075588864 |
| 9 | 0.227222222222 | 0.0732309783629 |
| 10 | 0.22 | 0.0778109177835 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 0.9976486246840836)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 4259 | 4 | 0.482071967797 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 3275 | 4 | 1.92700170805 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count | rmse |
+----------+-------+------------------+
| 163 | 1 | 0.00169078452152 |
+----------+-------+------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 3196 | 1 | 4.01836458102 |
+----------+-------+---------------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.375 | 0.0136968745991 |
| 2 | 0.3575 | 0.023755002765 |
| 3 | 0.335 | 0.0362605125879 |
| 4 | 0.3075 | 0.0423008826942 |
| 5 | 0.293 | 0.0497532350863 |
| 6 | 0.281666666667 | 0.0578575934232 |
| 7 | 0.282142857143 | 0.0671244350417 |
| 8 | 0.27125 | 0.0748468085545 |
| 9 | 0.262222222222 | 0.0813855604447 |
| 10 | 0.26 | 0.0896391608922 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 1.0433027431346846)
Per User RMSE (best)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 4259 | 4 | 0.24496854802 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 2912 | 7 | 2.13582589594 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count | rmse |
+----------+-------+------------------+
| 379 | 1 | 0.00331903448671 |
+----------+-------+------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 3117 | 1 | 3.61265547345 |
+----------+-------+---------------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.355 | 0.01355116837 |
| 2 | 0.345 | 0.0247731072461 |
| 3 | 0.328333333333 | 0.0346428767624 |
| 4 | 0.3 | 0.0417405134962 |
| 5 | 0.286 | 0.0496330466776 |
| 6 | 0.283333333333 | 0.0607078786391 |
| 7 | 0.273571428571 | 0.068217750301 |
| 8 | 0.265625 | 0.0745364058279 |
| 9 | 0.262777777778 | 0.0821126915135 |
| 10 | 0.255 | 0.0868626759608 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 1.0121070979184972)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 4259 | 4 | 0.280887940151 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 2912 | 7 | 2.21734756181 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+-------------------+
| movie_id | count | rmse |
+----------+-------+-------------------+
| 1283 | 1 | 0.000539099143753 |
+----------+-------+-------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 1806 | 1 | 3.39939804512 |
+----------+-------+---------------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M3
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.4 | 0.0173755798355 |
| 2 | 0.3625 | 0.0265019144168 |
| 3 | 0.338333333333 | 0.0380391678111 |
| 4 | 0.3275 | 0.0479764485634 |
| 5 | 0.312 | 0.0576493349033 |
| 6 | 0.299166666667 | 0.0654828586353 |
| 7 | 0.295714285714 | 0.0760004175285 |
| 8 | 0.29 | 0.0841766126368 |
| 9 | 0.286666666667 | 0.093854767667 |
| 10 | 0.2755 | 0.0995420682121 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 0.9936036664127302)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 4259 | 4 | 0.397063536039 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 2912 | 7 | 2.01561012851 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count | rmse |
+----------+-------+------------------+
| 849 | 1 | 0.00179710693082 |
+----------+-------+------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 3806 | 1 | 3.81246875723 |
+----------+-------+---------------+
[1 rows x 3 columns]
[results[i]['rmse_overall'] for i in range(len(results))]
[0.9976486246840836, 1.0433027431346846, 1.0121070979184972, 0.9936036664127302]
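To make the comparison easier to read, the four overall RMSE values can be paired with model names and sorted. This is a minimal sketch: the numbers are copied from the output above, and the labels assume the values come back in the same order as the list `[m, m_user, m_item, m_both]` passed to `compare_models`.

```python
# Overall test RMSE per model, copied from the output above.
rmse_overall = [0.9976486246840836, 1.0433027431346846,
                1.0121070979184972, 0.9936036664127302]

# Assumed to follow the order of the model list given to compare_models.
model_names = ['m', 'm_user', 'm_item', 'm_both']

# Sort ascending: a lower RMSE means smaller rating-prediction error.
ranked = sorted(zip(model_names, rmse_overall), key=lambda pair: pair[1])
best_name, best_rmse = ranked[0]

for name, rmse in ranked:
    print('%-7s %.4f' % (name, rmse))
```

Under that ordering assumption, the model with both user and item side data has the lowest RMSE, though the gap to the plain model `m` is small.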
results[0]['rmse_by_item'].show()
graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                         [m_rank, m_both],
                                         user_sample=0.2,
                                         metric='precision_recall')
compare_models: using 183 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.158469945355 | 0.0235039678346 |
| 2 | 0.155737704918 | 0.0367981705878 |
| 3 | 0.136612021858 | 0.0436527349676 |
| 4 | 0.137978142077 | 0.0577237401993 |
| 5 | 0.136612021858 | 0.0723710331936 |
| 6 | 0.125683060109 | 0.0842016591703 |
| 7 | 0.124902419984 | 0.101563633319 |
| 8 | 0.118852459016 | 0.109983711543 |
| 9 | 0.114754098361 | 0.11908895883 |
| 10 | 0.111475409836 | 0.128536481539 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.245901639344 | 0.0538736282089 |
| 2 | 0.234972677596 | 0.0758295229523 |
| 3 | 0.23679417122 | 0.101520870674 |
| 4 | 0.225409836066 | 0.12569838792 |
| 5 | 0.209836065574 | 0.142998857028 |
| 6 | 0.192167577413 | 0.155246605969 |
| 7 | 0.183450429352 | 0.174214946795 |
| 8 | 0.176229508197 | 0.190227369104 |
| 9 | 0.165148755313 | 0.200596309693 |
| 10 | 0.158469945355 | 0.210908575339 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
[{'precision_recall_by_user': Columns:
user_id int
cutoff int
precision float
recall float
count int
Rows: 3294
Data:
+---------+--------+-----------+--------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+-----------+--------+-------+
| 3 | 1 | 0.0 | 0.0 | 2 |
| 3 | 2 | 0.0 | 0.0 | 2 |
| 3 | 3 | 0.0 | 0.0 | 2 |
| 3 | 4 | 0.0 | 0.0 | 2 |
| 3 | 5 | 0.0 | 0.0 | 2 |
| 3 | 6 | 0.0 | 0.0 | 2 |
| 3 | 7 | 0.0 | 0.0 | 2 |
| 3 | 8 | 0.0 | 0.0 | 2 |
| 3 | 9 | 0.0 | 0.0 | 2 |
| 3 | 10 | 0.0 | 0.0 | 2 |
+---------+--------+-----------+--------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
'precision_recall_overall': Columns:
cutoff int
precision float
recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.158469945355 | 0.0235039678346 |
| 2 | 0.155737704918 | 0.0367981705878 |
| 3 | 0.136612021858 | 0.0436527349676 |
| 4 | 0.137978142077 | 0.0577237401993 |
| 5 | 0.136612021858 | 0.0723710331936 |
| 6 | 0.125683060109 | 0.0842016591703 |
| 7 | 0.124902419984 | 0.101563633319 |
| 8 | 0.118852459016 | 0.109983711543 |
| 9 | 0.114754098361 | 0.11908895883 |
| 10 | 0.111475409836 | 0.128536481539 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
{'precision_recall_by_user': Columns:
user_id int
cutoff int
precision float
recall float
count int
Rows: 3294
Data:
+---------+--------+-----------+--------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+-----------+--------+-------+
| 3 | 1 | 0.0 | 0.0 | 2 |
| 3 | 2 | 0.0 | 0.0 | 2 |
| 3 | 3 | 0.0 | 0.0 | 2 |
| 3 | 4 | 0.0 | 0.0 | 2 |
| 3 | 5 | 0.0 | 0.0 | 2 |
| 3 | 6 | 0.0 | 0.0 | 2 |
| 3 | 7 | 0.0 | 0.0 | 2 |
| 3 | 8 | 0.0 | 0.0 | 2 |
| 3 | 9 | 0.0 | 0.0 | 2 |
| 3 | 10 | 0.0 | 0.0 | 2 |
+---------+--------+-----------+--------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
'precision_recall_overall': Columns:
cutoff int
precision float
recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.245901639344 | 0.0538736282089 |
| 2 | 0.234972677596 | 0.0758295229523 |
| 3 | 0.23679417122 | 0.101520870674 |
| 4 | 0.225409836066 | 0.12569838792 |
| 5 | 0.209836065574 | 0.142998857028 |
| 6 | 0.192167577413 | 0.155246605969 |
| 7 | 0.183450429352 | 0.174214946795 |
| 8 | 0.176229508197 | 0.190227369104 |
| 9 | 0.165148755313 | 0.200596309693 |
| 10 | 0.158469945355 | 0.210908575339 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
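For reference, the precision and recall columns reported per cutoff can be read against the usual precision-at-k and recall-at-k definitions. The sketch below assumes those standard definitions (hits divided by k, and hits divided by the number of relevant items); it has not been checked against GraphLab Create's internals, and the helper name `precision_recall_at_k` and the toy item ids are hypothetical, not taken from the MovieLens run above.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for a single user.

    recommended: ranked list of item ids, best first.
    relevant: item ids the user actually liked in the test set.
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall


# Hypothetical toy data for one user.
recommended = [10, 20, 30, 40, 50]
relevant = [20, 50, 60, 70]

for k in (1, 2, 5):
    print(k, precision_recall_at_k(recommended, relevant, k))
```

Averaging these per-user values over the sampled users would give numbers comparable to the mean_precision and mean_recall columns, under the stated assumptions.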
fm = graphlab.recommender.create(train.head(10000), 'user_id', 'movie_id', 'rating',
                                 method='factorization_model',
                                 item_data=item_sf,
                                 sgd_step_size=0.09,
                                 max_iterations=10)