Music Recommendation with Turicreate



In [2]:
import turicreate as tc
In [3]:
#train_file = 'http://s3.amazonaws.com/dato-datasets/millionsong/10000.txt'
train_file = '/Users/datalab/bigdata/cjc/millionsong/song_usage_10000.txt'
# The file is tab-separated with no header row, so columns load as X1, X2, X3
sf = tc.SFrame.read_csv(train_file, header=False, delimiter='\t', verbose=False)
# Give the columns meaningful names
sf = sf.rename({'X1':'user_id', 'X2':'music_id', 'X3':'rating'})
In [4]:
# Randomly assign about 80% of the rows to training and the rest to testing
train_set, test_set = sf.random_split(0.8, seed=1)
In [5]:
# Baseline: a popularity-based recommender
popularity_model = tc.popularity_recommender.create(train_set,
                                                    'user_id', 'music_id',
                                                    target='rating')
Preparing data set.
    Data has 1599753 observations with 76085 users and 10000 items.
    Data prepared in: 0.907738s
1599753 observations to process; with 10000 unique items.
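As a rough illustration of what a popularity baseline does, the sketch below ranks items by their mean rating in plain Python. It assumes that, when a target column is supplied, popularity is measured by each item's average target value; the helper names and the toy rows are hypothetical and are not part of Turicreate.

```python
from collections import defaultdict

def mean_rating_per_item(rows):
    """Return {item_id: mean rating} for (user_id, item_id, rating) rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in rows:
        totals[item] += rating
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}

def top_k_popular(rows, k):
    """Return the k items with the highest mean rating."""
    means = mean_rating_per_item(rows)
    # Sort by mean rating, descending; break ties by item id for determinism.
    return sorted(means, key=lambda item: (-means[item], item))[:k]

# Hypothetical toy data in the same (user_id, music_id, rating) layout.
toy_rows = [
    ("u1", "song_a", 1), ("u2", "song_a", 3),
    ("u1", "song_b", 5),
    ("u3", "song_c", 2), ("u2", "song_c", 2),
]
popular = top_k_popular(toy_rows, 2)
```

Every user receives the same ranking from such a baseline, which is why it is mainly useful as a reference point for the personalized models.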
In [7]:
# Item-based collaborative filtering using cosine similarity between items
item_sim_model = tc.item_similarity_recommender.create(train_set,
                                                       'user_id', 'music_id',
                                                       target='rating',
                                                       similarity_type='cosine')
Preparing data set.
    Data has 1599753 observations with 76085 users and 10000 items.
    Data prepared in: 0.939059s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 1.878ms                        | 1.25       |
| 29.154ms                       | 100        |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 199.684ms                           | 0                | 0               |
| 957.28ms                            | 100              | 10000           |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 2.00431s
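The `similarity_type='cosine'` setting above compares items by the ratings they received from the same users. The following is a minimal sketch of that idea in plain Python, not Turicreate's actual implementation; the function names and toy rows are hypothetical.

```python
import math
from collections import defaultdict

def item_vectors(rows):
    """Build {item_id: {user_id: rating}} from (user_id, item_id, rating) rows."""
    vectors = defaultdict(dict)
    for user, item, rating in rows:
        vectors[item][user] = rating
    return vectors

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse rating vectors."""
    shared_users = set(vec_a) & set(vec_b)
    dot = sum(vec_a[u] * vec_b[u] for u in shared_users)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An item with no non-zero ratings is similar to nothing.
    return dot / (norm_a * norm_b)

# Hypothetical toy data: song_a and song_b are rated proportionally by the
# same users, while song_c shares no users with either of them.
toy_rows = [
    ("u1", "song_a", 1), ("u2", "song_a", 2),
    ("u1", "song_b", 2), ("u2", "song_b", 4),
    ("u3", "song_c", 3),
]
vectors = item_vectors(toy_rows)
sim_ab = cosine_similarity(vectors["song_a"], vectors["song_b"])
sim_ac = cosine_similarity(vectors["song_a"], vectors["song_c"])
```

Here `sim_ab` is close to 1 because the two rating vectors point in the same direction, and `sim_ac` is 0 because the items have no raters in common.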
In [8]:
# Matrix-factorization model trained on the ratings with default hyperparameters
factorization_machine_model = tc.recommender.factorization_recommender.create(train_set,
                                                                              'user_id', 'music_id',
                                                                              target='rating')
Preparing data set.
    Data has 1599753 observations with 76085 users and 10000 items.
    Data prepared in: 0.930001s
Training factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter                      | Description                                      | Value    |
+--------------------------------+--------------------------------------------------+----------+
| num_factors                    | Factor Dimension                                 | 8        |
| regularization                 | L2 Regularization on Factors                     | 1e-08    |
| solver                         | Solver used for training                         | sgd      |
| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-10    |
| max_iterations                 | Maximum Number of Iterations                     | 50       |
+--------------------------------+--------------------------------------------------+----------+
  Optimizing model using SGD; tuning step size.
  Using 199969 / 1599753 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value                |
+---------+-------------------+------------------------------------------+
| 0       | 25                | No Decrease (224.847 >= 36.2873)         |
| 1       | 6.25              | No Decrease (211.831 >= 36.2873)         |
| 2       | 1.5625            | No Decrease (184.589 >= 36.2873)         |
| 3       | 0.390625          | No Decrease (83.9764 >= 36.2873)         |
| 4       | 0.0976562         | 11.3523                                  |
| 5       | 0.0488281         | 7.5686                                   |
| 6       | 0.0244141         | 21.6581                                  |
+---------+-------------------+------------------------------------------+
| Final   | 0.0488281         | 7.5686                                   |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 99us         | 43.795            | 6.61778               |             |
+---------+--------------+-------------------+-----------------------+-------------+
| 1       | 99.622ms     | 43.5086           | 6.59571               | 0.0488281   |
| 2       | 191.248ms    | 40.9101           | 6.39574               | 0.0290334   |
| 3       | 280.477ms    | 37.8972           | 6.15571               | 0.0214205   |
| 4       | 378.603ms    | 35.2936           | 5.94045               | 0.0172633   |
| 5       | 474.372ms    | 32.7773           | 5.72471               | 0.014603    |
| 10      | 959.686ms    | 24.5984           | 4.95903               | 0.008683    |
| 50      | 5.02s        | 9.19885           | 3.0314                | 0.00154408  |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
       Final objective value: 8.16243
       Final training RMSE: 2.85534
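The training log above reports RMSE (root mean squared error) between predicted and observed ratings. As a reminder of what that number measures, here is a small stand-alone sketch; the function name and sample values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings and predictions.
perfect = rmse([1, 2, 3], [1, 2, 3])
off = rmse([0, 0], [3, 4])  # sqrt((9 + 16) / 2)
```

Because errors are squared before averaging, a few very large misses can dominate the result, so RMSE is sensitive to outliers in the rating column.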
In [9]:
len(train_set)
Out[9]:
1599753
In [10]:
# Evaluate the three models on the held-out data, sampling half of the test users
result = tc.recommender.util.compare_models(test_set,
                                            [popularity_model, item_sim_model, factorization_machine_model],
                                            user_sample=.5, skip_set=train_set)
compare_models: using 34354 users to estimate model performance
PROGRESS: Evaluate model M0
recommendations finished on 1000/34354 queries. users per second: 19077.4
recommendations finished on 2000/34354 queries. users per second: 21071.3
recommendations finished on 3000/34354 queries. users per second: 21793
recommendations finished on 4000/34354 queries. users per second: 22081.5
recommendations finished on 5000/34354 queries. users per second: 22392.1
recommendations finished on 6000/34354 queries. users per second: 22620.3
recommendations finished on 7000/34354 queries. users per second: 22719.2
recommendations finished on 8000/34354 queries. users per second: 22900.3
recommendations finished on 9000/34354 queries. users per second: 23067.3
recommendations finished on 10000/34354 queries. users per second: 22887.2
recommendations finished on 11000/34354 queries. users per second: 22713
recommendations finished on 12000/34354 queries. users per second: 22595.1
recommendations finished on 13000/34354 queries. users per second: 22631.4
recommendations finished on 14000/34354 queries. users per second: 22749.6
recommendations finished on 15000/34354 queries. users per second: 22609.7
recommendations finished on 16000/34354 queries. users per second: 22638.4
recommendations finished on 17000/34354 queries. users per second: 22764.3
recommendations finished on 18000/34354 queries. users per second: 22809.6
recommendations finished on 19000/34354 queries. users per second: 22919.8
recommendations finished on 20000/34354 queries. users per second: 22935.8
recommendations finished on 21000/34354 queries. users per second: 22884.6
recommendations finished on 22000/34354 queries. users per second: 22859.4
recommendations finished on 23000/34354 queries. users per second: 22748.8
recommendations finished on 24000/34354 queries. users per second: 22678.2
recommendations finished on 25000/34354 queries. users per second: 22568.9
recommendations finished on 26000/34354 queries. users per second: 22427.1
recommendations finished on 27000/34354 queries. users per second: 22358.6
recommendations finished on 28000/34354 queries. users per second: 22262.1
recommendations finished on 29000/34354 queries. users per second: 22079
recommendations finished on 30000/34354 queries. users per second: 22018.1
recommendations finished on 31000/34354 queries. users per second: 21629.5
recommendations finished on 32000/34354 queries. users per second: 21548.6
recommendations finished on 33000/34354 queries. users per second: 21540.2
recommendations finished on 34000/34354 queries. users per second: 21499.5
Precision and recall summary statistics by cutoff
+--------+------------------------+------------------------+
| cutoff |      mean_recall       |     mean_precision     |
+--------+------------------------+------------------------+
|   1    | 4.3383582709425655e-05 | 0.00032019561040927034 |
|   2    | 9.351370595195784e-05  | 0.0003493043022646556  |
|   3    | 0.00013332475867271565 | 0.00032989850769439354 |
|   4    | 0.00025157207340778813 | 0.0003638586481923528  |
|   5    | 0.0003743018484379922  | 0.00043663037783082004 |
|   6    | 0.00044921257061878193 | 0.00042207603190312675 |
|   7    | 0.0005172658393736786  | 0.00041583845507697374 |
|   8    |  0.000573334927644493  | 0.0004038830994935098  |
|   9    | 0.0008801308762376106  | 0.0005304250515870728  |
|   10   | 0.0009133251742113642  | 0.0005035803690982172  |
+--------+------------------------+------------------------+
[10 rows x 3 columns]


Overall RMSE: 6.339345574168611

Per User RMSE (best)
+-------------------------------+------+-------+
|            user_id            | rmse | count |
+-------------------------------+------+-------+
| 6d61c9b3678aa6c015ea9fd502... | 0.0  |   1   |
+-------------------------------+------+-------+
[1 rows x 3 columns]


Per User RMSE (worst)
+-------------------------------+-------------------+-------+
|            user_id            |        rmse       | count |
+-------------------------------+-------------------+-------+
| 38767872c514c1b43bab5c7b21... | 341.2071760874715 |   2   |
+-------------------------------+-------------------+-------+
[1 rows x 3 columns]


Per Item RMSE (best)
+--------------------+---------------------+-------+
|      music_id      |         rmse        | count |
+--------------------+---------------------+-------+
| SOXDPFW12A81C2319B | 0.07352941176470584 |   6   |
+--------------------+---------------------+-------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+--------------------+--------------------+-------+
|      music_id      |        rmse        | count |
+--------------------+--------------------+-------+
| SOLGIWB12A58A77A05 | 109.15045476689721 |   35  |
+--------------------+--------------------+-------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M1
recommendations finished on 1000/34354 queries. users per second: 18663.7
recommendations finished on 2000/34354 queries. users per second: 21804.3
recommendations finished on 3000/34354 queries. users per second: 22501.7
recommendations finished on 4000/34354 queries. users per second: 22946.8
recommendations finished on 5000/34354 queries. users per second: 23156.1
recommendations finished on 6000/34354 queries. users per second: 23361.5
recommendations finished on 7000/34354 queries. users per second: 23118
recommendations finished on 8000/34354 queries. users per second: 23054.1
recommendations finished on 9000/34354 queries. users per second: 22947
recommendations finished on 10000/34354 queries. users per second: 22859.2
recommendations finished on 11000/34354 queries. users per second: 22138.5
recommendations finished on 12000/34354 queries. users per second: 22029.6
recommendations finished on 13000/34354 queries. users per second: 22098.2
recommendations finished on 14000/34354 queries. users per second: 22204
recommendations finished on 15000/34354 queries. users per second: 22239.7
recommendations finished on 16000/34354 queries. users per second: 22323
recommendations finished on 17000/34354 queries. users per second: 22380.5
recommendations finished on 18000/34354 queries. users per second: 22421.7
recommendations finished on 19000/34354 queries. users per second: 22465.5
recommendations finished on 20000/34354 queries. users per second: 22521.1
recommendations finished on 21000/34354 queries. users per second: 22402.5
recommendations finished on 22000/34354 queries. users per second: 22338.5
recommendations finished on 23000/34354 queries. users per second: 21869.3
recommendations finished on 24000/34354 queries. users per second: 21574.6
recommendations finished on 25000/34354 queries. users per second: 21291.7
recommendations finished on 26000/34354 queries. users per second: 20909
recommendations finished on 27000/34354 queries. users per second: 20723.1
recommendations finished on 28000/34354 queries. users per second: 20666.2
recommendations finished on 29000/34354 queries. users per second: 20559.6
recommendations finished on 30000/34354 queries. users per second: 20340.9
recommendations finished on 31000/34354 queries. users per second: 20157.3
recommendations finished on 32000/34354 queries. users per second: 19851.3
recommendations finished on 33000/34354 queries. users per second: 19819
recommendations finished on 34000/34354 queries. users per second: 19781
Precision and recall summary statistics by cutoff
+--------+----------------------+----------------------+
| cutoff |     mean_recall      |    mean_precision    |
+--------+----------------------+----------------------+
|   1    | 0.014688842060795695 | 0.050445362985387564 |
|   2    | 0.03291712354999962  | 0.06248180706759065  |
|   3    | 0.05399142375515901  | 0.07436300479323134  |
|   4    | 0.06963924374087777  | 0.07609012050998376  |
|   5    | 0.08384375244888208  | 0.07552541188798938  |
|   6    |  0.0959100718474003  | 0.07364499039413211  |
|   7    | 0.10643883706242792  | 0.07136619566031029  |
|   8    | 0.11604275927620446  | 0.06902398556208861  |
|   9    |  0.1246532903037569  | 0.06660392126422279  |
|   10   | 0.13280279585995652  | 0.06440006986086082  |
+--------+----------------------+----------------------+
[10 rows x 3 columns]


Overall RMSE: 7.041096333660663

Per User RMSE (best)
+-------------------------------+----------------------+-------+
|            user_id            |         rmse         | count |
+-------------------------------+----------------------+-------+
| 714c577dfeaa6ffaf778286302... | 0.022276699542999268 |   1   |
+-------------------------------+----------------------+-------+
[1 rows x 3 columns]


Per User RMSE (worst)
+-------------------------------+--------------------+-------+
|            user_id            |        rmse        | count |
+-------------------------------+--------------------+-------+
| 38767872c514c1b43bab5c7b21... | 343.85311015265717 |   2   |
+-------------------------------+--------------------+-------+
[1 rows x 3 columns]


Per Item RMSE (best)
+--------------------+--------------------+-------+
|      music_id      |        rmse        | count |
+--------------------+--------------------+-------+
| SOBHIJM12AB018194F | 0.7032955339939242 |   6   |
+--------------------+--------------------+-------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+--------------------+--------------------+-------+
|      music_id      |        rmse        | count |
+--------------------+--------------------+-------+
| SOLGIWB12A58A77A05 | 109.83254923449184 |   35  |
+--------------------+--------------------+-------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M2
recommendations finished on 1000/34354 queries. users per second: 14762.3
recommendations finished on 2000/34354 queries. users per second: 16166.7
recommendations finished on 3000/34354 queries. users per second: 16291.6
recommendations finished on 4000/34354 queries. users per second: 16418.9
recommendations finished on 5000/34354 queries. users per second: 16427.5
recommendations finished on 6000/34354 queries. users per second: 16329.2
recommendations finished on 7000/34354 queries. users per second: 16405.7
recommendations finished on 8000/34354 queries. users per second: 16508
recommendations finished on 9000/34354 queries. users per second: 16428.6
recommendations finished on 10000/34354 queries. users per second: 16403.8
recommendations finished on 11000/34354 queries. users per second: 16274.9
recommendations finished on 12000/34354 queries. users per second: 16310.9
recommendations finished on 13000/34354 queries. users per second: 16293.5
recommendations finished on 14000/34354 queries. users per second: 16244.3
recommendations finished on 15000/34354 queries. users per second: 16212
recommendations finished on 16000/34354 queries. users per second: 16201.7
recommendations finished on 17000/34354 queries. users per second: 16228.6
recommendations finished on 18000/34354 queries. users per second: 16249.9
recommendations finished on 19000/34354 queries. users per second: 16250.9
recommendations finished on 20000/34354 queries. users per second: 16246
recommendations finished on 21000/34354 queries. users per second: 16212.5
recommendations finished on 22000/34354 queries. users per second: 16218.8
recommendations finished on 23000/34354 queries. users per second: 16218.1
recommendations finished on 24000/34354 queries. users per second: 16230.1
recommendations finished on 25000/34354 queries. users per second: 16172.3
recommendations finished on 26000/34354 queries. users per second: 16175.4
recommendations finished on 27000/34354 queries. users per second: 16188.5
recommendations finished on 28000/34354 queries. users per second: 16199.1
recommendations finished on 29000/34354 queries. users per second: 16199.1
recommendations finished on 30000/34354 queries. users per second: 16217
recommendations finished on 31000/34354 queries. users per second: 16228.5
recommendations finished on 32000/34354 queries. users per second: 16176.8
recommendations finished on 33000/34354 queries. users per second: 16184.1
recommendations finished on 34000/34354 queries. users per second: 16170.4
Precision and recall summary statistics by cutoff
+--------+------------------------+------------------------+
| cutoff |      mean_recall       |     mean_precision     |
+--------+------------------------+------------------------+
|   1    |  7.0023278723497e-05   | 0.0004075216859754317  |
|   2    | 0.00015701647350617692 | 0.0004220760319031288  |
|   3    | 0.00025150526145133087 | 0.00041722458326056295 |
|   4    | 0.00034565751961335053 | 0.00043663037783081874 |
|   5    | 0.00046127750964837765 | 0.0004424521162018989  |
|   6    | 0.0006061722786786824  | 0.0004899963128990292  |
|   7    |  0.00070561257944388   | 0.0005197980688462193  |
|   8    | 0.0008535975421092137  | 0.0005457879722885229  |
|   9    | 0.0009773205690967743  | 0.0005433622479672466  |
|   10   | 0.0011201566127395915  | 0.0005647086219945271  |
+--------+------------------------+------------------------+
[10 rows x 3 columns]


Overall RMSE: 7.780049286375036

Per User RMSE (best)
+-------------------------------+------------------------+-------+
|            user_id            |          rmse          | count |
+-------------------------------+------------------------+-------+
| 80495441caacdc7e069b441047... | 5.1963096963536515e-05 |   1   |
+-------------------------------+------------------------+-------+
[1 rows x 3 columns]


Per User RMSE (worst)
+-------------------------------+--------------------+-------+
|            user_id            |        rmse        | count |
+-------------------------------+--------------------+-------+
| 38767872c514c1b43bab5c7b21... | 338.99132097019395 |   2   |
+-------------------------------+--------------------+-------+
[1 rows x 3 columns]


Per Item RMSE (best)
+--------------------+---------------------+-------+
|      music_id      |         rmse        | count |
+--------------------+---------------------+-------+
| SOCOZST12A67020452 | 0.06951240175355755 |   1   |
+--------------------+---------------------+-------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+--------------------+--------------------+-------+
|      music_id      |        rmse        | count |
+--------------------+--------------------+-------+
| SOVQSQZ12A8C13F960 | 113.92735214599614 |   17  |
+--------------------+--------------------+-------+
[1 rows x 3 columns]
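The precision and recall tables above are indexed by a cutoff k. The sketch below shows one common way to compute both numbers for a single user, assuming precision is the fraction of the top-k recommendations that are relevant and recall is the fraction of relevant items that appear in the top k. The function name and toy data are hypothetical, and the per-user values would still need to be averaged over users to obtain `mean_precision` and `mean_recall`.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one user.

    recommended: ranked list of item ids, best first.
    relevant: set of item ids the user actually interacted with in the test set.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: one of the top two recommendations is relevant,
# and the user has three relevant items in total.
recommended = ["song_a", "song_b", "song_c", "song_d"]
relevant = {"song_b", "song_d", "song_x"}
p_at_2, r_at_2 = precision_recall_at_k(recommended, relevant, 2)
p_at_4, r_at_4 = precision_recall_at_k(recommended, relevant, 4)
```

In the tables above, the item-similarity model (M1) reaches far higher precision and recall at every cutoff than the popularity (M0) and factorization (M2) models, even though its overall RMSE is not the lowest. Ranking quality and rating-prediction error measure different things.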

In [12]:
K = 10
# Use the first 100 distinct user ids as the query set
users = tc.SArray(sf['user_id'].unique().head(100))
In [13]:
recs = item_sim_model.recommend(users=users, k=K)
recs.head()
Out[13]:
+----------------------------------------------+--------------------+--------------------+------+
| user_id                                      | music_id           | score              | rank |
+----------------------------------------------+--------------------+--------------------+------+
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOXUQNR12AF72A69D6 | 3.022422651449839  | 1    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOUFAZA12AC3DFAB20 | 1.3368427753448486 | 2    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOSFSTC12A8C141219 | 1.091982126235962  | 3    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOVIWFP12A58A7D1BD | 1.045163869857788  | 4    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOBMTQD12AB01833D0 | 1.0294516881306965 | 5    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOCMNRG12AB0189D3F | 0.9756437937418619 | 6    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOXOHUM12A67ADC826 | 0.9506873289744059 | 7    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOWBFVW12A6D4F612B | 0.9092370669047037 | 8    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOXFYTY127E9433E7D | 0.8977278073628744 | 9    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOYBLYP12A58A79D32 | 0.8970928192138672 | 10   |
+----------------------------------------------+--------------------+--------------------+------+
[10 rows x 4 columns]