!pip install -q tensorflow_text
|████████████████████████████████| 4.3MB 8.2MB/s
import numpy as np
import pandas as pd
import nltk
import json
import re
import csv
import pickle
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics.pairwise import cosine_similarity
import tensorflow_hub as hub
import tensorflow_text
data = pd.read_json('https://raw.githubusercontent.com/sparsh-ai/reco-data/master/books.json', lines=True)
data.head()
_id | title | isbn | pageCount | publishedDate | thumbnailUrl | shortDescription | longDescription | status | authors | categories | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Unlocking Android | 1933988673 | 416 | {'$date': '2009-04-01T00:00:00.000-0700'} | https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.... | Unlocking Android: A Developer's Guide provide... | Android is an open source mobile phone platfor... | PUBLISH | [W. Frank Ableson, Charlie Collins, Robi Sen] | [Open Source, Mobile] |
1 | 2 | Android in Action, Second Edition | 1935182722 | 592 | {'$date': '2011-01-14T00:00:00.000-0800'} | https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.... | Android in Action, Second Edition is a compreh... | When it comes to mobile apps, Android can do a... | PUBLISH | [W. Frank Ableson, Robi Sen] | [Java] |
2 | 3 | Specification by Example | 1617290084 | 0 | {'$date': '2011-06-03T00:00:00.000-0700'} | https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.... | NaN | NaN | PUBLISH | [Gojko Adzic] | [Software Engineering] |
3 | 4 | Flex 3 in Action | 1933988746 | 576 | {'$date': '2009-02-02T00:00:00.000-0800'} | https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.... | NaN | New web applications require engaging user-fri... | PUBLISH | [Tariq Ahmed with Jon Hirschi, Faisal Abid] | [Internet] |
4 | 5 | Flex 4 in Action | 1935182420 | 600 | {'$date': '2010-11-15T00:00:00.000-0800'} | https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.... | NaN | Using Flex, you can create high-quality, effec... | PUBLISH | [Tariq Ahmed, Dan Orlando, John C. Bland II, J... | [Internet] |
df = data[['title', 'authors', 'isbn','shortDescription','thumbnailUrl']].copy()
df['authors'] = df['authors'].str[0]
df.dropna(subset = ["shortDescription"], inplace=True)
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")
We convert our dataframe to a dictionnary where each row becomes a dictionary where key is column name and value is the data in the cell
full_data = df.to_dict('records')
# add vectors to the data
for element in full_data:
element['vector'] = embed(element['shortDescription'])[0]
vectors = [item['vector'] for item in full_data]
X = np.array(vectors)
# calculate similarity based on Euclidean distance
sim = euclidean_distances(X)
indices = np.vstack([np.argsort(-arr) for arr in sim])
# calculate similarity based on cosine distance
cos_sim = cosine_similarity(X)
cos_indices = np.vstack([np.argsort(-arr) for arr in cos_sim])
# find top-k most similar books for each case
topk = 20
for i, book in enumerate(full_data):
book['euclidean'] = indices[i][1:topk+1]
book['cosine'] = cos_indices[i][1:topk+1]
# remove vectors from dict
for book in full_data:
book.pop('vector')
full_data[0]
{'authors': 'W. Frank Ableson', 'cosine': array([ 1, 31, 146, 147, 116, 143, 157, 45, 134, 137, 50, 83, 9, 48, 57, 101, 156, 4, 99, 53]), 'euclidean': array([154, 65, 46, 84, 94, 28, 117, 38, 6, 77, 79, 8, 80, 15, 123, 14, 16, 110, 138, 85]), 'isbn': '1933988673', 'shortDescription': "Unlocking Android: A Developer's Guide provides concise, hands-on instruction for the Android operating system and development tools. This book teaches important architectural concepts in a straightforward writing style and builds on this with practical and useful examples throughout.", 'thumbnailUrl': 'https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.book-thumb-images/ableson.jpg', 'title': 'Unlocking Android'}
# save the data
with open('model_01.pkl', 'wb') as f:
pickle.dump(full_data, f)
%%writefile Procfile
web: gunicorn app:app
Writing Procfile
!mkdir templates static
mkdir: cannot create directory ‘templates’: File exists
!wget -O ./static/logo.png https://images-platform.99static.com//JY78phRJ6tT1yo1QGGfhZOVlrAw=/68x2062:932x2926/fit-in/500x500/99designs-contests-attachments/87/87917/attachment_87917977
--2021-07-06 11:47:10-- https://images-platform.99static.com//JY78phRJ6tT1yo1QGGfhZOVlrAw=/68x2062:932x2926/fit-in/500x500/99designs-contests-attachments/87/87917/attachment_87917977 Resolving images-platform.99static.com (images-platform.99static.com)... 104.22.61.120, 104.22.60.120, 172.67.12.78, ... Connecting to images-platform.99static.com (images-platform.99static.com)|104.22.61.120|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 29258 (29K) [image/png] Saving to: ‘./static/logo.png’ ./static/logo.png 100%[===================>] 28.57K --.-KB/s in 0.001s 2021-07-06 11:47:10 (47.8 MB/s) - ‘./static/logo.png’ saved [29258/29258]
%%writefile ./templates/index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Book recommendation</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.16.0/umd/popper.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>
</head>
<body class="bg-dark">
<style>
body {
background-color:coral;
}
#content {
position: relative;
}
#content img {
position: absolute;
top: 0px;
right: 25px;
}
#content p {
position: absolute;
top: 150px;
right: 0px;}
</style>
<div class="p-3 text-white">
<div id="content">
<img src="{{ url_for('static', filename='logo.png') }}" width=150 class="ribbon"/>
</div>
<div class="row">
<div class="col-6">
<form method="post">
<div class="form-group">
<label for="sel2" style="font-size:25pt;color:yellow;font-style:bold;">CHOOSE A METRIC:</label>
<select class="form-control" id="sel2" name="selected_metric">
<option>cosine</option>
<option>euclidean</option>
</select>
</div>
<div class="form-group">
<label for="sel1" style="font-size:25pt;color:yellow;font-style:bold;">CHOOSE A BOOK:</label>
<select class="form-control" id="sel1" name="selected_title">
{% for title in list_books %}
<option>{{ title }}</option>
{% endfor %}
</select>
</div>
<input type="submit" class="btn btn-light" value="Recommender"/>
</form>
</div>
<div class="col-6">
{% if book_selected is defined %}
<h2 style="font-size:11pt;color:red;font-style:bold;">SELECTED BOOK</h2>
<img src="{{ book_selected.thumbnailUrl }}">
{% endif %}
</div>
</div>
{% if similar_books is defined %}
<br/><br/>
<h2>Here are your other reading suggestions:</h2>
<div class="row">
{% for book in similar_books %}
<div class="col-2 p-3 d-flex justify-content-center">
<img src="{{ book.thumbnailUrl }}">
</div>
{% endfor %}
</div>
{% endif %}
</div>
</body>
</html>
Overwriting ./templates/index.html
!pip install flask_ngrok
from flask_ngrok import run_with_ngrok
Collecting flask_ngrok Downloading https://files.pythonhosted.org/packages/af/6c/f54cb686ad1129e27d125d182f90f52b32f284e6c8df58c1bae54fa1adbc/flask_ngrok-0.0.25-py3-none-any.whl Requirement already satisfied: Flask>=0.8 in /usr/local/lib/python3.7/dist-packages (from flask_ngrok) (1.1.4) Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from flask_ngrok) (2.23.0) Requirement already satisfied: click<8.0,>=5.1 in /usr/local/lib/python3.7/dist-packages (from Flask>=0.8->flask_ngrok) (7.1.2) Requirement already satisfied: itsdangerous<2.0,>=0.24 in /usr/local/lib/python3.7/dist-packages (from Flask>=0.8->flask_ngrok) (1.1.0) Requirement already satisfied: Werkzeug<2.0,>=0.15 in /usr/local/lib/python3.7/dist-packages (from Flask>=0.8->flask_ngrok) (1.0.1) Requirement already satisfied: Jinja2<3.0,>=2.10.1 in /usr/local/lib/python3.7/dist-packages (from Flask>=0.8->flask_ngrok) (2.11.3) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->flask_ngrok) (2021.5.30) Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->flask_ngrok) (2.10) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->flask_ngrok) (1.24.3) Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->flask_ngrok) (3.0.4) Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-packages (from Jinja2<3.0,>=2.10.1->Flask>=0.8->flask_ngrok) (2.0.1) Installing collected packages: flask-ngrok Successfully installed flask-ngrok-0.0.25
%%writefile app.py
import pickle
from flask import Flask, request, render_template, jsonify
from flask_ngrok import run_with_ngrok
import numpy as np
app = Flask(__name__)
run_with_ngrok(app)
# load data and extract all the vectors
with open('model_01.pkl', 'rb') as f:
book_data = pickle.load(f)
list_books = sorted([book['title'] for book in book_data])
isbn_list = [item['isbn'] for item in book_data]
@app.route("/", methods=['GET', 'POST'])
def template_test():
if request.method == 'POST':
selected_title = request.form.get('selected_title')
selected_metric = request.form.get('selected_metric')
selected_book = next(item for item in book_data if item['title'] == selected_title)
similar_books = [book_data[i] for i in selected_book[selected_metric]]
return render_template('index.html',
list_books=list_books,
book_selected=selected_book,
similar_books=similar_books[:6])
else:
return render_template('index.html', list_books=list_books)
@app.route("/recommendations", methods=['GET'])
def get_recommendations():
isbn = request.args.get('isbn', default=None, type=str)
num_reco = request.args.get("number", default=5, type=int)
distance = request.args.get("distance", default="cosine", type=str)
field = request.args.get("field", default="isbn", type=str)
if not isbn:
return jsonify("Missing ISBN for the book"), 400
elif distance not in ["cosine", "euclidean"]:
return jsonify("Distance can only be cosine or euclidean"), 400
elif num_reco not in range(1, 21):
return jsonify("Can only request between 1 and 20 books"), 400
elif isbn not in isbn_list:
return jsonify("ISBN not in supported books"), 400
elif field not in book_data[0].keys():
return jsonify("Field not available in the data"), 400
else:
try:
selected_book = next(item for item in book_data if item['isbn'] == isbn)
similar_books = [book_data[i][field] for i in selected_book[distance]]
return jsonify(similar_books[:num_reco]), 200
except Exception as e:
return jsonify(str(e)), 500
if __name__ == '__main__':
app.run()
Writing app.py
!python app.py
* Serving Flask app "app" (lazy loading) * Environment: production WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead. * Debug mode: off * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit) * Running on http://b40dcdb32801.ngrok.io * Traffic stats available on http://127.0.0.1:4040 127.0.0.1 - - [06/Jul/2021 11:48:20] "POST / HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:48:21] "GET /static/logo.png HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:48:22] "GET /favicon.ico HTTP/1.1" 404 - 127.0.0.1 - - [06/Jul/2021 11:51:45] "POST / HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:51:46] "GET /static/logo.png HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:51:48] "GET /favicon.ico HTTP/1.1" 404 - 127.0.0.1 - - [06/Jul/2021 11:52:20] "POST / HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:52:21] "GET /static/logo.png HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:52:23] "GET /favicon.ico HTTP/1.1" 404 - 127.0.0.1 - - [06/Jul/2021 11:53:00] "POST / HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:53:01] "GET /static/logo.png HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:53:02] "GET /favicon.ico HTTP/1.1" 404 - 127.0.0.1 - - [06/Jul/2021 11:53:40] "POST / HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:53:41] "GET /static/logo.png HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:53:44] "GET /favicon.ico HTTP/1.1" 404 - 127.0.0.1 - - [06/Jul/2021 11:54:27] "POST / HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:54:28] "GET /static/logo.png HTTP/1.1" 200 - 127.0.0.1 - - [06/Jul/2021 11:54:30] "GET /favicon.ico HTTP/1.1" 404 - ^C
Tip: Ignore the selection dropdowns in below images. They gets reset after clicking on Recommender button.
Cosine
Euclidean
Cosine performed much better.
Cosine
Euclidean
There are not much Scala related books. So Cosine recommender suggest Groovy, Java books. But Euclidean recommender suggesting Coffeehouse, Client server books. In this case also, cosine clearly outperformed euclidean.