Learn to talk about wine using JavaScript

This is the notebook used to write the post published on Medium.

Loading the Library

In [2]:
Collection = require('dstools').Collection;
Out[2]:
{ [Function] registerFunction: [Function] }

Loading the Data

In [3]:
data = Collection().loadCSV('/home/elshor/data/winemag-data-130k-v2.csv')
Finished loading 129971 rows from /home/elshor/data/winemag-data-130k-v2.csv
In [5]:
data.head().show()
Out[5]:
Table View
id | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
0 | Italy | Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity. | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia
1 | Portugal | This is ripe and fruity, a wine that is smooth while still structured. Firm tannins are filled out with juicy red berry fruits and freshened with acidity. It's already drinkable, although it will certainly be better from 2016. | Avidagos | 87 | 15 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos
2 | US | Tart and snappy, the flavors of lime flesh and rind dominate. Some green pineapple pokes through, with crisp acidity underscoring the flavors. The wine was all stainless-steel fermented. |  | 87 | 14 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm
3 | US | Pineapple rind, lemon pith and orange blossom start off the aromas. The palate is a bit more opulent, with notes of honey-drizzled guava and mango giving way to a slightly astringent, semidry finish. | Reserve Late Harvest | 87 | 13 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling (Lake Michigan Shore) | Riesling | St. Julian
4 | US | Much like the regular bottling from 2012, this comes across as rather rough and tannic, with rustic, earthy, herbal characteristics. Nonetheless, if you think of it as a pleasantly unfussy country wine, it's a good companion to a hearty winter stew. | Vintner's Reserve Wild Child Block | 87 | 65 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) | Pinot Noir | Sweet Cheeks
In [6]:
data.fields().data().join();//showing the fields
Out[6]:
'id,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery'

Term Counts

In [7]:
data
  .terms({field:'description'}) // extract terms from the description field
  .dropStopwords('term') // remove stopwords
  .sortDesc('count') // sort by term count, descending
  .head(5) // keep the top 5 terms
  .show(); // display the result
Out[7]:
Table View
term | count
wine | 80166
flavors | 60322
fruit | 49671
palate | 38516
aromas | 35450
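
For readers who want to see what a term count involves without dstools, the step can be approximated in plain JavaScript. This is only a minimal sketch, not the dstools implementation: the helper name countTerms, the tokenizer, and the tiny stopword list are all assumptions made for illustration. Unlike the output above, it lowercases every token.

```javascript
// Hypothetical stopword list; a real one would be much longer.
const STOPWORDS = new Set(['the', 'a', 'an', 'and', 'of', 'is', 'with', 'this', 'it', 'to', 'in']);

// Hypothetical helper: count terms across an array of description strings
// and return the `top` most frequent ones as {term, count} objects.
function countTerms(descriptions, top = 5) {
  const counts = new Map();
  for (const text of descriptions) {
    // Split on anything that is not a letter, apostrophe, or hyphen.
    const tokens = text.toLowerCase().split(/[^a-z'-]+/).filter(Boolean);
    for (const token of tokens) {
      if (STOPWORDS.has(token)) continue; // skip stopwords
      counts.set(token, (counts.get(token) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([term, count]) => ({ term, count }))
    .sort((a, b) => b.count - a.count) // highest count first
    .slice(0, top);
}

// Two made-up sentences standing in for the description column.
const sample = [
  'This wine is ripe and fruity, with juicy fruit flavors.',
  'Tart flavors of lime dominate the wine.',
];
console.log(countTerms(sample, 3));
```

On the two sample sentences, "wine" and "flavors" each occur twice and every other remaining term once.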

Word Cloud Visualization

In [8]:
data
  .terms({field:'description'})
  .dropStopwords('term')
  .sortDesc('count')
  .head(50)
  .wordCloud('term','count') // arguments are label and measure
  .show();
Out[8]:

Cabernet Sauvignon Term Count

In [9]:
data
  .terms({field:'description',groupBy:'variety'}) // group by variety
  .dropStopwords('term') // remove stopwords
  .filterEqual('variety','Cabernet Sauvignon') // keep only Cabernet Sauvignon terms
  .sortDesc('count') // sort by term count, descending
  .head(30) // top 30 terms
  .column('term') // get the 'term' column
  .data().join(', '); // show the terms as a list
Out[9]:
'flavors, wine, black, tannins, fruit, Cabernet, cherry, finish, oak, aromas, blackberry, palate, cassis, chocolate, currant, plum, ripe, red, notes, berry, dark, rich, dry, soft, sweet, full, years, bodied, spice, tannic'

Using TFIDF

In [10]:
data
  .terms({field:'description',groupBy:'variety',calc:'tfidf,idf'}) // calculate tfidf and idf
  .dropStopwords('term') // remove stopwords
  .filterEqual('variety','Cabernet Sauvignon') // keep only Cabernet Sauvignon terms
  .sortDesc('tfidf') // sort by tfidf score, descending
  .head(30) // top 30 terms
  .show()
Out[10]:
Table View
term | variety | tfidf | idf
Cabernet | Cabernet Sauvignon | 0.02346244059338738 | 3.028431172743317
black | Cabernet Sauvignon | 0.01673317871587816 | 1.8877073987250803
tannins | Cabernet Sauvignon | 0.016253677803851067 | 1.8407188892891613
flavors | Cabernet Sauvignon | 0.016168895111070277 | 1.2224365876933898
cherry | Cabernet Sauvignon | 0.014247265492529979 | 1.9226759965628268
wine | Cabernet Sauvignon | 0.013905370706257674 | 1.1475717087292159
blackberry | Cabernet Sauvignon | 0.012836667832919433 | 2.233154497106992
oak | Cabernet Sauvignon | 0.012539764094312586 | 2.000349034881045
fruit | Cabernet Sauvignon | 0.011625756851399141 | 1.3464225674743808
cassis | Cabernet Sauvignon | 0.010828410369779126 | 2.670681537674819
Cab | Cabernet Sauvignon | 0.009615997870274735 | 3.754368176126253
finish | Cabernet Sauvignon | 0.009022372124210767 | 1.3424305462048436
chocolate | Cabernet Sauvignon | 0.00848856200234622 | 2.262713299348536
currant | Cabernet Sauvignon | 0.008482207981771699 | 2.4135361890831195
aromas | Cabernet Sauvignon | 0.007619475266132629 | 1.3188074004414077
palate | Cabernet Sauvignon | 0.007116190959059329 | 1.3424305462048436
berry | Cabernet Sauvignon | 0.00661559002742304 | 2.0276411771690523
blackberries | Cabernet Sauvignon | 0.006611453590154506 | 3.3563380465056065
cedar | Cabernet Sauvignon | 0.006600576133998535 | 2.807440474790208
plum | Cabernet Sauvignon | 0.006598261525386515 | 1.8946039777841406
dark | Cabernet Sauvignon | 0.0065555083291882615 | 2.0971988608709626
tannic | Cabernet Sauvignon | 0.0064043695260352215 | 2.4490428775400295
years | Cabernet Sauvignon | 0.006136958258285458 | 2.2380206867581647
red | Cabernet Sauvignon | 0.005841878048004991 | 1.7409477355442111
dry | Cabernet Sauvignon | 0.00559828273548924 | 1.8472978603872037
rich | Cabernet Sauvignon | 0.005516097159272248 | 1.7927096701028007
ripe | Cabernet Sauvignon | 0.005346889450106305 | 1.5571435987900335
notes | Cabernet Sauvignon | 0.0053003496596795065 | 1.5923231059112068
herbal | Cabernet Sauvignon | 0.005092416390098588 | 2.242910672052356
sweet | Cabernet Sauvignon | 0.005042734233744155 | 1.7772054835668354
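
The notebook does not show how dstools computes the tfidf and idf columns, so the following is only a sketch of one common textbook definition, written in plain JavaScript. It assumes tf is the term count divided by the number of terms in the group, and idf is the natural log of the number of groups divided by the number of groups containing the term. The function name tfidfByGroup and the toy token lists are hypothetical, and dstools may well use a different weighting, so the numbers will not match the table above.

```javascript
// Textbook TF-IDF over groups of tokens (for example, one group per variety).
// Assumption: tf  = term count / total terms in the group
//             idf = ln(number of groups / number of groups containing the term)
// Hypothetical helper; dstools may compute these values differently.
function tfidfByGroup(groups) {
  const names = Object.keys(groups);
  const docFreq = new Map(); // term -> number of groups that contain it
  for (const name of names) {
    for (const term of new Set(groups[name])) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  const rows = [];
  for (const name of names) {
    const tokens = groups[name];
    const counts = new Map(); // term -> count within this group
    for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
    for (const [term, count] of counts) {
      const tf = count / tokens.length;
      const idf = Math.log(names.length / docFreq.get(term));
      rows.push({ term, group: name, tfidf: tf * idf, idf });
    }
  }
  return rows;
}

// Toy token lists, invented for illustration.
const toyGroups = {
  'Cabernet Sauvignon': ['wine', 'cassis', 'tannins', 'cassis'],
  'Chardonnay': ['wine', 'butter', 'oak', 'apple'],
};
const scored = tfidfByGroup(toyGroups)
  .filter((row) => row.group === 'Cabernet Sauvignon')
  .sort((a, b) => b.tfidf - a.tfidf);
console.log(scored);
```

In this toy data "wine" appears in both groups, so its idf and tfidf are zero, while "cassis", which is frequent in one group and absent from the other, ranks first. That is the effect the table above relies on: terms shared by every variety are pushed down.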

Filter using IDF

In [11]:
data
  .terms({field:'description',groupBy:'variety',calc:'tfidf,idf'})
  .dropStopwords('term')
  .filterEqual('variety','Cabernet Sauvignon')
  .filter((term)=>term.idf>2) // keep only terms with idf above 2
  .sortDesc('tfidf')
  .head(50)
  .wordCloud('term','tfidf',{title:'Word Cloud for Cabernet Sauvignon'})
  .show()
Out[11]:
In [12]:
data
  .terms({field:'description',groupBy:'variety',calc:'tfidf,idf'})
  .dropStopwords('term')
  .filterEqual('variety','Chardonnay')
  .filter((term)=>term.idf>2) // keep only terms with idf above 2
  .sortDesc('tfidf')
  .head(50)
  .wordCloud('term','tfidf',{title:'Word Cloud for Chardonnay'})
  .show()
Out[12]:
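
To see what the idf > 2 threshold does, the same predicate can be applied to a few rows copied from the TF-IDF table for Cabernet Sauvignon shown earlier (values rounded). Plain arrays stand in for the dstools collection here, and the variable names are hypothetical.

```javascript
// A few rows from the Cabernet Sauvignon TF-IDF output, rounded for readability.
const cabernetRows = [
  { term: 'Cabernet', tfidf: 0.0235, idf: 3.03 },
  { term: 'black', tfidf: 0.0167, idf: 1.89 },
  { term: 'flavors', tfidf: 0.0162, idf: 1.22 },
  { term: 'blackberry', tfidf: 0.0128, idf: 2.23 },
  { term: 'cassis', tfidf: 0.0108, idf: 2.67 },
];

// Same predicate as in the notebook cells: keep terms whose idf exceeds 2.
const distinctive = cabernetRows
  .filter((row) => row.idf > 2)
  .sort((a, b) => b.tfidf - a.tfidf)
  .map((row) => row.term);

console.log(distinctive.join(', ')); // prints: Cabernet, blackberry, cassis
```

Generic tasting words such as "black" and "flavors" fall below the threshold and drop out, leaving terms that are more specific to the variety.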

word2vec

In [4]:
data
  .column('description') // get the description column as a vector
  .toLowerCase() // convert the descriptions to lower case
  .merge() // merge all descriptions into one string
  .save('wine-descriptions.txt') // save the string to a file
Out[4]:
[Wrapper object]
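
The neighbor lists printed at the end of this notebook contain tokens such as "blackberry," and "chocolate,", which suggests that punctuation attached to words survives into the training file. One possible remedy is to strip punctuation from each description before merging. The helper below is a hypothetical plain JavaScript sketch, not part of dstools, and the notebook does not show how it would be wired into the chain above.

```javascript
// Hypothetical helper: lowercase a description and strip punctuation so that
// "blackberry," and "blackberry" end up as the same token.
function cleanDescription(text) {
  return text
    .toLowerCase()
    // Replace anything that is not a letter, digit, whitespace,
    // apostrophe, or hyphen with a space (Unicode-aware).
    .replace(/[^\p{L}\p{N}\s'-]+/gu, ' ')
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim();
}

console.log(cleanDescription('Ripe blackberry, chocolate and dark-berry notes.'));
// prints: ripe blackberry chocolate and dark-berry notes
```

Hyphens are kept on purpose so that compounds like "dark-berry" and "mineral-driven" remain single tokens.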
In [6]:
word2vec = require( 'word2vec' );
word2vec.word2vec('wine-descriptions.txt','wine-model.txt');
Starting training using file ../../../wine-descriptions.txt
Vocab size: 20422
Words in train file: 5298121
Alpha: 0.045573  Progress: 8.89%  Words/thread/sec: 295.22k
In [7]:
// load the model from file
word2vec.loadModel('wine-model.txt', function( err, model ) {
  if (err) throw err; // stop if the model file could not be loaded
  ['blackberry','chocolate','tropical','mineral','green'] // base terms
    .forEach((base)=>console.log(base + ': ' +
      // mostSimilar returns the terms most similar to base
      model.mostSimilar(base,10)
        .map((term)=>term.word).join())); // show the terms as a list
});
blackberry: berry,raspberry,blueberry,blackberry,,boysenberry,black,dark-berry,strawberry,black-cherry,red-berry
chocolate: chocolate,,mocha,cocoa,licorice,coffee,carob,molasses,tobacco,cola,cedar
tropical: passion,kiwi,stone,pineapple,lychee,yellow,melon,mango,peachy,papaya
mineral: nervy,steely,minerality,flinty,minerally,mineral,,mineral-driven,lemon-zest,tangy,saline
green: bruised,sliced,cider,green,,fresh-cut,grassy,cucumber,gala,underripe,yellow
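
A "most similar" lookup of this kind is usually a ranking of word vectors by cosine similarity to the base word's vector. The sketch below illustrates that idea in plain JavaScript. The three-dimensional vectors are invented for illustration (a trained model uses far more dimensions), and the helper names cosine and nearest are hypothetical, not part of the word2vec package.

```javascript
// Toy 3-dimensional vectors, invented for illustration only.
const vectors = {
  blackberry: [0.9, 0.1, 0.0],
  raspberry: [0.8, 0.2, 0.1],
  mocha: [0.1, 0.9, 0.2],
  flinty: [0.0, 0.2, 0.9],
};

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: the n words whose vectors are closest to `word`.
function nearest(word, n) {
  return Object.keys(vectors)
    .filter((other) => other !== word) // exclude the base word itself
    .map((other) => ({ word: other, similarity: cosine(vectors[word], vectors[other]) }))
    .sort((a, b) => b.similarity - a.similarity) // most similar first
    .slice(0, n);
}

console.log(nearest('blackberry', 2).map((t) => t.word).join());
// prints: raspberry,mocha
```

With these toy vectors "raspberry" points in almost the same direction as "blackberry", so it ranks first, much as the berry terms cluster together in the real output above.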
