Word embeddings are an approach to representing text in NLP. In this notebook we demonstrate how to train embeddings with both the CBOW and SkipGram architectures, using Gensim's Word2Vec and FastText implementations.
from gensim.models import Word2Vec
import warnings
warnings.filterwarnings('ignore')
# define training data
#Gensim's word2vec requires the training data as a 'list of lists': the corpus is a list of documents,
#and every document is a list of that document's tokens.
corpus = [['dog','bites','man'], ["man", "bites" ,"dog"],["dog","eats","meat"],["man", "eats","food"]]
#Training the model
model_cbow = Word2Vec(corpus, min_count=1, sg=0) #using the CBOW architecture for training
model_skipgram = Word2Vec(corpus, min_count=1, sg=1) #using the SkipGram architecture for training
In CBOW, the primary task is to build a language model that correctly predicts the center word given the context words in which it appears.
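To make this concrete, here is a minimal sketch (plain Python over the toy corpus above) of the context-to-target pairs CBOW trains on. The window size of 1 is purely illustrative; gensim's default window is 5.
#Illustration only: enumerate the (context words -> center word) pairs CBOW learns from
window_size = 1 #illustrative window; Word2Vec's default is 5
for sentence in corpus:
    for i, center in enumerate(sentence):
        context = sentence[max(0, i - window_size):i] + sentence[i + 1:i + 1 + window_size]
        print(f"context {context} -> center '{center}'")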
#Summarize the loaded model
print(model_cbow)
#Summarize vocabulary
words = list(model_cbow.wv.vocab)
print(words)
#Access the vector for one word
print(model_cbow['dog'])
Word2Vec(vocab=6, size=100, alpha=0.025)
['dog', 'bites', 'man', 'eats', 'meat', 'food']
[-3.1667745e-03 2.5268614e-03 -4.9504861e-03 2.3797194e-03 -3.3511904e-03 1.7659335e-03 -9.6838089e-04 3.6862001e-03 3.3760078e-03 -1.1944126e-03 -4.7475514e-03 -4.6677454e-03 4.7231275e-03 2.1875298e-03 4.9989321e-03 -4.7024325e-04 4.6936749e-03 4.5417100e-03 -4.8383311e-03 4.5522186e-03 9.4010920e-04 -2.8778350e-03 -2.3938445e-03 7.6240452e-04 2.8537741e-05 -1.0585956e-03 1.5203804e-03 1.1994856e-04 4.3881699e-03 3.5755127e-04 1.9964906e-03 -3.3893189e-03 2.5362791e-03 -3.8559963e-03 -4.6814438e-03 -1.0485576e-03 1.9576577e-03 -5.4296525e-04 2.5505766e-03 1.4563937e-03 1.1214090e-03 3.1200200e-03 3.5230191e-03 4.4931062e-03 -5.5389071e-04 1.6268899e-03 -4.6736463e-03 -1.9612674e-04 1.5486709e-03 -3.5581242e-03 1.5163666e-03 2.2859944e-03 -3.5728619e-03 -3.5505979e-03 7.8282715e-04 -4.8093311e-03 -3.1324120e-03 -3.6213300e-03 -1.4478542e-03 3.4006054e-03 2.2276146e-03 -4.1698264e-03 -3.6997625e-03 -4.1264743e-03 -4.9103238e-03 -2.2635974e-03 -3.9036905e-03 3.8846405e-03 -7.9726276e-05 -2.0692295e-03 -3.0645117e-04 -3.0288144e-03 -3.4682599e-03 -3.1768843e-03 -1.1148058e-03 -2.8012963e-03 -6.5973290e-04 -2.3705217e-03 4.3961490e-03 3.2166531e-03 3.6933657e-04 -6.2054797e-04 2.0661615e-04 3.7390803e-04 -3.5061471e-03 3.6587315e-03 2.1328868e-03 -2.5964181e-03 4.3381471e-03 4.0168604e-03 1.8054987e-03 -1.2192487e-03 1.5615283e-03 -1.8635839e-03 2.9529419e-03 -3.3825964e-03 -3.2592549e-03 -4.7523994e-04 -5.3210353e-04 -9.8173530e-04]
#Compute similarity
print("Similarity between eats and bites:",model_cbow.similarity('eats', 'bites'))
print("Similarity between eats and man:",model_cbow.similarity('eats', 'man'))
Similarity between eats and bites: -0.09852024
Similarity between eats and man: -0.17088428
From the above similarity scores we can conclude that 'eats' is more similar to 'bites' than to 'man'.
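Under the hood, similarity() computes the cosine similarity between the two word vectors. A quick sanity check with NumPy (assuming numpy is available) should reproduce the score above:
import numpy as np
#cosine similarity computed by hand; should match model_cbow.similarity('eats', 'bites')
v1, v2 = model_cbow.wv['eats'], model_cbow.wv['bites']
print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))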
#Most similar words
model_cbow.most_similar('meat')
[('bites', 0.1353721022605896), ('man', 0.1094527617096901), ('food', -0.02215239405632019), ('dog', -0.1444159597158432), ('eats', -0.16309654712677002)]
# save model
model_cbow.save('model_cbow.bin')
# load model
new_model_cbow = Word2Vec.load('model_cbow.bin')
print(new_model_cbow)
Word2Vec(vocab=6, size=100, alpha=0.025)
In SkipGram, the task is the reverse: predict the context words given the center word.
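Mirroring the CBOW sketch above, here is a minimal illustration of the center-to-context pairs SkipGram trains on (again with an illustrative window of 1):
#Illustration only: enumerate the (center word -> context word) pairs SkipGram learns from
window_size = 1 #illustrative window; Word2Vec's default is 5
for sentence in corpus:
    for i, center in enumerate(sentence):
        for context_word in sentence[max(0, i - window_size):i] + sentence[i + 1:i + 1 + window_size]:
            print(f"center '{center}' -> context '{context_word}'")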
#Summarize the loaded model
print(model_skipgram)
#Summarize vocabulary
words = list(model_skipgram.wv.vocab)
print(words)
#Access the vector for one word
print(model_skipgram['dog'])
Word2Vec(vocab=6, size=100, alpha=0.025)
['dog', 'bites', 'man', 'eats', 'meat', 'food']
[-3.1667745e-03 2.5268614e-03 -4.9504861e-03 2.3797194e-03 -3.3511904e-03 1.7659335e-03 -9.6838089e-04 3.6862001e-03 3.3760078e-03 -1.1944126e-03 -4.7475514e-03 -4.6677454e-03 4.7231275e-03 2.1875298e-03 4.9989321e-03 -4.7024325e-04 4.6936749e-03 4.5417100e-03 -4.8383311e-03 4.5522186e-03 9.4010920e-04 -2.8778350e-03 -2.3938445e-03 7.6240452e-04 2.8537741e-05 -1.0585956e-03 1.5203804e-03 1.1994856e-04 4.3881699e-03 3.5755127e-04 1.9964906e-03 -3.3893189e-03 2.5362791e-03 -3.8559963e-03 -4.6814438e-03 -1.0485576e-03 1.9576577e-03 -5.4296525e-04 2.5505766e-03 1.4563937e-03 1.1214090e-03 3.1200200e-03 3.5230191e-03 4.4931062e-03 -5.5389071e-04 1.6268899e-03 -4.6736463e-03 -1.9612674e-04 1.5486709e-03 -3.5581242e-03 1.5163666e-03 2.2859944e-03 -3.5728619e-03 -3.5505979e-03 7.8282715e-04 -4.8093311e-03 -3.1324120e-03 -3.6213300e-03 -1.4478542e-03 3.4006054e-03 2.2276146e-03 -4.1698264e-03 -3.6997625e-03 -4.1264743e-03 -4.9103238e-03 -2.2635974e-03 -3.9036905e-03 3.8846405e-03 -7.9726276e-05 -2.0692295e-03 -3.0645117e-04 -3.0288144e-03 -3.4682599e-03 -3.1768843e-03 -1.1148058e-03 -2.8012963e-03 -6.5973290e-04 -2.3705217e-03 4.3961490e-03 3.2166531e-03 3.6933657e-04 -6.2054797e-04 2.0661615e-04 3.7390803e-04 -3.5061471e-03 3.6587315e-03 2.1328868e-03 -2.5964181e-03 4.3381471e-03 4.0168604e-03 1.8054987e-03 -1.2192487e-03 1.5615283e-03 -1.8635839e-03 2.9529419e-03 -3.3825964e-03 -3.2592549e-03 -4.7523994e-04 -5.3210353e-04 -9.8173530e-04]
#Compute similarity
print("Similarity between eats and bites:",model_skipgram.similarity('eats', 'bites'))
print("Similarity between eats and man:",model_skipgram.similarity('eats', 'man'))
Similarity between eats and bites: -0.09852936
Similarity between eats and man: -0.17089055
From the above similarity scores we can conclude that 'eats' is more similar to 'bites' than to 'man'.
#Most similarity
model_skipgram.most_similar('meat')
[('bites', 0.1353721022605896), ('man', 0.10945276916027069), ('food', -0.022152386605739594), ('dog', -0.1444159746170044), ('eats', -0.16317100822925568)]
# save model
model_skipgram.save('model_skipgram.bin')
# load model
new_model_skipgram = Word2Vec.load('model_skipgram.bin')
print(new_model_skipgram)
Word2Vec(vocab=6, size=100, alpha=0.025)
The entire English Wikipedia dump as of 28/04/2020 is just over 16 GB in size. Due to computation constraints, we will train our word2vec and fasttext embeddings on only a part of this corpus.
The file is 294 MB, so it can take a while to download.
Source for code which downloads files from Google Drive: https://stackoverflow.com/questions/25010369/wget-curl-large-file-from-google-drive/39225039#39225039
import os
import requests
os.makedirs('data/en', exist_ok= True)
file_name = "data/en/enwiki-latest-pages-articles-multistream14.xml-p13159683p14324602.bz2"
file_id = "11804g0GcWnBIVDahjo5fQyc05nQLXGwF"
def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(URL, params={'id': id}, stream=True)
    token = get_confirm_token(response)
    if token:
        params = {'id': id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)
    save_response_content(response, destination)

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value
    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768
    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)

if not os.path.exists(file_name):
    download_file_from_google_drive(file_id, file_name)
else:
    print("file already exists, skipping download")
print(f"File at: {file_name}")
file already exists, skipping download
File at: data/en/enwiki-latest-pages-articles-multistream14.xml-p13159683p14324602.bz2
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models.word2vec import Word2Vec
from gensim.models.fasttext import FastText
import time
#Preparing the Training data
wiki = WikiCorpus(file_name, lemmatize=False, dictionary={})
sentences = list(wiki.get_texts())
#if you get a memory error executing the lines above,
#comment them out and uncomment the lines below.
#loading will be slower, but more stable.
# wiki = WikiCorpus(file_name, processes=4, lemmatize=False, dictionary={})
# sentences = list(wiki.get_texts())
#if you still get a memory error, try setting processes to 1 or 2 and run it again.
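Before training, a quick sanity check (illustrative, using only the sentences list built above) that the corpus loaded as expected:
#Sanity check on the preprocessed corpus
print(f"Number of documents: {len(sentences)}")
print("First 15 tokens of the first document:", sentences[0][:15])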
#CBOW
start = time.time()
word2vec_cbow = Word2Vec(sentences,min_count=10, sg=0)
end = time.time()
print("CBOW Model Training Complete.\nTime taken for training is:{:.2f} hrs ".format((end-start)/3600.0))
CBOW Model Training Complete.
Time taken for training is:0.04 hrs
#Summarize the loaded model
print(word2vec_cbow)
print("-"*30)
#Summarize vocabulary
words = list(word2vec_cbow.wv.vocab)
print(f"Length of vocabulary: {len(words)}")
print("Printing the first 30 words.")
print(words[:30])
print("-"*30)
#Access the vector for one word
print(f"Length of vector: {len(word2vec_cbow['film'])}")
print(word2vec_cbow['film'])
print("-"*30)
#Compute similarity
print("Similarity between film and drama:",word2vec_cbow.similarity('film', 'drama'))
print("Similarity between film and tiger:",word2vec_cbow.similarity('film', 'tiger'))
print("-"*30)
Word2Vec(vocab=111150, size=100, alpha=0.025)
------------------------------
Length of vocabulary: 111150
Printing the first 30 words.
['the', 'roses', 'registered', 'as', 'is', 'brisbane', 'racing', 'club', 'group', 'thoroughbred', 'horse', 'race', 'for', 'three', 'year', 'old', 'filles', 'run', 'under', 'set', 'weights', 'conditions', 'over', 'distance', 'of', 'metres', 'at', 'racecourse', 'australia', 'during']
------------------------------
Length of vector: 100
[-0.25941572 -1.6287326 2.5331333 -1.5818936 0.9024474 0.8614945 2.4875445 -0.95802265 -1.3792082 -1.1744157 -4.300686 1.0071316 0.10418405 4.855032 0.6251962 -0.06472338 0.19993098 -0.7291219 2.342258 -1.7298651 0.7895099 -2.2819378 0.7158192 -0.62419826 0.6720258 3.6712303 1.3836899 0.17808275 -3.7205396 0.2529162 1.0290879 -0.9228959 0.9451632 1.7415334 1.9618814 1.4535053 2.670452 0.9272077 0.25056183 -0.4078236 0.5795217 0.6316829 0.50204426 -0.19865237 -2.697352 0.75351495 1.0796617 2.247825 -2.956658 2.6606686 -0.42392135 -0.44319883 -2.9274392 -1.0198026 1.404833 -0.10840467 0.50829273 1.0767945 -0.65002084 -3.4231277 4.719826 -1.5996053 0.82882935 1.635043 -0.45730942 -1.3166244 -1.3349417 -2.3565981 1.7141095 -2.6643796 -1.2148786 0.2972199 -2.2865987 -1.6022073 2.0965865 -0.87479544 -1.4143106 -0.9149557 2.2900226 1.1464663 -2.6113467 -1.5517493 1.3018385 4.1072307 1.1441547 1.0222696 0.4847384 2.4148073 -2.881392 -0.67044157 -2.482836 -0.417894 3.1442287 -1.6087203 1.865813 -3.717568 0.5994761 1.8819104 3.355772 -1.9087372 ]
------------------------------
Similarity between film and drama: 0.4986632
Similarity between film and tiger: 0.15477756
------------------------------
# save model
from gensim.models import Word2Vec, KeyedVectors
word2vec_cbow.wv.save_word2vec_format('word2vec_cbow.bin', binary=True)
# load model
# new_model_cbow_wv = KeyedVectors.load_word2vec_format('word2vec_cbow.bin', binary=True)
# print(new_model_cbow_wv)
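Because we saved only the word vectors in word2vec format (not a full trainable model), the matching loader is KeyedVectors.load_word2vec_format. A minimal sketch, assuming the file saved above:
#load the saved vectors and query them; this is a KeyedVectors object, not a trainable model
cbow_vectors = KeyedVectors.load_word2vec_format('word2vec_cbow.bin', binary=True)
print(cbow_vectors.most_similar('film', topn=5))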
#SkipGram
start = time.time()
word2vec_skipgram = Word2Vec(sentences,min_count=10, sg=1)
end = time.time()
print("SkipGram Model Training Complete\nTime taken for training is:{:.2f} hrs ".format((end-start)/3600.0))
SkipGram Model Training Complete
Time taken for training is:0.10 hrs
#Summarize the loaded model
print(word2vec_skipgram)
print("-"*30)
#Summarize vocabulary
words = list(word2vec_skipgram.wv.vocab)
print(f"Length of vocabulary: {len(words)}")
print("Printing the first 30 words.")
print(words[:30])
print("-"*30)
#Access the vector for one word
print(f"Length of vector: {len(word2vec_skipgram['film'])}")
print(word2vec_skipgram['film'])
print("-"*30)
#Compute similarity
print("Similarity between film and drama:",word2vec_skipgram.similarity('film', 'drama'))
print("Similarity between film and tiger:",word2vec_skipgram.similarity('film', 'tiger'))
print("-"*30)
Word2Vec(vocab=111150, size=100, alpha=0.025)
------------------------------
Length of vocabulary: 111150
Printing the first 30 words.
['the', 'roses', 'registered', 'as', 'is', 'brisbane', 'racing', 'club', 'group', 'thoroughbred', 'horse', 'race', 'for', 'three', 'year', 'old', 'filles', 'run', 'under', 'set', 'weights', 'conditions', 'over', 'distance', 'of', 'metres', 'at', 'racecourse', 'australia', 'during']
------------------------------
Length of vector: 100
[ 1.94889292e-01 -7.88324535e-01 4.66947220e-02 2.57520348e-01 2.65304267e-01 3.63538593e-01 4.63590741e-01 -1.62654325e-01 9.11010578e-02 -6.58479631e-02 -6.97350129e-02 -6.56900406e-02 2.19506964e-01 2.20394313e-01 1.05092540e-01 8.26439075e-03 -9.39796269e-02 5.50851583e-01 7.65753444e-04 -2.22807571e-01 -3.17346871e-01 3.20529372e-01 4.51157093e-02 -1.93709806e-01 2.07626969e-02 1.69344515e-01 2.77250055e-02 1.10369585e-02 -4.75540310e-01 1.10796697e-01 4.28172469e-01 4.06191871e-02 5.15495241e-01 -6.85295224e-01 -5.06723702e-01 -4.52192919e-03 1.51265517e-03 -3.84557724e-01 -2.22782314e-01 5.11201501e-01 1.42252162e-01 -7.73397386e-01 -2.78606623e-01 4.70017433e-01 -2.70037323e-01 5.04850507e-01 -1.48356587e-01 2.26073325e-01 -3.36060971e-01 -1.19667962e-01 -2.59654850e-01 -4.44965392e-01 1.11614995e-01 1.62986945e-02 4.82374012e-01 -7.87460804e-02 -1.13825299e-01 -2.24003598e-01 4.93353546e-01 -5.57069406e-02 2.43176505e-01 -1.84876159e-01 2.13489812e-02 3.42909366e-01 2.02496469e-01 -4.25657362e-01 8.17572057e-01 -2.83644646e-01 -5.23434244e-02 -3.27616245e-01 4.43994589e-02 -3.90237272e-01 2.12029487e-01 -7.25788534e-01 5.52469850e-01 -4.72590374e-03 -2.02829018e-01 -9.59078223e-03 3.68973225e-01 -2.69762665e-01 -2.85591751e-01 -2.68359333e-01 3.10093671e-01 2.02198789e-01 5.80960453e-01 -2.47493789e-01 -7.37856887e-03 -3.59723950e-03 3.14893663e-01 1.12885557e-01 -5.09416103e-01 -7.58459032e-01 5.30587435e-01 -1.51896626e-01 -3.37440372e-01 4.22841489e-01 -3.34523350e-01 3.21759552e-01 7.44457126e-01 -1.26014173e-01]
------------------------------
Similarity between film and drama: 0.63833964
Similarity between film and tiger: 0.22270091
------------------------------
# save model
word2vec_skipgram.wv.save_word2vec_format('word2vec_sg.bin', binary=True)
# load model
# new_model_skipgram_wv = KeyedVectors.load_word2vec_format('word2vec_sg.bin', binary=True)
# print(new_model_skipgram_wv)
#CBOW
start = time.time()
fasttext_cbow = FastText(sentences, sg=0, min_count=10)
end = time.time()
print("FastText CBOW Model Training Complete\nTime taken for training is:{:.2f} hrs ".format((end-start)/3600.0))
FastText CBOW Model Training Complete
Time taken for training is:0.12 hrs
#Summarize the loaded model
print(fasttext_cbow)
print("-"*30)
#Summarize vocabulary
words = list(fasttext_cbow.wv.vocab)
print(f"Length of vocabulary: {len(words)}")
print("Printing the first 30 words.")
print(words[:30])
print("-"*30)
#Access the vector for one word
print(f"Length of vector: {len(fasttext_cbow['film'])}")
print(fasttext_cbow['film'])
print("-"*30)
#Compute similarity
print("Similarity between film and drama:",fasttext_cbow.similarity('film', 'drama'))
print("Similarity between film and tiger:",fasttext_cbow.similarity('film', 'tiger'))
print("-"*30)
FastText(vocab=111150, size=100, alpha=0.025)
------------------------------
Length of vocabulary: 111150
Printing the first 30 words.
['the', 'roses', 'registered', 'as', 'is', 'brisbane', 'racing', 'club', 'group', 'thoroughbred', 'horse', 'race', 'for', 'three', 'year', 'old', 'filles', 'run', 'under', 'set', 'weights', 'conditions', 'over', 'distance', 'of', 'metres', 'at', 'racecourse', 'australia', 'during']
------------------------------
Length of vector: 100
[ 0.47473213 1.6783198 -4.766255 -3.2404876 0.80164665 1.993539 3.4226568 -0.7035685 -3.0426116 1.5137119 3.8207133 1.3821473 -0.7379625 -0.6726444 1.8303355 -2.1288188 1.2368282 -3.0745962 1.4226121 -2.8884995 7.2847705 -1.564321 2.869352 0.6962616 4.469778 2.5569658 2.621335 -4.612509 -2.2389078 3.6648748 0.7189718 1.0702186 -3.175641 2.7648733 0.13811935 -2.441776 -3.9559126 -0.03163956 -1.1257534 -0.64402825 -1.5076644 -0.58919376 -0.14338583 4.2466817 4.3784213 3.0076942 -5.972965 2.2950342 -0.50719374 -3.916504 -2.1366098 -2.661619 2.3540869 2.1862476 5.1004434 4.1282 -4.164653 1.1288711 -4.001655 -4.051289 2.5718336 -0.40600455 3.8396242 2.214367 1.8413899 4.5216975 -1.6419586 2.7617378 -2.0902452 2.598776 4.041824 -5.1805005 -2.777213 -0.02546828 -0.07393612 -3.2800605 -2.9874747 -0.6490991 3.6039045 -1.4168853 3.6110177 -1.0872458 -0.6365031 -1.0161037 3.7344344 0.29839793 0.421953 -1.811646 1.3730506 7.575645 3.3998368 5.0335827 -0.2107324 -2.331183 0.19383769 3.0550041 4.1529713 3.988616 0.04955976 1.3424706 ]
------------------------------
Similarity between film and drama: 0.5669882
Similarity between film and tiger: 0.24975622
------------------------------
#SkipGram
start = time.time()
fasttext_skipgram = FastText(sentences, sg=1, min_count=10)
end = time.time()
print("FastText SkipGram Model Training Complete\nTime taken for training is:{:.2f} hrs ".format((end-start)/3600.0))
FastText SkipGram Model Training Complete
Time taken for training is:0.20 hrs
#Summarize the loaded model
print(fasttext_skipgram)
print("-"*30)
#Summarize vocabulary
words = list(fasttext_skipgram.wv.vocab)
print(f"Length of vocabulary: {len(words)}")
print("Printing the first 30 words.")
print(words[:30])
print("-"*30)
#Access the vector for one word
print(f"Length of vector: {len(fasttext_skipgram['film'])}")
print(fasttext_skipgram['film'])
print("-"*30)
#Compute similarity
print("Similarity between film and drama:",fasttext_skipgram.similarity('film', 'drama'))
print("Similarity between film and tiger:",fasttext_skipgram.similarity('film', 'tiger'))
print("-"*30)
FastText(vocab=111150, size=100, alpha=0.025)
------------------------------
Length of vocabulary: 111150
Printing the first 30 words.
['the', 'roses', 'registered', 'as', 'is', 'brisbane', 'racing', 'club', 'group', 'thoroughbred', 'horse', 'race', 'for', 'three', 'year', 'old', 'filles', 'run', 'under', 'set', 'weights', 'conditions', 'over', 'distance', 'of', 'metres', 'at', 'racecourse', 'australia', 'during']
------------------------------
Length of vector: 100
[-8.4101312e-02 -6.9478154e-04 3.3954462e-01 -3.6973858e-01 1.6844368e-01 3.4855682e-01 8.0026442e-01 -5.0405812e-01 -6.0389137e-01 2.1694953e-02 4.0937051e-01 -3.5893116e-02 -1.3717794e-01 4.0389201e-01 3.9567137e-01 2.4365921e-01 5.6551516e-02 -1.5994829e-01 -1.8148309e-01 -2.6480275e-01 -4.8462763e-01 9.5473409e-02 -1.1126036e-02 -1.8805853e-01 2.4277805e-01 2.4251699e-01 -1.7501226e-01 -4.3078136e-01 -3.6442232e-01 9.1702184e-03 -3.2344624e-01 -1.0232232e-01 -5.2684498e-01 -2.7622378e-01 4.2112619e-01 -4.3196991e-02 3.1967857e-01 1.7001998e-01 3.3157614e-01 -2.4995559e-01 -1.3239473e-01 -3.4502399e-01 2.1341468e-01 5.8890671e-01 1.7721146e-01 1.5974782e-01 -3.8579264e-01 -2.8241745e-01 6.7402735e-02 -7.1903253e-01 1.3665260e-01 -5.9633050e-02 -5.9002697e-01 -6.1173952e-01 -1.0246418e-03 -5.1254374e-01 -1.5101396e-01 1.6967247e-01 2.8125226e-01 -4.6728057e-01 -5.4966863e-02 -1.3736627e-02 -1.5689149e-01 8.3176725e-02 1.8850440e-02 4.1858605e-01 -1.1376646e-02 -4.0758383e-02 -1.7871203e-01 2.7792713e-01 5.5813068e-01 -3.5465869e-01 1.3662770e-01 2.5777066e-01 -3.0423281e-01 7.8141141e-01 1.1446947e-02 -4.0541172e-01 2.9406905e-01 6.0151044e-02 4.9637925e-02 -3.9679220e-01 4.5333567e-01 1.0888510e-02 2.7147910e-01 -1.7305572e-01 -2.8098795e-01 -6.1907400e-03 -2.3080334e-01 5.8609635e-01 -1.0097053e-01 6.6119152e-01 1.8578811e-01 -5.9025098e-02 -5.3886050e-01 2.6664239e-01 -2.2193529e-02 7.0487672e-01 3.9477929e-01 3.7981489e-01]
------------------------------
Similarity between film and drama: 0.626041
Similarity between film and tiger: 0.27831402
------------------------------
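Unlike Word2Vec, FastText composes a word's vector from its character n-grams, so it can return a vector even for a word it never saw during training. A small sketch; the misspelling below is a hypothetical out-of-vocabulary example:
#FastText can embed out-of-vocabulary words via their character n-grams
oov_word = 'filmmakinng' #deliberate misspelling, hypothetical OOV example
print(oov_word in fasttext_skipgram.wv.vocab) #False: the word is not in the vocabulary
print(fasttext_skipgram.wv[oov_word][:5]) #...but FastText still returns a vector for it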
An interesting observation, if you noticed, is that CBOW trains faster than SkipGram in both cases. We will leave it to the reader to figure out why; as a hint, revisit how CBOW and SkipGram construct their training pairs.