# Reveal.js
from notebook.services.config import ConfigManager
cm = ConfigManager()
cm.update('livereveal', {
'theme': 'white',
'transition': 'none',
'controls': 'false',
'progress': 'true',
})
%%capture
%load_ext autoreload
%autoreload 2
# %cd ..
import sys
sys.path.append("..")
import statnlpbook.util as util
# Run the language models notebook so that its definitions are available here.
util.execute_notebook('language_models.ipynb')
%%html
<script>
function code_toggle() {
if (code_shown){
$('div.input').hide('500');
$('#toggleButton').val('Show Code')
} else {
$('div.input').show('500');
$('#toggleButton').val('Hide Code')
}
code_shown = !code_shown
}
$( document ).ready(function(){
code_shown=false;
$('div.input').hide()
});
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>
from IPython.display import Image
import random
# The random query string forces the browser to reload the image instead of using a cached copy.
Image(url='../img/bass_1.jpg'+'?'+str(random.random()), width=300)
Image(url='../img/bass_2.svg'+'?'+str(random.random()), width=100)
Image(url='../img/bass_visualisation.jpg'+'?'+str(random.random()), width=500)
Additional criterion:
Basically like word2vec: predict a word from its context (or vice versa).
But we cannot just use a lookup table (i.e., an embedding matrix) any more.
Train a network with the sequence as input! Does this remind you of anything?
The hidden state of an RNN LM is a contextualised word representation!
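To make this concrete, here is a minimal sketch (assuming PyTorch is available; the token ids and layer sizes are toy values chosen only for illustration) showing that an RNN's hidden state for a word depends on the surrounding sequence, whereas a static embedding lookup does not:

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 100, 16, 32
embedding = nn.Embedding(vocab_size, embed_dim)          # static lookup table
rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)   # recurrent LM body

# Two toy sentences that both contain token id 7, in different contexts.
sentence_a = torch.tensor([[3, 7, 5, 9]])
sentence_b = torch.tensor([[8, 2, 7, 4]])

states_a, _ = rnn(embedding(sentence_a))   # [1, seq_len, hidden_dim]
states_b, _ = rnn(embedding(sentence_b))

# The lookup-table embedding of token 7 is identical in both sentences...
print(torch.equal(embedding(sentence_a)[0, 1], embedding(sentence_b)[0, 2]))  # True
# ...but its hidden state differs, because it depends on the preceding words.
print(torch.equal(states_a[0, 1], states_b[0, 2]))  # False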
Image(url='../img/elmo_1.png'+'?'+str(random.random()), width=800)
"Let's stick to improvisation in this skit"
Image credit: http://jalammar.github.io/illustrated-bert/
An RNN (or LSTM) LM only considers preceding context.
ELMo (Embeddings from Language Models) is based on a biLM: bidirectional language model (Peters et al., 2018).
Image(url='../img/elmo_2.png'+'?'+str(random.random()), width=1200)
Image(url='../img/elmo_3.png'+'?'+str(random.random()), width=1200)
The forward and backward models are kept separate to prevent a word from being used to predict itself, while still allowing the model to consider both preceding and following words.
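The following is a minimal sketch of the biLM idea (again assuming PyTorch; the module names and sizes are illustrative, not the actual ELMo architecture): a forward LM and a separate backward LM each encode the sentence in their own direction, and their hidden states are concatenated into one contextual representation per token.

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 100, 16, 32
embedding = nn.Embedding(vocab_size, embed_dim)
forward_lm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)    # reads left-to-right
backward_lm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)   # reads right-to-left

tokens = torch.tensor([[3, 7, 5, 9]])        # one toy sentence
embedded = embedding(tokens)

fwd_states, _ = forward_lm(embedded)                         # each state sees only preceding words
bwd_states, _ = backward_lm(torch.flip(embedded, dims=[1]))  # each state sees only following words
bwd_states = torch.flip(bwd_states, dims=[1])                # re-align with the original word order

# ELMo-style contextual representation: concatenate the two directions per token.
contextual = torch.cat([fwd_states, bwd_states], dim=-1)
print(contextual.shape)  # torch.Size([1, 4, 64])

In the full ELMo model, the representations from several biLM layers are additionally combined with task-specific learned weights.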
LSTMs have longer-term memory, but they still forget.
Solution: transformers! (Vaswani et al., 2017)
Image(url='../img/transformers.png'+'?'+str(random.random()), width=400)
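At the heart of the transformer is self-attention, which lets every position look at every other position directly instead of passing information through a recurrent memory. Below is a bare-bones NumPy sketch of scaled dot-product self-attention (toy dimensions, omitting the learned query/key/value projections and the multiple heads of the real architecture):

import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence X of shape [seq_len, d]."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # similarity of every position with every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ X                              # each output is a weighted sum over all positions

X = np.random.randn(5, 8)        # 5 tokens with 8-dimensional representations
print(self_attention(X).shape)   # (5, 8)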