# Reveal.js
from notebook.services.config import ConfigManager

# Configure the RISE/livereveal slideshow settings for this notebook
cm = ConfigManager()
cm.update('livereveal', {
    'theme': 'white',
    'transition': 'none',
    'controls': False,
    'progress': True,
})
%%capture
%load_ext autoreload
%autoreload 2
# %cd ..
import sys
sys.path.append("..")  # make the statnlpbook package (one directory up) importable
import statnlpbook.util as util
util.execute_notebook('language_models.ipynb')  # run the language models notebook so its definitions are available here
%%html
<script>
  function code_toggle() {
    if (code_shown) {
      $('div.input').hide('500');
      $('#toggleButton').val('Show Code');
    } else {
      $('div.input').show('500');
      $('#toggleButton').val('Hide Code');
    }
    code_shown = !code_shown;
  }
  $(document).ready(function() {
    code_shown = false;
    $('div.input').hide();
  });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>
from IPython.display import Image
import random

# Append a random query string so the browser does not serve a cached copy of the figure
Image(url='mt_figures/transformer.png' + '?' + str(random.random()), width=500)
Predict masked words given context on both sides (masked language modelling).
Conditional encoding of both sentences: the sentence pair is fed to the model as one sequence, so each sentence is encoded conditioned on the other.
Transformer with $L$ layers of dimension $H$ and $A$ self-attention heads (BERT$_\mathrm{BASE}$: $L=12$, $H=768$, $A=12$; BERT$_\mathrm{LARGE}$: $L=24$, $H=1024$, $A=16$).
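These values can be read off the released configurations. A minimal sketch, assuming the HuggingFace Transformers package is installed (`bert-base-uncased` and `bert-large-uncased` are the published checkpoint names):

```python
# Inspect the architecture hyperparameters L, H, A of the two standard BERT sizes.
from transformers import AutoConfig

for name in ["bert-base-uncased", "bert-large-uncased"]:
    config = AutoConfig.from_pretrained(name)
    print(name,
          "L =", config.num_hidden_layers,     # number of Transformer layers
          "H =", config.hidden_size,           # hidden dimension
          "A =", config.num_attention_heads)   # self-attention heads
```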
(Many other variants are available through HuggingFace Transformers.)
Trained on 16GB of text from Wikipedia + BookCorpus.
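Because the masked-LM head ships with the released model, masked word prediction can be tried directly. A minimal sketch, again assuming HuggingFace Transformers is installed:

```python
# Query BERT's masked-LM head through the fill-mask pipeline.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```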
Model | Accuracy (%) |
---|---|
LSTM | 77.6 |
LSTMs with conditional encoding | 80.9 |
LSTMs with conditional encoding + attention | 82.3 |
LSTMs with word-by-word attention | 83.5 |
Self-attention | 85.6 |
BERT$_\mathrm{BASE}$ | 89.2 |
BERT$_\mathrm{LARGE}$ | 90.4 |
RoBERTa (Liu et al., 2019): same architecture as BERT, but with better hyperparameter tuning, more training data, and no next-sentence-prediction task (only masked LM).
Training: 1024 GPUs for one day.
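The same masked-word probe works with RoBERTa; note that its tokeniser uses `<mask>` rather than `[MASK]`. A minimal sketch, assuming HuggingFace Transformers is installed (`roberta-base` is the released checkpoint name):

```python
# The fill-mask probe with RoBERTa, which was trained with masked LM only.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-base")
for prediction in unmasker("The capital of France is <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```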
Model | Accuracy (%) |
---|---|
LSTM | 77.6 |
LSTMs with conditional encoding | 80.9 |
LSTMs with conditional encoding + attention | 82.3 |
LSTMs with word-by-word attention | 83.5 |
Self-attention | 85.6 |
BERT$_\mathrm{BASE}$ | 89.2 |
BERT$_\mathrm{LARGE}$ | 90.4 |
RoBERTa$_\mathrm{BASE}$ | 90.7 |
RoBERTa$_\mathrm{LARGE}$ | 91.4 |
WordPiece and BPE (byte-pair encoding) tokenise text into subwords (Sennrich et al., 2016; Wu et al., 2016).
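The effect is easy to see with a pre-trained tokeniser. A minimal sketch, assuming HuggingFace Transformers is installed:

```python
# BERT uses a WordPiece vocabulary; '##' marks word-internal pieces.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("Tokenisation splits uncommon words into subwords."))
# Rare or unseen words are split into several pieces, e.g. something like
# ['token', '##isation', ...]; the exact split depends on the learned vocabulary.
```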
Multilingual BERT covers over 100 languages with a single shared vocabulary: https://github.com/google-research/bert/blob/master/multilingual.md
Many language-specific BERT models also exist (CamemBERT, BERTje, Nordic BERT, ...).
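For illustration, the sketch below (assuming HuggingFace Transformers is installed) tokenises a French sentence with the released multilingual checkpoint `bert-base-multilingual-cased`:

```python
# The multilingual checkpoint shares one WordPiece vocabulary across its languages,
# so the same tokeniser handles, e.g., French text; CamemBERT is a French-specific alternative.
from transformers import AutoTokenizer

mbert = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
print(mbert.tokenize("Les modèles de langue sont utiles."))
```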