from fastai.gen_doc.nbdoc import *
from fastai.text.models import *
from fastai import *
This module fully implements the AWD-LSTM from Stephen Merity et al. The main idea of the article is to use an RNN with dropout everywhere, but applied in an intelligent way. There is a difference with the usual dropout, which is why you'll see an RNNDropout module: we zero things, as is usual in dropout, but we always zero the same coordinates along the sequence dimension (which is the first dimension in pytorch). This ensures consistency when updating the hidden state across whole sentences/articles.
This being given, there are five different dropouts in the AWD-LSTM: the embedding dropout, the input dropout, the weight dropout, the hidden dropout and the output dropout.
show_doc(get_language_model, doc_string=False)
Creates an AWD-LSTM with a first embedding of vocab_sz by emb_sz, a hidden size of n_hid, and n_layers RNN layers that can be bidirectional if bidir is True. The last RNN has an output size of emb_sz so that we can use the same decoder as the encoder if tie_weights is True. The decoder is a Linear layer with or without bias. If qrnn is set to True, we use QRNN cells instead of LSTMs. pad_token is the token used for padding.
embed_p is used for the embedding dropout, input_p for the input dropout, weight_p for the weight dropout, hidden_p for the hidden dropout and output_p for the output dropout.
Note that the model returns a list of three things, the actual output being the first, the two others being the intermediate hidden states before and after dropout (used by the RNNTrainer). Most loss functions expect one output, so you should use a Callback to remove the other two if you're not using RNNTrainer.
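For example, a small language model can be built directly (a minimal sketch with arbitrary toy sizes; the keyword names follow the arguments described above, but check the signature for the exact defaults):
lm = get_language_model(vocab_sz=100, emb_sz=7, n_hid=14, n_layers=2, pad_token=1,
                        tie_weights=True, bias=True, bidir=False, qrnn=False,
                        embed_p=0.1, input_p=0.6, weight_p=0.5, hidden_p=0.2, output_p=0.4)
lm.reset()  # the returned SequentialRNN exposes reset() to clear the hidden states of its RNNCore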
show_doc(get_rnn_classifier, doc_string=False)
get_rnn_classifier [source]
get_rnn_classifier(bptt:int, max_seq:int, n_class:int, vocab_sz:int, emb_sz:int, n_hid:int, n_layers:int, pad_token:int, layers:Collection[int], drops:Collection[float], bidir:bool=False, qrnn:bool=False, hidden_p:float=0.2, input_p:float=0.6, embed_p:float=0.1, weight_p:float=0.5) → Module
Creates an RNN classifier with an encoder taken from an AWD-LSTM with arguments vocab_sz, emb_sz, n_hid, n_layers, bias, bidir, qrnn, pad_token and the dropout parameters. This encoder is fed the sequence in successive chunks of size bptt and we only keep the last max_seq outputs for the pooling layers.
The decoder uses a concatenation of the last output, a MaxPooling of all the outputs and an AveragePooling of all the outputs. It then uses a list of BatchNorm, Dropout, Linear, ReLU blocks (with no ReLU in the last one), using a first layer size of 3*emb_sz then following the numbers in layers to stop at n_class. The dropout probabilities are read in drops.
Note that the model returns a list of three things, the actual output being the first, the two others being the intermediate hidden states before and after dropout (used by the RNNTrainer). Most loss functions expect one output, so you should use a Callback to remove the other two if you're not using RNNTrainer.
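For example (a minimal sketch with toy sizes; layers starts at 3*emb_sz and ends at n_class as described above, and drops gives one dropout probability per block):
classifier = get_rnn_classifier(bptt=10, max_seq=100, n_class=2, vocab_sz=100, emb_sz=7, n_hid=14,
                                n_layers=2, pad_token=1, layers=[7*3, 50, 2], drops=[0.4, 0.1])
classifier.reset()  # also a SequentialRNN, so the hidden states can be reset between texts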
On top of the pytorch or the fastai layers, the language models use some custom layers specific to NLP.
show_doc(EmbeddingDropout, doc_string=False, title_level=3)
Applies a dropout with probability embed_p to an embedding layer emb in training mode. Each row of the embedding matrix has a probability embed_p of being replaced by zeros while the others are rescaled accordingly.
enc = nn.Embedding(100, 7, padding_idx=1)
enc_dp = EmbeddingDropout(enc, 0.5)
tst_input = torch.randint(0,100,(8,))
enc_dp(tst_input)
tensor([[ 0.0000, -0.0000, -0.0000, 0.0000, 0.0000, 0.0000, -0.0000], [ 0.0000, 0.0000, -0.0000, 0.0000, 0.0000, -0.0000, -0.0000], [-0.0000, -0.0000, 0.0000, -0.0000, -0.0000, -0.0000, -0.0000], [ 0.0000, -0.0000, -0.0000, -0.0000, 0.0000, 0.0000, 0.0000], [ 0.2932, 2.0022, 2.1872, -0.3247, 0.1347, -0.3324, -1.3978], [ 1.4960, -2.5978, 1.5589, 0.9840, -1.5260, -2.4613, 0.4806], [-0.0000, 0.0000, -0.0000, 0.0000, 0.0000, 0.0000, -0.0000], [ 0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000]], grad_fn=<EmbeddingBackward>)
show_doc(RNNDropout, doc_string=False, title_level=3)
Applies a dropout with probability p consistently over the first dimension in training mode.
dp = RNNDropout(0.3)
tst_input = torch.randn(3,3,7)
tst_input, dp(tst_input)
(tensor([[[ 1.2319, 1.1261, 1.2774, 0.1549, -1.1483, 1.0135, -0.5733], [ 0.3503, 1.6554, -0.3416, 0.1143, -1.6186, 0.1263, 0.6576], [-0.1282, -1.4898, 1.3864, 0.8228, -1.3303, 2.0144, 0.1165]], [[-0.7594, 0.3570, 0.2195, 0.0835, 0.4086, -0.2475, 0.5885], [ 0.0940, 0.1063, 0.4301, 0.4235, 0.3187, 0.2077, 1.3733], [ 1.1039, 1.0182, 0.2202, 0.6540, -1.0580, -0.1514, 1.1673]], [[ 0.7464, -1.1539, -0.1214, -0.0774, 0.1987, -0.4181, 0.0653], [ 1.0115, 2.2871, -0.6750, 0.6190, 0.5913, 0.6784, -0.2695], [ 0.7146, 0.4232, -1.9684, -0.2852, -0.1162, 0.2386, 0.7550]]]), tensor([[[ 1.7598, 0.0000, 0.0000, 0.2213, -1.6404, 1.4479, -0.8190], [ 0.5004, 2.3649, -0.4880, 0.1633, -0.0000, 0.1805, 0.0000], [-0.1832, -0.0000, 0.0000, 1.1754, -1.9005, 2.8777, 0.1665]], [[-1.0849, 0.0000, 0.0000, 0.1192, 0.5837, -0.3536, 0.8407], [ 0.1342, 0.1519, 0.6144, 0.6050, 0.0000, 0.2967, 0.0000], [ 1.5770, 0.0000, 0.0000, 0.9343, -1.5114, -0.2163, 1.6675]], [[ 1.0663, -0.0000, -0.0000, -0.1106, 0.2839, -0.5973, 0.0933], [ 1.4450, 3.2672, -0.9642, 0.8842, 0.0000, 0.9691, -0.0000], [ 1.0208, 0.0000, -0.0000, -0.4074, -0.1660, 0.3408, 1.0786]]]))
show_doc(WeightDropout, doc_string=False, title_level=3)
Applies dropout of probability weight_p to the layers in layer_names of module in training mode. A copy of those weights is kept so that the dropout mask can change at every batch.
module = nn.LSTM(5, 2)
dp_module = WeightDropout(module, 0.4)
getattr(dp_module.module, 'weight_hh_l0')
Parameter containing: tensor([[-0.6580, -0.1605], [ 0.3274, -0.1130], [-0.4807, -0.4852], [ 0.2366, -0.4500], [ 0.0782, 0.1738], [ 0.1071, -0.2037], [-0.5886, 0.5423], [ 0.6924, -0.6779]], requires_grad=True)
It's at the beginning of a forward pass that the dropout is applied to the weights.
tst_input = torch.randn(4,20,5)
h = (torch.zeros(1,20,2), torch.zeros(1,20,2))
x,h = dp_module(tst_input,h)
getattr(dp_module.module, 'weight_hh_l0')
tensor([[-1.0966, -0.0000], [ 0.5457, -0.0000], [-0.0000, -0.8087], [ 0.3944, -0.0000], [ 0.1303, 0.2897], [ 0.1785, -0.0000], [-0.0000, 0.0000], [ 1.1541, -1.1298]], grad_fn=<MulBackward0>)
show_doc(SequentialRNN, doc_string=False, title_level=3)
class SequentialRNN [source]
SequentialRNN(args) :: Sequential
Create a Sequential module with args that has a reset function.
show_doc(SequentialRNN.reset)
reset [source]
reset()
Call the reset function of self.children (if they have one).
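A toy illustration of that dispatch (TstModule below is a hypothetical module, defined only to show that any child with a reset method gets called while the others are skipped):
class TstModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin,self.n_resets = nn.Linear(7, 7),0
    def reset(self): self.n_resets += 1          # a real RNN would zero its hidden state here
    def forward(self, x): return self.lin(x)

seq = SequentialRNN(TstModule(), nn.ReLU())      # nn.ReLU has no reset, so it is simply skipped
seq.reset()
seq[0].n_resets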
show_doc(dropout_mask, doc_string=False)
dropout_mask [source]
dropout_mask(x:Tensor, sz:Collection[int], p:float)
Create a dropout mask of size sz, of the same type as x, with probability p.
tst_input = torch.randn(3,3,7)
dropout_mask(tst_input, (3,7), 0.3)
tensor([[0.0000, 1.4286, 1.4286, 1.4286, 1.4286, 1.4286, 0.0000], [0.0000, 1.4286, 1.4286, 1.4286, 1.4286, 0.0000, 1.4286], [1.4286, 1.4286, 0.0000, 1.4286, 1.4286, 0.0000, 0.0000]])
Such a mask is then expanded in the sequence length dimension and multiplied by the input to do an RNNDropout.
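For instance, reusing tst_input from the cell above, a mask with a singleton first dimension broadcasts over that dimension, so the same coordinates are zeroed at every position of the sequence (a sketch of what RNNDropout does, assuming these shapes):
mask = dropout_mask(tst_input, (1, 3, 7), 0.3)   # one mask shared across the first (sequence) dimension
tst_input * mask                                 # broadcasting applies the same mask at every position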
show_doc(RNNCore, doc_string=False, title_level=3)
Create an AWD-LSTM encoder with an embedding layer of vocab_sz by emb_sz, a hidden size of n_hid and n_layers layers. pad_token is passed to the Embedding and, if bidir is True, the model is bidirectional. If qrnn is True, we use QRNN cells instead of LSTMs. Dropouts are embed_p, input_p, weight_p and hidden_p.
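For example (a minimal sketch with toy sizes, using the keyword names above):
encoder = RNNCore(vocab_sz=100, emb_sz=7, n_hid=14, n_layers=2, pad_token=1, bidir=False, qrnn=False,
                  embed_p=0.1, input_p=0.6, weight_p=0.5, hidden_p=0.2)
encoder.reset()  # initialize the hidden states before feeding a new sequence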
show_doc(RNNCore.reset)
show_doc(LinearDecoder, doc_string=False, title_level=3)
Create the decoder to go on top of an RNNCore encoder and create a language model. n_hid is the dimension of the last hidden state of the encoder, n_out the size of the output. Dropout of output_p is applied. If a tie_encoder is passed, it will be used for the weights of the linear layer, which will have bias or not.
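Continuing the RNNCore example above, pairing that encoder with a LinearDecoder inside a SequentialRNN gives a language model similar in spirit to what get_language_model builds (a sketch using the keyword names above; tie_encoder is left at its default here, so the weights are not tied):
decoder = LinearDecoder(n_out=100, n_hid=7, output_p=0.4, bias=True)  # n_out=vocab_sz, n_hid=emb_sz
lm = SequentialRNN(encoder, decoder)
lm.reset()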
show_doc(MultiBatchRNNCore, doc_string=False, title_level=3)
show_doc(MultiBatchRNNCore.concat)
concat [source]
concat(arrs:Collection[Tensor]) → Tensor
Concatenate the arrs along the batch dimension.
show_doc(PoolingLinearClassifier, doc_string=False, title_level=3)
Create a linear classifier that sits on top of an RNNCore encoder. The last output, a MaxPooling of all the outputs and an AvgPooling of all the outputs are concatenated, then blocks of bn_drop_lin are stacked according to the values in layers and drops.
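For example, with emb_sz=7 and two classes (a sketch following the layers/drops convention described above):
clas_head = PoolingLinearClassifier(layers=[7*3, 50, 2], drops=[0.4, 0.1])  # 3*emb_sz → 50 → n_class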
show_doc(PoolingLinearClassifier.pool, doc_string=False)
pool [source]
pool(x:Tensor, bs:int, is_max:bool)
Pool x (of batch size bs) along the sequence dimension. is_max decides if we do a MaxPooling or an AvgPooling.
show_doc(WeightDropout.forward)
forward [source]
forward(args:ArgStar)
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
show_doc(RNNCore.forward)
forward [source]
forward(input:LongTensor) → Tuple[Tensor, Tensor]
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
show_doc(EmbeddingDropout.forward)
forward [source]
forward(words:LongTensor, scale:Optional[float]=None) → Tensor
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
show_doc(RNNDropout.forward)
forward [source]
forward(x:Tensor) → Tensor
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
show_doc(PoolingLinearClassifier.forward)
forward [source]
forward(input:Tuple[Tensor, Tensor]) → Tuple[Tensor, Tensor, Tensor]
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
show_doc(MultiBatchRNNCore.forward)
forward [source]
forward(input:LongTensor) → Tuple[Tensor, Tensor]
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
show_doc(WeightDropout.reset)
reset [source]
reset()
show_doc(LinearDecoder.forward)
forward [source]
forward(input:Tuple[Tensor, Tensor]) → Tuple[Tensor, Tensor, Tensor]
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.