1

I have been reading the early paper on pre-training in NLP (https://arxiv.org/abs/1511.01432) and I can't understand what random word dropout means. The authors don't explain this method at all, as if it were a standard thing. Can someone explain what they actually do and what the purpose of it is?

WoofDoggy
  • 353
  • 1
  • 2
  • 11

1 Answer

3

It is not uncommon to make sense of a sentence without reading it completely. Likewise, when you skim a document, you tend to overlook some words and still understand the main point. This is the intuition behind word dropout.

Generally this is done by randomly dropping each word in a sequence, for example according to independent Bernoulli draws:

$X \leftarrow X \odot \vec{e}, \qquad e_i \sim \mathrm{Bernoulli}(1 - p), \quad i = 1, \dots, n$

where $X$ is the sequence of word tokens, $n$ is the length of the sequence, $p$ is the word dropout probability, $\odot$ denotes the element-wise product, and $\vec{e}$ is a binary mask whose $i$-th entry indicates whether the $i$-th word is kept.

This is usually applied around the word embedding lookup: the words selected to be dropped are typically replaced with the embedding of the `<UNK>` token.
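To make this concrete, here is a minimal sketch of word dropout applied to a batch of token indices before the embedding lookup. This is not the paper's code; the function name, the PyTorch setting, and the choice of replacing dropped indices with an `<UNK>` id are my own assumptions.

```python
import torch

def word_dropout(token_ids: torch.Tensor, p: float, unk_id: int) -> torch.Tensor:
    """Replace each token id with the <UNK> id independently with probability p.

    token_ids: LongTensor of shape (batch, seq_len) containing word indices.
    p: per-word dropout probability.
    unk_id: vocabulary index of the <UNK> token.
    """
    if p <= 0.0:
        return token_ids
    # Bernoulli mask: True means "drop this word", drawn independently per position.
    drop_mask = torch.rand_like(token_ids, dtype=torch.float) < p
    return token_ids.masked_fill(drop_mask, unk_id)

# Toy example: a batch of two sequences, dropping words with p = 0.2,
# assuming <UNK> has index 0 in the vocabulary.
batch = torch.tensor([[5, 12, 7, 3], [9, 2, 14, 8]])
print(word_dropout(batch, p=0.2, unk_id=0))
```

The dropped positions then pick up the `<UNK>` embedding in the subsequent lookup, which matches the replacement described above; an equivalent variant simply zeroes out the embeddings of the dropped words instead.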

By doing this, we encourage our model to be robust to missing words and to learn more flexible ways of expressing the same meaning.

Tom M.
  • 671
  • 3
  • 9
TitoOrt
  • 1,892
  • 14
  • 23