
I have been working on modifying the transformer from the article The Annotated Transformer. One of the features I would like to include is the ability to pass a sequence of fixed length, and receive an output sequence of a shorter length, which is possible per this reference.

In my case, I am using a sequence of 10 randomly generated integers 0-9 for the input (just like the article) and trying to return a sequence of five 2s (the simplest shorter-output case I could think of). The start of the sequence is denoted by 1, and the end of the sequence is not defined.

I am successfully able to send the encoder the "source" batch tensor, and the decoder the "target" batch tensor, which has only 5 columns. The transformer will train on this data, but it returns a sequence of length equal to that of the input.
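For reference, the batches I'm building look roughly like this (a simplified sketch of my setup, not the article's exact data generation code):

```python
import torch

batch_size, src_len, tgt_len = 30, 10, 5

# source: fixed-length sequences of random integers 0-9, first token forced to 1 (start token)
src = torch.randint(0, 10, (batch_size, src_len))
src[:, 0] = 1

# target: five 2s, also starting with the start token 1 (no end-of-sequence token defined)
tgt = torch.full((batch_size, tgt_len), 2)
tgt[:, 0] = 1
```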

What are the requirements of the transformer network to output a sequence of length that is not equal to the length of the input sequence?

Thanks in advance for any assistance.

try_hard

1 Answer


The Transformer has the inherent ability to generate variable-length sequences; you don't need to do anything special. The output of the Transformer decoder is always the same length as the input of the decoder.
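You can see this shape behaviour with PyTorch's stock `nn.Transformer` (just an illustration; the model from The Annotated Transformer behaves the same way):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=32, nhead=4, batch_first=True)

src = torch.rand(2, 10, 32)   # batch of 2, source length 10
tgt = torch.rand(2, 5, 32)    # decoder input of length 5

out = model(src, tgt)
print(out.shape)              # torch.Size([2, 5, 32]) -- same length as the decoder input
```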

When using a Transformer at training time, we pass the whole target sequence to the decoder, shifted one position to the right: the "beginning of sequence" token is prepended and the "end of sequence" token is removed. The output of the decoder therefore has the same length as the target sequence.
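Concretely, assuming integer token ids with (for example) BOS = 1 and EOS = 3 (placeholder values, use whatever your vocabulary defines), the training pair is built like this:

```python
import torch

BOS, EOS = 1, 3                                  # placeholder token ids

y = torch.tensor([2, 2, 2, 2, 2])                # the raw target sequence (five 2s)

decoder_input = torch.cat([torch.tensor([BOS]), y])   # what the decoder sees
labels        = torch.cat([y, torch.tensor([EOS])])   # what the decoder should predict

# decoder_input: [1, 2, 2, 2, 2, 2]
# labels:        [2, 2, 2, 2, 2, 3]
# Both have length 6, so the decoder output also has length 6.
```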

When using a Transformer at inference time, we start by providing only the "beginning of sequence" token as input, and we obtain an output of length 1. We then build a new input by appending the token just generated to the input of the previous iteration, and repeat the process until the "end of sequence" token is generated. At every step, the output of the decoder is the same length as the input of the decoder.
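A greedy decoding loop looks roughly like this (a sketch; `model.encode`, `model.decode`, `model.generator` and `subsequent_mask` follow the interface used in The Annotated Transformer):

```python
import torch

def greedy_decode(model, src, src_mask, max_len, bos_id, eos_id):
    memory = model.encode(src, src_mask)
    ys = torch.full((1, 1), bos_id, dtype=torch.long)          # start with only BOS
    for _ in range(max_len - 1):
        # subsequent_mask is the causal-mask helper from The Annotated Transformer
        out = model.decode(memory, src_mask, ys, subsequent_mask(ys.size(1)))
        prob = model.generator(out[:, -1])                     # distribution over the next token
        next_token = prob.argmax(dim=1).item()
        ys = torch.cat([ys, torch.tensor([[next_token]])], dim=1)
        if next_token == eos_id:                               # stop once EOS is generated
            break
    return ys
```

The loop stops either when EOS is produced or when `max_len` tokens have been generated, so the output length is decided by the model, not by the source length.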

Note that you need EOS tokens to mark the end of the sequence, both for the source and the target sequences, and both during training and inference. When preparing the target data for training, you remove the EOS token from the decoder's input sequence and prepend a BOS token to it.

noe