
For learning purposes, I have trained a tiny text generation model in this notebook using an RNN/LSTM model, but I am not able to take it further to a Transformer model. Can anyone help me understand whether I can reuse this data setup for a Transformer model, and how?

Here is an explanation of my code:

  • I am using a sample of 5,000 Indian news headlines
  • Tokenized the headline text using a tokenizer with a small vocabulary
  • Split each tokenized sentence into sequences of length 2 to 10
  • Split each sequence by taking the last token as y and the remaining tokens as X
  • Padded each X on the left to create X1, ..., X9 (i.e. for an X with just 3 tokens, X1 to X6 are padding and X7 to X9 hold the actual tokens).

Here is an example. Sentence: "You are good". This is split into the sequences "You are", "are good", and "You are good". Taking the last token of each as y and left-padding the rest to length 9 gives: record1 = X: [pad, pad, pad, pad, pad, pad, pad, pad, You], y: are; record2 = X: [pad, pad, pad, pad, pad, pad, pad, pad, are], y: good; record3 = X: [pad, pad, pad, pad, pad, pad, pad, You, are], y: good. A small code sketch of this preprocessing is below.
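For concreteness, here is a minimal sketch of this splitting and left-padding step (the helper name, the padding id of 0, and the made-up token ids are illustrative assumptions, not my exact code):

```python
# Sketch of the splitting + left-padding step described above.
# Assumption: the tokenizer reserves id 0 for padding.
MAX_LEN = 9   # length of X after left padding; y is kept separately
PAD_ID = 0

def make_xy_pairs(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Split one tokenized headline into (X, y) pairs.

    Every sub-sequence of length 2..max_len+1 contributes one pair:
    its last token is y, the rest is X, and X is left-padded to max_len.
    """
    pairs = []
    for length in range(2, min(len(token_ids), max_len + 1) + 1):
        for start in range(len(token_ids) - length + 1):
            window = token_ids[start:start + length]
            x, y = window[:-1], window[-1]
            x = [pad_id] * (max_len - len(x)) + x   # left padding
            pairs.append((x, y))
    return pairs

# "You are good" with made-up ids: You=12, are=7, good=31
for x, y in make_xy_pairs([12, 7, 31]):
    print(x, "->", y)
# [0, 0, 0, 0, 0, 0, 0, 0, 12] -> 7    ("You" -> "are")
# [0, 0, 0, 0, 0, 0, 0, 0, 7]  -> 31   ("are" -> "good")
# [0, 0, 0, 0, 0, 0, 0, 12, 7] -> 31   ("You are" -> "good")
```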

  • Split X and y into train and test sets and loaded them into a PyTorch DataLoader() with a training batch size of 32
  • Defined a model class Net() in which X1 to X9 pass through an Embedding layer, followed by an LSTM layer, followed by a Linear layer.
  • Minimized the cross-entropy loss between the final layer output (for X) and the actual next token (y). A sketch of this model and training loop follows.
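Roughly, the model and training loop look like this (a simplified sketch; the vocabulary size, embedding/hidden dimensions, and the dummy tensors are assumptions for illustration):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

VOCAB_SIZE, EMB_DIM, HID_DIM, PAD_ID = 5000, 64, 128, 0   # assumed sizes

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=PAD_ID)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
        self.fc = nn.Linear(HID_DIM, VOCAB_SIZE)

    def forward(self, x):                   # x: (batch, 9) token ids
        out, _ = self.lstm(self.emb(x))     # (batch, 9, HID_DIM)
        return self.fc(out[:, -1, :])       # logits for the next token

# Dummy tensors standing in for the real padded X and next-token y
X_train = torch.randint(1, VOCAB_SIZE, (256, 9))
y_train = torch.randint(1, VOCAB_SIZE, (256,))
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

model, loss_fn = Net(), nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for xb, yb in loader:
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)           # final layer output vs actual next token
    loss.backward()
    opt.step()
```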

Problem: when I try to convert this code to a Transformer, I cannot find a single reference example where the data for the Transformer class is set up this way.
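For context, this is the kind of Transformer setup I have been imagining, reusing exactly the same (X, y) batches. It is only a sketch: the layer sizes, the learned positional embedding, and the use of nn.TransformerEncoder with a padding mask are my assumptions, and I don't know if this is the right way to arrange the data:

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, N_HEADS, N_LAYERS, MAX_LEN, PAD_ID = 5000, 64, 4, 2, 9, 0

class TransformerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=PAD_ID)
        self.pos_emb = nn.Embedding(MAX_LEN, EMB_DIM)       # learned positions
        layer = nn.TransformerEncoderLayer(d_model=EMB_DIM, nhead=N_HEADS,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.fc = nn.Linear(EMB_DIM, VOCAB_SIZE)

    def forward(self, x):                                    # x: (batch, 9) ids
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok_emb(x) + self.pos_emb(pos)              # (batch, 9, EMB_DIM)
        pad_mask = (x == PAD_ID)                             # ignore left padding
        h = self.encoder(h, src_key_padding_mask=pad_mask)
        # Only the last position is read (a real token, thanks to left padding),
        # mirroring the LSTM's final hidden state, so no causal mask is used here.
        return self.fc(h[:, -1, :])                          # next-token logits

# Same batches as for the LSTM: X is (batch, 9), y is (batch,)
xb = torch.randint(1, VOCAB_SIZE, (32, MAX_LEN))
yb = torch.randint(1, VOCAB_SIZE, (32,))
loss = nn.CrossEntropyLoss()(TransformerNet()(xb), yb)
```

Is this a sensible way to reuse my data, or should I restructure it the way most Transformer language-model examples seem to (right padding, a causal mask, and a loss computed at every position)?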
