
For learning purposes, I have trained a tiny text generation model in this notebook using an RNN/LSTM model, but I am not able to take it further to a Transformer model. Can anyone help me understand whether I can reuse this data setup for a Transformer model, and how?

Here is an explanation of my code:

  • I am using a sample of 5,000 Indian news headlines
  • Tokenized the headline text using a tokenizer with a small vocabulary
  • Split each tokenized sentence into sequences of length 2 to 10
  • Split each sequence by taking the last token as y and the remaining tokens as X
  • Padded each X on the left to create X1, ..., X9 (i.e. for an X with just 3 tokens, X1 to X6 are padding and X7 to X9 hold the actual tokens).

Here is an example. Sentence: "You are good". This is split into the sequences "You are", "are good", and "You are good". Taking the last token of each as y and left-padding the rest to length 9 gives: record1 = X: [pad, pad, pad, pad, pad, pad, pad, pad, You], y: are; record2 = X: [pad, pad, pad, pad, pad, pad, pad, pad, are], y: good; record3 = X: [pad, pad, pad, pad, pad, pad, pad, You, are], y: good. A small code sketch of this preprocessing is below.
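For concreteness, here is a minimal sketch of this splitting and left-padding step (the helper name, the padding id of 0, and the made-up token ids are illustrative assumptions, not my exact code):

```python
# Sketch of the splitting + left-padding step described above.
# Assumption: the tokenizer reserves id 0 for padding.
MAX_LEN = 9   # length of X after left padding; y is kept separately
PAD_ID = 0

def make_xy_pairs(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Split one tokenized headline into (X, y) pairs.

    Every sub-sequence of length 2..max_len+1 contributes one pair:
    its last token is y, the rest is X, and X is left-padded to max_len.
    """
    pairs = []
    for length in range(2, min(len(token_ids), max_len + 1) + 1):
        for start in range(len(token_ids) - length + 1):
            window = token_ids[start:start + length]
            x, y = window[:-1], window[-1]
            x = [pad_id] * (max_len - len(x)) + x   # left padding
            pairs.append((x, y))
    return pairs

# "You are good" with made-up ids: You=12, are=7, good=31
for x, y in make_xy_pairs([12, 7, 31]):
    print(x, "->", y)
# [0, 0, 0, 0, 0, 0, 0, 0, 12] -> 7    ("You" -> "are")
# [0, 0, 0, 0, 0, 0, 0, 0, 7]  -> 31   ("are" -> "good")
# [0, 0, 0, 0, 0, 0, 0, 12, 7] -> 31   ("You are" -> "good")
```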

  • Split X and y into train and test sets and loaded them into a PyTorch DataLoader() with a training batch size of 32
  • Defined a model class Net() in which X1 to X9 pass through an Embedding layer, followed by an LSTM layer, followed by a Linear layer.
  • Minimized the cross-entropy loss between the final layer output (for X) and the actual next token (y). A sketch of this model and training loop follows.
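Roughly, the model and training loop look like this (a simplified sketch; the vocabulary size, embedding/hidden dimensions, and the dummy tensors are assumptions for illustration):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

VOCAB_SIZE, EMB_DIM, HID_DIM, PAD_ID = 5000, 64, 128, 0   # assumed sizes

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=PAD_ID)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
        self.fc = nn.Linear(HID_DIM, VOCAB_SIZE)

    def forward(self, x):                   # x: (batch, 9) token ids
        out, _ = self.lstm(self.emb(x))     # (batch, 9, HID_DIM)
        return self.fc(out[:, -1, :])       # logits for the next token

# Dummy tensors standing in for the real padded X and next-token y
X_train = torch.randint(1, VOCAB_SIZE, (256, 9))
y_train = torch.randint(1, VOCAB_SIZE, (256,))
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

model, loss_fn = Net(), nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for xb, yb in loader:
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)           # final layer output vs actual next token
    loss.backward()
    opt.step()
```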

Problem: when I try to convert this code to a Transformer, I cannot find a single reference example where the data for the Transformer class is set up this way.
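For context, this is the kind of Transformer setup I have been imagining, reusing exactly the same (X, y) batches. It is only a sketch: the layer sizes, the learned positional embedding, and the use of nn.TransformerEncoder with a padding mask are my assumptions, and I don't know if this is the right way to arrange the data:

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, N_HEADS, N_LAYERS, MAX_LEN, PAD_ID = 5000, 64, 4, 2, 9, 0

class TransformerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=PAD_ID)
        self.pos_emb = nn.Embedding(MAX_LEN, EMB_DIM)       # learned positions
        layer = nn.TransformerEncoderLayer(d_model=EMB_DIM, nhead=N_HEADS,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.fc = nn.Linear(EMB_DIM, VOCAB_SIZE)

    def forward(self, x):                                    # x: (batch, 9) ids
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok_emb(x) + self.pos_emb(pos)              # (batch, 9, EMB_DIM)
        pad_mask = (x == PAD_ID)                             # ignore left padding
        h = self.encoder(h, src_key_padding_mask=pad_mask)
        # Only the last position is read (a real token, thanks to left padding),
        # mirroring the LSTM's final hidden state, so no causal mask is used here.
        return self.fc(h[:, -1, :])                          # next-token logits

# Same batches as for the LSTM: X is (batch, 9), y is (batch,)
xb = torch.randint(1, VOCAB_SIZE, (32, MAX_LEN))
yb = torch.randint(1, VOCAB_SIZE, (32,))
loss = nn.CrossEntropyLoss()(TransformerNet()(xb), yb)
```

Is this a sensible way to reuse my data, or should I restructure it the way most Transformer language-model examples seem to (right padding, a causal mask, and a loss computed at every position)?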
