
I'm trying to write a program that uses RoBERTa to calculate word embeddings:

from transformers import RobertaModel, RobertaTokenizer
import torch

model = RobertaModel.from_pretrained('roberta-base')
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
caption = "this bird is yellow has red wings"

encoded_caption = tokenizer(caption, return_tensors='pt')
input_ids = encoded_caption['input_ids']

# last_hidden_state has shape (batch, seq_len, hidden_size)
outputs = model(input_ids)
word_embeddings = outputs.last_hidden_state

I extract the last hidden state after forwarding the input_ids through the RobertaModel class and use it as the word embeddings. I don't know if this is the correct way to do it; can anyone help me confirm this? Thanks.

1 Answer


This was studied in the original BERT paper, which concluded that the best approach was to concatenate the hidden states of the last 4 layers:

[Table from the BERT paper comparing feature-based approaches on CoNLL-2003 NER: concatenating the last four hidden layers gave the best dev F1.]

Although BERT preceded RoBERTa, the two models are very similar, so we may take this observation to be reasonably applicable to RoBERTa as well. You may nonetheless want to experiment with the exact number of layer states you concatenate to see what gives the best results.
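For reference, here is a minimal sketch of that approach with the Hugging Face transformers API; passing output_hidden_states=True makes the model return the states of every layer, and the shapes in the comments assume roberta-base (hidden size 768):

from transformers import RobertaModel, RobertaTokenizer
import torch

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')

caption = "this bird is yellow has red wings"
inputs = tokenizer(caption, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors, each of shape
# (batch, seq_len, 768); index 0 is the embedding layer output.
last_four = outputs.hidden_states[-4:]

# Concatenate along the hidden dimension -> (batch, seq_len, 4 * 768)
word_embeddings = torch.cat(last_four, dim=-1)

Note that concatenating four layers quadruples the embedding dimensionality (3072 for roberta-base), which may matter for whatever downstream model consumes these vectors.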

noe