
I'm trying to write a program that uses RoBERTa to calculate word embeddings:

from transformers import RobertaModel, RobertaTokenizer
import torch

model = RobertaModel.from_pretrained('roberta-base')
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
caption = "this bird is yellow has red wings"

encoded_caption = tokenizer(caption, return_tensors='pt')
input_ids = encoded_caption['input_ids']

# last_hidden_state has shape (batch, seq_len, hidden_size)
outputs = model(input_ids)
word_embeddings = outputs.last_hidden_state

I extract the last hidden state after forwarding the input_ids through the RobertaModel class and use it as the word embeddings. I don't know if this is the correct way to do it; can anyone help me confirm this? Thanks.

1 Answer


This was studied in the original BERT paper, which concluded that the best approach was to concatenate the hidden states of the last 4 layers:

[Table from the BERT paper comparing feature-based approaches on CoNLL-2003 NER: concatenating the last four hidden layers gave the best dev F1.]

Although BERT preceded RoBERTa, the two models are very similar, so we may take this observation to be reasonably applicable to RoBERTa as well. You may nonetheless want to experiment with the exact number of layer states you concatenate to see what gives the best results.
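For reference, here is a minimal sketch of that approach with the Hugging Face transformers API; passing output_hidden_states=True makes the model return the states of every layer, and the shapes in the comments assume roberta-base (hidden size 768):

from transformers import RobertaModel, RobertaTokenizer
import torch

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')

caption = "this bird is yellow has red wings"
inputs = tokenizer(caption, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors, each of shape
# (batch, seq_len, 768); index 0 is the embedding layer output.
last_four = outputs.hidden_states[-4:]

# Concatenate along the hidden dimension -> (batch, seq_len, 4 * 768)
word_embeddings = torch.cat(last_four, dim=-1)

Note that concatenating four layers quadruples the embedding dimensionality (3072 for roberta-base), which may matter for whatever downstream model consumes these vectors.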

noe