Questions tagged [huggingface]
93 questions
7
votes
1 answer
Minimal working example or tutorial showing how to use PyTorch's nn.TransformerDecoder for batch text generation in training and inference modes?
I want to solve a sequence-to-sequence text generation task (e.g. question answering, language translation, etc.).
For the purposes of this question, you may assume that the input side is already handled. (I already have a tensor of…
Pablo Messina
- 197
- 1
- 3
- 11
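For reference, a minimal teacher-forcing sketch of nn.TransformerDecoder with a causal mask; all dimensions and the random tensors below are illustrative assumptions standing in for the questioner's already-prepared inputs.

import torch
import torch.nn as nn

# Toy sizes chosen only for illustration
d_model, nhead, num_layers, vocab_size = 512, 8, 6, 10000

decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)

# "memory" stands in for the encoder output: batch of 2, source length 7
memory = torch.randn(2, 7, d_model)
# target ids shifted right for teacher forcing: batch of 2, target length 5
tgt_ids = torch.randint(0, vocab_size, (2, 5))

# causal mask: -inf above the diagonal so position i cannot attend to later positions
tgt_len = tgt_ids.size(1)
tgt_mask = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)

hidden = decoder(embed(tgt_ids), memory, tgt_mask=tgt_mask)
logits = lm_head(hidden)  # (batch, target length, vocab_size), fed to cross-entropy

At inference time the same module is typically called step by step, appending each sampled token id to the target before the next forward pass.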
5
votes
1 answer
Running DeepSeek-V3 inference without GPU (on CPU only)
I am trying to run the DeepSeek-V3 model inference on a remote machine (SSH). This machine does not have any GPU, but has many CPU cores.
First method:
I try to run the model inference using the DeepSeek-Infer Demo approach:
generate.py --ckpt-path…
The_Average_Engineer
- 153
- 4
4
votes
1 answer
BERT for QuestionAnswering input exceeds 512
I'm training BERT on question answering (in Spanish) and I have a large context: the context alone exceeds 512 tokens, and the total question + context is around 10k. I found that Longformer is a BERT-like model for long documents, but there is no pretrained one in Spanish, so, is…
Sadak
- 252
- 3
- 10
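One common workaround for contexts longer than 512 tokens is the tokenizer's sliding-window (stride) support; a minimal sketch below, using a multilingual checkpoint purely as a stand-in for a Spanish QA model.

from transformers import AutoTokenizer

# checkpoint chosen only for illustration; substitute a Spanish QA model
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

question = "¿Quién escribió el libro?"
context = "un contexto muy largo " * 2000   # imagine roughly 10k tokens of context

# truncation="only_second" keeps the question intact and splits only the context
# into overlapping 512-token windows (return_overflowing_tokens=True)
encoded = tokenizer(
    question,
    context,
    truncation="only_second",
    max_length=512,
    stride=128,
    return_overflowing_tokens=True,
    padding="max_length",
)
print(len(encoded["input_ids"]))  # number of windows produced for this example

Each window is then scored separately and the best answer span is kept, which is how the standard question-answering pipeline handles long documents.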
4
votes
1 answer
Since LoRA parameters are randomly initialized, shouldn't that initially break a model's output?
I have just tried using LoRA on Llama 3 8B and found that, without doing any fine-tuning, it performed pretty well on my dataset. But then I realized that surely the LoRA parameters are randomly initialized, right? So if that's the case, shouldn't that…
Ameen Izhac
- 107
- 6
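For what it's worth, in the PEFT implementation only one of the two low-rank matrices is randomly initialized by default; the other starts at zero, so the adapter's initial update is zero and the base model's outputs are untouched before training. A small sketch, with opt-125m used purely as an assumed lightweight stand-in:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # illustrative model
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, config)

# lora_A is randomly initialized, lora_B is all zeros, so B @ A == 0 at the start
for name, param in model.named_parameters():
    if "lora_B" in name:
        print(name, torch.count_nonzero(param).item())  # prints 0 for every lora_B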
4
votes
1 answer
How do I get model.generate() to omit the input sequence from the generation?
I'm using Huggingface to do inference on Llama 3 8B. Here is my model:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length = 2048,
    dtype = torch.float16,
    …
Ameen Izhac
- 107
- 6
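A common way to do this is to slice the prompt tokens off the returned sequence before decoding; a minimal sketch, with a small model used only as a stand-in for the unsloth Llama checkpoint above:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)

# decoder-only models return prompt + continuation, so drop the prompt part
prompt_len = inputs["input_ids"].shape[1]
new_tokens = output_ids[0, prompt_len:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))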
4
votes
2 answers
Dynamic batching and padding batches for NLP in deep learning libraries
This is the usual way we train modern deep learning models for NLP, e.g. with Huggingface libraries, where we have a fixed length for the input number of tokens/subword units. https://huggingface.co/docs/transformers/pad_truncation
In the following example,…
alvas
- 2,510
- 7
- 28
- 40
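The usual alternative to a single fixed length is dynamic per-batch padding via a data collator; a minimal sketch, with the model name chosen only for illustration:

from torch.utils.data import DataLoader
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

texts = ["a short sentence", "a considerably longer sentence that needs many more tokens"]
features = [tokenizer(t) for t in texts]   # no padding yet, each example keeps its own length

# the collator pads every batch only up to the longest sequence in that batch
collator = DataCollatorWithPadding(tokenizer=tokenizer)
loader = DataLoader(features, batch_size=2, collate_fn=collator)

batch = next(iter(loader))
print(batch["input_ids"].shape)   # padded to the longest item in this batch, not a global max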
4
votes
2 answers
How to use is_split_into_words with Huggingface NER pipeline
I am using Huggingface transformers for NER, following this excellent guide: https://huggingface.co/blog/how-to-train.
My incoming text has already been split into words. When tokenizing during training/fine-tuning I can use…
Alan Buxton
- 143
- 5
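For context, is_split_into_words is accepted by the tokenizer call; the sketch below shows that side, since that is where sub-tokens get aligned back to the original words (checkpoint and words are illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")   # illustrative checkpoint

words = ["Alan", "lives", "in", "Huddersfield"]   # text already split into words
encoding = tokenizer(words, is_split_into_words=True)

# word_ids() gives, for every sub-token, the index of the word it came from,
# which is what aligns NER labels with sub-tokens during training
print(encoding.word_ids())   # e.g. [None, 0, 1, 2, 3, 3, 3, None]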
4
votes
1 answer
How to measure the accuracy of an NLP paraphrasing model?
I am using the HuggingFace library to do sentence paraphrasing (given an input sentence, the model outputs a paraphrase). How am I supposed to compare the results of two separate models (one trained with t5-base, the other with t5-small) for this task?…
carrot_142
- 63
- 3
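There is no single accuracy number for paraphrasing, but reference-based metrics from the evaluate library are a common starting point; a sketch with made-up predictions for the two models (the values mean nothing by themselves):

import evaluate

references = [["the cat is sitting on the mat"]]          # one list of references per input
preds = {
    "t5-base": ["the cat sat on the mat"],                # hypothetical outputs
    "t5-small": ["a cat is sitting on a mat"],
}

bleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

for name, hyp in preds.items():
    b = bleu.compute(predictions=hyp, references=references)
    r = rouge.compute(predictions=hyp, references=[ref[0] for ref in references])
    print(name, round(b["score"], 2), round(r["rougeL"], 3))

Semantic metrics such as BERTScore, plus a small human evaluation, are usually added on top, because n-gram overlap penalizes perfectly valid rewordings.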
3
votes
1 answer
How do I get word embeddings for out-of-vocabulary words using a transformer model?
When I tried to get word embeddings of a sentence using Bio_ClinicalBERT, for a sentence of 8 words I am getting 11 token IDs (+ start and end) because "embeddings" is an out-of-vocabulary word/token that is being split into em, bed, ding, s.
I would…
cerofrais
- 131
- 4
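A common recipe is to pool the sub-token vectors that belong to the same word; a minimal sketch, assuming the Bio_ClinicalBERT checkpoint mentioned in the question is available on the Hub under the id below:

import torch
from transformers import AutoModel, AutoTokenizer

name = "emilyalsentzer/Bio_ClinicalBERT"   # assumed checkpoint id; any encoder works
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

sentence = "patient embeddings were recorded during the first visit"
encoding = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**encoding).last_hidden_state[0]   # (num sub-tokens, hidden size)

# average the sub-token vectors per original word, so a split word like
# "embeddings" -> em, ##bed, ##ding, ##s still ends up with a single vector
pieces = {}
for position, word_id in enumerate(encoding.word_ids()):
    if word_id is not None:                            # skip [CLS] and [SEP]
        pieces.setdefault(word_id, []).append(hidden[position])
word_vectors = {i: torch.stack(v).mean(dim=0) for i, v in pieces.items()}
print(len(word_vectors), "word-level vectors for an 8-word sentence")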
3
votes
0 answers
Loss not decreasing while fine-tuning a transformer-based pose estimation model
I am trying to fine-tune a transformer/encoder-based pose estimation model available at: https://huggingface.co/docs/transformers/en/model_doc/vitpose
When passing the "labels" attribute to the forward pass of the model, the model returns "Training…
Soham Bhaumik
- 131
- 1
3
votes
1 answer
LLM Fine-Tuning - Supervised Fine-Tuning Trainer (SFTTrainer) vs transformers Trainer
When should one opt for the Supervised Fine-Tuning Trainer (SFTTrainer) instead of the regular Transformers Trainer when it comes to instruction fine-tuning for Large Language Models (LLMs)? From what I gather, the regular Transformers Trainer typically…
Marvin Martin
- 33
- 1
- 3
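Roughly, SFTTrainer is a thin wrapper around Trainer that takes care of tokenizing, packing and collating a raw text column for causal-LM fine-tuning. A hedged sketch follows; trl's argument names have shifted between releases, so treat the exact signature and the model id as assumptions for a recent version:

from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# tiny in-memory instruction dataset, for illustration only
train_dataset = Dataset.from_dict({
    "text": [
        "### Instruction: say hi\n### Response: hi",
        "### Instruction: say bye\n### Response: bye",
    ]
})

trainer = SFTTrainer(
    model="facebook/opt-125m",            # model id; SFTTrainer loads the model and tokenizer
    train_dataset=train_dataset,          # the "text" column is tokenized and packed for you
    args=SFTConfig(output_dir="sft-out"),
)
trainer.train()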
3
votes
1 answer
Fine-tuning a pre-trained LLM for question-answering
Objective
My goal is to fine-tune a pre-trained LLM on a dataset about Manchester United's (MU's) 2021/22 season (they had a poor season). I want to be able to prompt the fine-tuned model with questions such as "How can MU improve?", or "What are…
Tom Bomer
- 33
- 1
- 5
2
votes
1 answer
HuggingFace Transformers is giving loss: nan - accuracy: 0.0000e+00
I am a HuggingFace newbie fine-tuning a BERT model (distilbert-base-cased) using the Transformers library, but the training loss is not going down; instead I am getting loss: nan - accuracy: 0.0000e+00.
My code is largely per the boiler…
JasonExcel
- 21
- 1
- 2
2
votes
1 answer
Same Processing Time for Prompts of Different Size
I'm not a Data Scientist, so bear with me please.
I have a Google Gemma 3 27B-it LLM running on a HuggingFace Inference endpoint in AWS on a machine with an A100 GPU. The endpoint is configured to run a Text Generation task on a vLLM container. I…
Michael
- 123
- 2
2
votes
2 answers
Should you care about truncation and padding in an LLM even if it has a very large tokenizer.max_length so that truncation will never happen?
I want to find out the role of truncation and padding in Huggingface Transformers pretrained models and/or any fine-tuned models on top. Taking a large language model like the German GPT-2 shows that the max_length is very large, so that truncation…
questionto42
- 215
- 1
- 10
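As a quick way to see what the tokenizer actually does here, a sketch with plain gpt2 standing in for the German GPT-2 (the checkpoint choice is only an assumption; the truncation and padding behaviour is the same):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in for the German GPT-2
print(tokenizer.model_max_length)   # the limit applied when truncation=True has no explicit max_length

texts = ["a short example", "a much longer example " * 50]

# truncation only kicks in once an input exceeds the limit above
encoded = tokenizer(texts, truncation=True)
print([len(ids) for ids in encoded["input_ids"]])

# padding is still needed to batch unequal lengths; GPT-2 has no pad token by default
tokenizer.pad_token = tokenizer.eos_token
encoded = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
print(encoded["input_ids"].shape, encoded["attention_mask"].shape)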