Questions tagged [finetuning]
85 questions
16 votes · 2 answers
What are good parameter ranges for BERT hyperparameters when fine-tuning on a very small dataset?
I need to fine-tune a BERT model (from the huggingface repository) on a sentence classification task. However, my dataset is really small. I have 12K sentences and only 10% of them are from the positive class. Does anyone here have any experience on…
zwlayer (279 rep)
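A common starting point is the small grid from the BERT paper's fine-tuning appendix: learning rate in {5e-5, 3e-5, 2e-5}, batch size in {16, 32}, and 2-4 epochs; with 12K sentences and a 10% positive class, the low end of those ranges is a reasonable default. A minimal sketch with the Hugging Face Trainer (the checkpoint name and the pre-tokenized datasets are assumptions):

```python
from transformers import (AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Ranges from the BERT paper's fine-tuning recipe; small datasets
# usually do best at the low end and with early stopping.
args = TrainingArguments(
    output_dir="bert-small-ds",
    learning_rate=2e-5,              # grid: {5e-5, 3e-5, 2e-5}
    per_device_train_batch_size=16,  # grid: {16, 32}
    num_train_epochs=3,              # grid: {2, 3, 4}
    warmup_ratio=0.1,
    weight_decay=0.01,
)
# train_ds / eval_ds: pre-tokenized datasets, assumed to exist already
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```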
6 votes · 5 answers
Understanding alpha parameter tuning in the LoRA paper
I was reading the LoRA paper (https://arxiv.org/pdf/2106.09685.pdf) and one thing I don't understand is Section 4.1, where the updates are scaled by alpha, with alpha a constant in r. It is said that alpha is set to the first r tried. Then if I…
jpotwor (61 rep)
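For context, the mechanism Section 4.1 describes: the low-rank update is scaled by alpha/r, so the forward pass is h = W0·x + (alpha/r)·B·A·x, and keeping alpha fixed at the first r tried means changing r does not require retuning the learning rate. A toy numeric sketch of that scaling (shapes are illustrative, not the paper's code):

```python
import torch

d, r, alpha = 768, 8, 8        # alpha set to the first r tried (Sec. 4.1)

W0 = torch.randn(d, d)         # frozen pretrained weight
A = torch.randn(r, d) * 0.01   # trainable, random Gaussian init
B = torch.zeros(d, r)          # trainable, zero init

def adapted_forward(x):
    # h = W0 x + (alpha / r) * B A x
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = torch.randn(d)
print(torch.allclose(adapted_forward(x), W0 @ x))  # True while B == 0
```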
4 votes · 1 answer
Which is the fastest pretrained image model?
I have been working with pre-trained models and was curious to know the fastest forward-propagating model of all the computer vision pre-trained models. I have been trying to achieve faster processing in one-shot learning and have tried the…
thanatoz (2,495 rep)
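"Fastest" depends on hardware, input size, and batch size, so the honest answer is to benchmark the candidates on your own setup; a rough sketch with a few torchvision models (the candidate list and CPU-only timing are assumptions):

```python
import time
import torch
from torchvision import models

# Random weights are fine for a pure speed comparison (no download needed).
candidates = {
    "squeezenet1_1": models.squeezenet1_1(weights=None),
    "mobilenet_v3_small": models.mobilenet_v3_small(weights=None),
    "resnet18": models.resnet18(weights=None),
}
x = torch.randn(1, 3, 224, 224)

for name, m in candidates.items():
    m.eval()
    with torch.no_grad():
        m(x)                                  # warm-up pass
        t0 = time.perf_counter()
        for _ in range(20):
            m(x)
        ms = (time.perf_counter() - t0) / 20 * 1000
    print(f"{name}: {ms:.1f} ms per forward pass")
```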
4 votes · 1 answer
Since LoRA parameters are randomly initialized, shouldn't that initially break a model's output?
I have just tried using LoRA on Llama 3 8B and I found that, without doing any fine-tuning, it performed pretty well on my dataset. But then I realized that surely the LoRA parameters are randomly initialized, right? So if that's the case, shouldn't that…
Ameen Izhac (107 rep)
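The detail that resolves this: in the LoRA paper (and in peft's implementation) only A is random; B is initialized to zero, so the update BA is zero and the adapted model initially reproduces the base model exactly. A minimal sketch of that initialization:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a LoRA update that starts at zero."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base.requires_grad_(False)   # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        # At init B == 0, so the output equals the base layer's output.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```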
3 votes · 1 answer
How to combine different models in Keras?
I have a pre-trained network consisting of two parts: feature extraction and similarity learning.
The network takes two inputs and predicts whether the images are the same or not.
The feature extraction part was VGG16 with all layers frozen.
I only…
Ali Raza Memon (113 rep)
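A typical way to wire this up is the Keras functional API with a shared, frozen VGG16 backbone feeding a small similarity head; a sketch under those assumptions (the head architecture is illustrative, not the asker's):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")
extractor.trainable = False            # freeze all VGG16 layers

in_a = layers.Input(shape=(224, 224, 3))
in_b = layers.Input(shape=(224, 224, 3))
feat_a = extractor(in_a)               # same weights applied to both inputs
feat_b = extractor(in_b)

# Similarity head: absolute difference of embeddings -> same / not same
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([feat_a, feat_b])
out = layers.Dense(1, activation="sigmoid")(diff)

model = Model(inputs=[in_a, in_b], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```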
3 votes · 1 answer
LLM Fine-Tuning - Supervised Fine-Tuning Trainer (SFTTrainer) vs transformers Trainer
When should one opt for the Supervised Fine-Tuning Trainer (SFTTrainer) instead of the regular transformers Trainer when it comes to instruction fine-tuning of large language models (LLMs)? From what I gather, the regular transformers Trainer typically…
Marvin Martin (33 rep)
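The usual distinction: trl's SFTTrainer wraps the transformers Trainer and adds conveniences for instruction data (prompt formatting, example packing, PEFT integration), while the plain Trainer expects you to build the labels yourself. A hedged sketch, noting that trl's constructor arguments have shifted across versions:

```python
from datasets import load_dataset
from trl import SFTTrainer

# Hypothetical instruction dataset: one "text" column holding fully
# formatted prompt+response strings.
ds = load_dataset("json", data_files="instructions.jsonl", split="train")

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",  # a checkpoint name string is accepted
    train_dataset=ds,
    # Older trl versions also take dataset_text_field="text" and
    # max_seq_length here; newer ones move these into SFTConfig.
)
trainer.train()
```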
3 votes · 1 answer
Fine-tuning a pre-trained LLM for question-answering
Objective
My goal is to fine-tune a pre-trained LLM on a dataset about Manchester United's (MU's) 2021/22 season (they had a poor season). I want to be able to prompt the fine-tuned model with questions such as "How can MU improve?", or "What are…
Tom Bomer (33 rep)
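Answers to this kind of question usually note that fine-tuning is better at teaching style than facts (retrieval is the usual tool for injecting facts), and that if you fine-tune anyway, the season data has to be rewritten as prompt/response pairs. One plausible JSONL record, purely illustrative:

```python
import json

# Hypothetical record format; the field names are a common convention,
# not a fixed standard.
record = {
    "prompt": "How can MU improve?",
    "response": "Judging by the 2021/22 season data: ...",
}
with open("mu_qa.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```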
3 votes · 0 answers
What is zero-shot vs one-shot vs few-shot learning?
Are there any papers/research works that deal with generalizing the matrix of how the *-shot(s) learning settings are defined?
There's a wide variety of papers that title themselves as *-shot(s) learning, with some variants of how *-shots are defined,…
alvas (2,510 rep)
2 votes · 1 answer
Combining textual and numeric features in a pre-trained Transformer (BERT)
I have a dataset with 3 columns:
- Text
- Meta-data (intending to extract features from it, then use those as numerical features)
- Target label
Question 1: How can I use a pre-trained BERT instance on more than the text?
One theoretical solution…
George Petropoulos (123 rep)
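One common version of the "theoretical solution": encode the text with BERT, take the [CLS] embedding, and concatenate the numeric meta-data features before a classification head. A minimal PyTorch sketch (checkpoint and dimensions assumed):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertWithTabular(nn.Module):
    def __init__(self, n_numeric: int, n_classes: int):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size       # 768 for bert-base
        self.head = nn.Linear(hidden + n_numeric, n_classes)

    def forward(self, input_ids, attention_mask, numeric):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]           # [CLS] token embedding
        return self.head(torch.cat([cls, numeric], dim=-1))
```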
2 votes · 1 answer
Why not use linear regression for fine-tuning the last layer of a neural network?
In transfer learning, often only the last layer of the network is retrained using gradient descent.
However, the last layer of a common neural network performs only a linear transformation, so why do we use gradient descent and not linear (or…
Funkwecker (635 rep)
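For intuition, the closed-form alternative the question points at: with frozen features X and one-hot targets Y, the linear last layer has an ordinary/ridge least-squares solution, though a softmax cross-entropy objective has no closed form, which is one reason gradient descent stays the default. A sketch of the normal-equations route on synthetic data:

```python
import numpy as np

# X: (n_samples, n_features) frozen penultimate-layer activations
# Y: (n_samples, n_classes) one-hot targets
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))
Y = np.eye(10)[rng.integers(0, 10, size=1000)]

# Ridge-regularized normal equations: W = (X^T X + lam I)^-1 X^T Y
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
pred = (X @ W).argmax(axis=1)   # class = argmax of the linear scores
```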
2 votes · 2 answers
Does finetuning BERT involve updating all of the parameters or just the final classification layer?
Currently learning and reading about transformer models, I get that during the pretraining stage the BERT model is trained on a large corpus via MLM and NSP. But during finetuning, for example when trying to classify sentiment based on another text, are…
spnc (21 rep)
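Both regimes exist: the BERT paper's "fine-tuning" updates all weights end-to-end, while freezing the encoder and training only the classification head is the cheaper feature-extraction variant. A sketch of the switch between the two:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

FULL_FINETUNE = False
if not FULL_FINETUNE:
    # Freeze the encoder; only the classification head stays trainable.
    for p in model.bert.parameters():
        p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```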
2 votes · 0 answers
How does fine-tuning actually work?
So I'm currently fine-tuning a pretrained model with 35k images across 5 classes, with a very high class imbalance: one class makes up 73% of the distribution.
I handled this by using a weighted loss function with…
Abas1 (21 rep)
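For reference, a weighted-loss setup like the one described, sketched with inverse-frequency weights in PyTorch (the 73% majority share over 35k images is from the question; the minority splits below are placeholders):

```python
import torch
import torch.nn as nn

# Class distribution from the question: one class holds 73% of 35k images;
# the remaining four counts are illustrative placeholders.
counts = torch.tensor([25550., 3000., 2500., 2200., 1750.])
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights

criterion = nn.CrossEntropyLoss(weight=weights)  # rare classes cost more
```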
2 votes · 0 answers
Overfitting problem in a convolutional neural network; how to fix it?
So I've been working on this convolutional neural network, but my accuracy is stuck at 62% without improving, and I'm afraid I'm in a rather severe situation with overfitting. I've been trying to play around with the weight decay and learning…
user30246218 (21 rep)
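Beyond weight decay and learning rate, the usual levers are stronger data augmentation, dropout, and early stopping; a sketch of the first two in PyTorch/torchvision (the values are generic starting points, and the 512-dim feature size is an assumption):

```python
import torch.nn as nn
from torchvision import transforms

# Data augmentation effectively enlarges the training set.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.ToTensor(),
])

# Dropout before the classifier reduces co-adaptation of features.
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(512, 5),   # 512: assumed feature size; 5 classes per question
)
```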
2 votes · 0 answers
Transfer Learning for NER
Does anyone have experience with fine-tuning an NER model on out-of-domain data? Did the fine-tuning succeed, and approximately how much data was needed?
To clarify: e.g., starting with an NER model that identifies persons and locations, and fine-tuning…
hjerp (41 rep)
2 votes · 2 answers
Should you care about truncation and padding in an LLM even if it has a very large tokenizer.max_length, so that truncation will never happen?
I want to find out the role of truncation and padding in Hugging Face Transformers pretrained models and/or any fine-tuned models on top. Taking a large language model like the German GPT-2 shows that the max_length is so large that truncation…
questionto42 (215 rep)
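The short answer most responses give: padding still matters for batching even when model_max_length is effectively unbounded, and truncation only fires at that limit. A sketch with one German GPT-2 checkpoint (the question doesn't say which checkpoint it used):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("dbmdz/german-gpt2")
print(tok.model_max_length)   # may be a huge sentinel if the config sets no limit

tok.pad_token = tok.eos_token  # GPT-2 tokenizers define no pad token by default
batch = tok(
    ["Ein kurzer Satz.", "Ein etwas laengerer Beispielsatz als der erste."],
    padding=True,      # pad to the longest sequence in the batch
    truncation=True,   # only relevant once inputs exceed model_max_length
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # both rows padded to the same length
```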