Questions tagged [finetuning]
85 questions
16 votes · 2 answers
What are good parameter ranges for BERT hyperparameters when fine-tuning on a very small dataset?
I need to fine-tune a BERT model (from the huggingface repository) on a sentence classification task. However, my dataset is really small. I have 12K sentences and only 10% of them are from the positive class. Does anyone here have any experience on…
zwlayer (279 rep)
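A common starting point is the small grid from the BERT paper's fine-tuning appendix: learning rate in {5e-5, 3e-5, 2e-5}, batch size in {16, 32}, and 2-4 epochs; with 12K sentences and a 10% positive class, the low end of those ranges is a reasonable default. A minimal sketch with the Hugging Face Trainer (the checkpoint name and the pre-tokenized datasets are assumptions):

```python
from transformers import (AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Ranges from the BERT paper's fine-tuning recipe; small datasets
# usually do best at the low end and with early stopping.
args = TrainingArguments(
    output_dir="bert-small-ds",
    learning_rate=2e-5,              # grid: {5e-5, 3e-5, 2e-5}
    per_device_train_batch_size=16,  # grid: {16, 32}
    num_train_epochs=3,              # grid: {2, 3, 4}
    warmup_ratio=0.1,
    weight_decay=0.01,
)
# train_ds / eval_ds: pre-tokenized datasets, assumed to exist already
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```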
6 votes · 5 answers
Understanding alpha parameter tuning in the LoRA paper
I was reading the LoRA paper (https://arxiv.org/pdf/2106.09685.pdf) and one thing I don't understand is Section 4.1, where the updates are scaled by alpha, with alpha a constant in r. It is said that alpha is set to the first r tried. Then if I…
jpotwor (61 rep)
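For context, the mechanism Section 4.1 describes: the low-rank update is scaled by alpha/r, so the forward pass is h = W0·x + (alpha/r)·B·A·x, and keeping alpha fixed at the first r tried means changing r does not require retuning the learning rate. A toy numeric sketch of that scaling (shapes are illustrative, not the paper's code):

```python
import torch

d, r, alpha = 768, 8, 8        # alpha set to the first r tried (Sec. 4.1)

W0 = torch.randn(d, d)         # frozen pretrained weight
A = torch.randn(r, d) * 0.01   # trainable, random Gaussian init
B = torch.zeros(d, r)          # trainable, zero init

def adapted_forward(x):
    # h = W0 x + (alpha / r) * B A x
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = torch.randn(d)
print(torch.allclose(adapted_forward(x), W0 @ x))  # True while B == 0
```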
4 votes · 1 answer
Which is the fastest pretrained image model?
I have been working with pre-trained models and was curious to know the fastest forward-propagating model of all the computer vision pre-trained models. I have been trying to achieve faster processing in one-shot learning and have tried the…
thanatoz (2,495 rep)
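"Fastest" depends on hardware, input size, and batch size, so the honest answer is to benchmark the candidates on your own setup; a rough sketch with a few torchvision models (the candidate list and CPU-only timing are assumptions):

```python
import time
import torch
from torchvision import models

# Random weights are fine for a pure speed comparison (no download needed).
candidates = {
    "squeezenet1_1": models.squeezenet1_1(weights=None),
    "mobilenet_v3_small": models.mobilenet_v3_small(weights=None),
    "resnet18": models.resnet18(weights=None),
}
x = torch.randn(1, 3, 224, 224)

for name, m in candidates.items():
    m.eval()
    with torch.no_grad():
        m(x)                                  # warm-up pass
        t0 = time.perf_counter()
        for _ in range(20):
            m(x)
        ms = (time.perf_counter() - t0) / 20 * 1000
    print(f"{name}: {ms:.1f} ms per forward pass")
```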
4 votes · 1 answer
Since LoRA parameters are randomly initialized, shouldn't that initially break a model's output?
I have just tried using LoRA on Llama 3 8B and I found that, without doing any fine-tuning, it performed pretty well on my dataset. But then I realized that surely the LoRA parameters are randomly initialized, right? So if that's the case, shouldn't that…
Ameen Izhac (107 rep)
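The detail that resolves this: in the LoRA paper (and in peft's implementation) only A is random; B is initialized to zero, so the update BA is zero and the adapted model initially reproduces the base model exactly. A minimal sketch of that initialization:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a LoRA update that starts at zero."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base.requires_grad_(False)   # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        # At init B == 0, so the output equals the base layer's output.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```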
3 votes · 1 answer
How to combine different models in Keras?
I have a pre-trained network consisting of two parts: feature extraction and similarity learning.
The network takes two inputs and predicts whether the images are the same or not.
The feature extraction part was VGG16 with all layers frozen.
I only…
Ali Raza Memon (113 rep)
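A typical way to wire this up is the Keras functional API with a shared, frozen VGG16 backbone feeding a small similarity head; a sketch under those assumptions (the head architecture is illustrative, not the asker's):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")
extractor.trainable = False            # freeze all VGG16 layers

in_a = layers.Input(shape=(224, 224, 3))
in_b = layers.Input(shape=(224, 224, 3))
feat_a = extractor(in_a)               # same weights applied to both inputs
feat_b = extractor(in_b)

# Similarity head: absolute difference of embeddings -> same / not same
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([feat_a, feat_b])
out = layers.Dense(1, activation="sigmoid")(diff)

model = Model(inputs=[in_a, in_b], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```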
3 votes · 1 answer
LLM Fine-Tuning - Supervised Fine-Tuning Trainer (SFTTrainer) vs transformers Trainer
When should one opt for the Supervised Fine-Tuning Trainer (SFTTrainer) instead of the regular transformers Trainer when it comes to instruction fine-tuning of large language models (LLMs)? From what I gather, the regular transformers Trainer typically…
Marvin Martin (33 rep)
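The usual distinction: trl's SFTTrainer wraps the transformers Trainer and adds conveniences for instruction data (prompt formatting, example packing, PEFT integration), while the plain Trainer expects you to build the labels yourself. A hedged sketch, noting that trl's constructor arguments have shifted across versions:

```python
from datasets import load_dataset
from trl import SFTTrainer

# Hypothetical instruction dataset: one "text" column holding fully
# formatted prompt+response strings.
ds = load_dataset("json", data_files="instructions.jsonl", split="train")

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",  # a checkpoint name string is accepted
    train_dataset=ds,
    # Older trl versions also take dataset_text_field="text" and
    # max_seq_length here; newer ones move these into SFTConfig.
)
trainer.train()
```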
3 votes · 1 answer
Fine-tuning a pre-trained LLM for question-answering
Objective
My goal is to fine-tune a pre-trained LLM on a dataset about Manchester United's (MU's) 2021/22 season (they had a poor season). I want to be able to prompt the fine-tuned model with questions such as "How can MU improve?", or "What are…
Tom Bomer (33 rep)
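Answers to this kind of question usually note that fine-tuning is better at teaching style than facts (retrieval is the usual tool for injecting facts), and that if you fine-tune anyway, the season data has to be rewritten as prompt/response pairs. One plausible JSONL record, purely illustrative:

```python
import json

# Hypothetical record format; the field names are a common convention,
# not a fixed standard.
record = {
    "prompt": "How can MU improve?",
    "response": "Judging by the 2021/22 season data: ...",
}
with open("mu_qa.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```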
3 votes · 0 answers
What is zero-shot vs one-shot vs few-shot learning?
Are there any papers/research works that deal with generalizing the matrix of how the *-shot(s) learning settings are defined?
There's a wide variety of papers that title themselves as *-shot(s) learning, with some variants of how *-shots are defined,…
alvas (2,510 rep)
2 votes · 1 answer
Combining textual and numeric features in a pre-trained Transformer (BERT)
I have a dataset with 3 columns:
- Text
- Meta-data (intending to extract features from it, then use those as numerical features)
- Target label
Question 1: How can I use a pre-trained BERT instance on more than the text?
One theoretical solution…
George Petropoulos (123 rep)
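One common version of the "theoretical solution": encode the text with BERT, take the [CLS] embedding, and concatenate the numeric meta-data features before a classification head. A minimal PyTorch sketch (checkpoint and dimensions assumed):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertWithTabular(nn.Module):
    def __init__(self, n_numeric: int, n_classes: int):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size       # 768 for bert-base
        self.head = nn.Linear(hidden + n_numeric, n_classes)

    def forward(self, input_ids, attention_mask, numeric):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]           # [CLS] token embedding
        return self.head(torch.cat([cls, numeric], dim=-1))
```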
2 votes · 1 answer
Why not use linear regression for fine-tuning the last layer of a neural network?
In transfer learning, often only the last layer of the network is retrained using gradient descent.
However, the last layer of a common neural network performs only a linear transformation, so why do we use gradient descent and not linear (or…
Funkwecker (635 rep)
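For intuition, the closed-form alternative the question points at: with frozen features X and one-hot targets Y, the linear last layer has an ordinary/ridge least-squares solution, though a softmax cross-entropy objective has no closed form, which is one reason gradient descent stays the default. A sketch of the normal-equations route on synthetic data:

```python
import numpy as np

# X: (n_samples, n_features) frozen penultimate-layer activations
# Y: (n_samples, n_classes) one-hot targets
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))
Y = np.eye(10)[rng.integers(0, 10, size=1000)]

# Ridge-regularized normal equations: W = (X^T X + lam I)^-1 X^T Y
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
pred = (X @ W).argmax(axis=1)   # class = argmax of the linear scores
```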
2 votes · 2 answers
Does finetuning BERT involve updating all of the parameters or just the final classification layer?
Currently learning and reading about transformer models, I get that during the pretraining stage the BERT model is trained on a large corpus via MLM and NSP. But during finetuning, for example when trying to classify sentiment based on another text, are…
spnc (21 rep)
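Both regimes exist: the BERT paper's "fine-tuning" updates all weights end-to-end, while freezing the encoder and training only the classification head is the cheaper feature-extraction variant. A sketch of the switch between the two:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

FULL_FINETUNE = False
if not FULL_FINETUNE:
    # Freeze the encoder; only the classification head stays trainable.
    for p in model.bert.parameters():
        p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```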
2 votes · 0 answers
How does fine-tuning actually work?
So I'm currently fine-tuning a pretrained model with 35k images across 5 classes, with a very high class imbalance: one class makes up 73% of the distribution.
I handled this by using a weighted loss function with…
Abas1 (21 rep)
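For reference, a weighted-loss setup like the one described, sketched with inverse-frequency weights in PyTorch (the 73% majority share over 35k images is from the question; the minority splits below are placeholders):

```python
import torch
import torch.nn as nn

# Class distribution from the question: one class holds 73% of 35k images;
# the remaining four counts are illustrative placeholders.
counts = torch.tensor([25550., 3000., 2500., 2200., 1750.])
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights

criterion = nn.CrossEntropyLoss(weight=weights)  # rare classes cost more
```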
2 votes · 0 answers
Overfitting problem in a convolutional neural network; how to fix it?
So I've been working on this convolutional neural network, but my accuracy is stuck at 62% without improving, and I'm afraid I'm in a rather severe situation with overfitting. I've been trying to play around with the weight decay and learning…
user30246218 (21 rep)
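Beyond weight decay and learning rate, the usual levers are stronger data augmentation, dropout, and early stopping; a sketch of the first two in PyTorch/torchvision (the values are generic starting points, and the 512-dim feature size is an assumption):

```python
import torch.nn as nn
from torchvision import transforms

# Data augmentation effectively enlarges the training set.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.ToTensor(),
])

# Dropout before the classifier reduces co-adaptation of features.
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(512, 5),   # 512: assumed feature size; 5 classes per question
)
```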
2 votes · 0 answers
Transfer Learning for NER
Does anyone have experience with fine-tuning an NER model on out-of-domain data? Did the fine-tuning succeed, and approximately how much data was needed?
To clarify: e.g., starting with an NER model that identifies persons and locations, and fine-tuning…
hjerp (41 rep)
2 votes · 2 answers
Should you care about truncation and padding in an LLM even if it has a very large tokenizer.max_length, so that truncation will never happen?
I want to find out the role of truncation and padding in Hugging Face Transformers pretrained models and/or any fine-tuned models on top. Taking a large language model like the German GPT-2 shows that the max_length is so large that truncation…
questionto42 (215 rep)
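The short answer most responses give: padding still matters for batching even when model_max_length is effectively unbounded, and truncation only fires at that limit. A sketch with one German GPT-2 checkpoint (the question doesn't say which checkpoint it used):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("dbmdz/german-gpt2")
print(tok.model_max_length)   # may be a huge sentinel if the config sets no limit

tok.pad_token = tok.eos_token  # GPT-2 tokenizers define no pad token by default
batch = tok(
    ["Ein kurzer Satz.", "Ein etwas laengerer Beispielsatz als der erste."],
    padding=True,      # pad to the longest sequence in the batch
    truncation=True,   # only relevant once inputs exceed model_max_length
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # both rows padded to the same length
```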