Questions tagged [sgd]

Stochastic Gradient Descent (SGD) is an iterative algorithm for optimizing an objective function, widely used to train machine learning models. It can be viewed as a stochastic approximation of the Gradient Descent (GD) algorithm commonly used in machine learning optimization problems.

21 questions
6
votes
1 answer

Changing the batch size during training

The choice of batch size is, in some sense, a measure of stochasticity: on one hand, smaller batch sizes make the descent more stochastic, so SGD can deviate significantly from exact GD on the whole data, but they allow for more…
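
A minimal sketch of one way to change the batch size mid-training, by resuming fit with a new value (the Keras model, data, and schedule here are illustrative assumptions, not taken from the question):

```python
import numpy as np
from tensorflow import keras

x = np.random.rand(1024, 20).astype("float32")   # toy features
y = np.random.rand(1024, 1).astype("float32")    # toy targets

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-2), loss="mse")

# Small batches early (noisier, more stochastic steps), larger batches
# later (closer to full-batch GD); each stage resumes from current weights.
for batch_size in [16, 64, 256]:
    model.fit(x, y, batch_size=batch_size, epochs=2, verbose=0)
```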
4
votes
2 answers

How is Stochastic Gradient Descent (SGD) used like Mini-Batch Gradient Descent (MBGD)?

As I understand it, Gradient Descent (GD) has three variants: 1. Batch Gradient Descent (BGD): processes all the training examples for each iteration of gradient descent. 2. Stochastic Gradient Descent (SGD): processes one training example per…
Hunar
  • 1,197
  • 2
  • 11
  • 33
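
A compact sketch of one update under each variant, on least-squares linear regression (toy data; the step size and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def grad(w, Xb, yb):
    # Gradient of 0.5 * ||Xb @ w - yb||^2 / n over the given batch.
    return Xb.T @ (Xb @ w - yb) / len(yb)

w, lr = np.zeros(3), 0.1
w -= lr * grad(w, X, y)                    # BGD: all examples per update
i = rng.integers(len(y))
w -= lr * grad(w, X[i:i+1], y[i:i+1])      # SGD: one example per update
idx = rng.choice(len(y), size=32, replace=False)
w -= lr * grad(w, X[idx], y[idx])          # MBGD: a small batch per update
```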
3
votes
2 answers

The central idea behind SGD

Prof. Hinton, in his popular course on Coursera, refers to the following fact: Rprop doesn't really work when we have very large datasets and need to perform mini-batch weight updates. Why doesn't it work with mini-batches? Well, people have tried it,…
Green Falcon
  • 14,308
  • 10
  • 59
  • 98
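
For context, the fix Hinton proposes in that lecture is RMSprop: keep Rprop's idea of normalizing away the gradient's magnitude, but do it with a running RMS so mini-batch gradients average correctly. A minimal sketch (names are illustrative):

```python
import numpy as np

def rmsprop_step(w, g, ms, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSprop update; ms is the running mean of squared gradients."""
    ms = decay * ms + (1 - decay) * g**2
    w = w - lr * g / (np.sqrt(ms) + eps)   # divide by RMS of recent gradients
    return w, ms
```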
3
votes
2 answers

Explanations about the Adam optimizer algorithm

I'm a beginner in machine learning and I'm looking into optimizers for gradient descent. I've read many posts about them and put together a survey of the state of the art of these optimizers. I have just one problem that I can't figure out. Don't judge…
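
For reference, a minimal sketch of a single Adam step following the original Kingma & Ba (2015) formulation (variable names are illustrative):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for weights w with gradient g at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * g        # running mean of gradients
    v = beta2 * v + (1 - beta2) * g**2     # running mean of squared gradients
    m_hat = m / (1 - beta1**t)             # bias-correct the warm-up phase
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```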
2
votes
0 answers

How is the tolerance check done in Mini-Batch Gradient Descent?

I'm trying to understand how the tolerance check is done in Mini-Batch Gradient Descent. Here are some methods, but I'm not sure which one is the most common approach: 1) Begin the epoch; shuffle the dataset; for each batch in the dataset, make predictions…
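
One common pattern (an assumption about convention, not the only one) is to average the per-batch losses over each epoch and stop when the epoch-level improvement falls below tol; step_fn and loss_fn below are hypothetical placeholders:

```python
import numpy as np

def train(X, y, w, step_fn, loss_fn, batch_size=32, tol=1e-4, max_epochs=100):
    prev = np.inf
    n = len(y)
    for epoch in range(max_epochs):
        order = np.random.permutation(n)          # shuffle each epoch
        losses = []
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]
            w = step_fn(w, X[b], y[b])            # one mini-batch update
            losses.append(loss_fn(w, X[b], y[b]))
        epoch_loss = float(np.mean(losses))
        if prev - epoch_loss < tol:               # epoch-level tolerance check
            return w
        prev = epoch_loss
    return w
```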
1
vote
2 answers

Why does using Gradient Descent over Stochastic Gradient Descent improve performance?

Currently, I'm running two types of logistic regression: logistic regression with SGD and logistic regression with GD, implemented as follows: SGD = SGDClassifier(loss="log", max_iter=1000, penalty='l1', alpha=0.001); logreg =…
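
A runnable sketch of the comparison described in the excerpt (the synthetic data is an assumption; note that LogisticRegression does not run plain GD but a batch solver such as liblinear, and recent scikit-learn spells the logistic loss "log_loss"):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stochastic updates on the logistic loss with an L1 penalty, as in the excerpt.
sgd = SGDClassifier(loss="log_loss", max_iter=1000, penalty="l1", alpha=0.001)
# Full-batch solver on the same loss (liblinear supports the L1 penalty).
logreg = LogisticRegression(penalty="l1", solver="liblinear")

for name, clf in [("SGD", sgd), ("batch solver", logreg)]:
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))
```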
1
vote
1 answer

Input shape of Keras Sequential model

I am new to neural networks using Keras. I have the following train samples: input shape (150528, 1235) and output shape (154457, 1235), where 1235 is the number of training examples. How should I set the input shape? I tried the code below but it gave me a ValueError: Data…
ammar
  • 23
  • 4
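
A sketch of the likely fix (an assumption based on the quoted shapes): Keras expects arrays shaped (samples, features), so data stored as (features, examples), e.g. (150528, 1235) with 1235 examples, must be transposed before fitting. Small stand-in shapes keep the sketch fast:

```python
import numpy as np
from tensorflow import keras

n_examples, n_in, n_out = 16, 48, 32   # stand-ins for 1235, 150528, 154457
X = np.random.rand(n_in, n_examples).astype("float32")  # (features, examples)
Y = np.random.rand(n_out, n_examples).astype("float32")

X, Y = X.T, Y.T                        # samples first: (16, 48) and (16, 32)

model = keras.Sequential([
    keras.layers.Input(shape=(n_in,)),  # per-sample feature count only
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(n_out),
])
model.compile(optimizer="sgd", loss="mse")
model.fit(X, Y, batch_size=8, epochs=1, verbose=0)
```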
1
vote
2 answers

Confused between optimizer and loss function

I always thought SGD was a loss function, then I read this in a notebook: model.compile(loss="sparse_categorical_crossentropy", optimizer=keras.optimizers.SGD(lr=1e-3), metrics=["accuracy"]). Now I am confused: what's the difference between loss…
Hanna polaskus
  • 91
  • 1
  • 2
  • 5
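
A minimal sketch of the distinction (toy model and data; note that recent Keras spells the argument learning_rate rather than lr):

```python
# The loss defines *what* is minimized; the optimizer (SGD here) defines
# *how* the weights are updated to minimize it; metrics are only reported.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(10, activation="softmax", input_shape=(4,)),
])
model.compile(
    loss="sparse_categorical_crossentropy",              # what to minimize
    optimizer=keras.optimizers.SGD(learning_rate=1e-3),  # how to minimize it
    metrics=["accuracy"],                                # reported, not optimized
)
x = np.random.rand(64, 4).astype("float32")
y = np.random.randint(0, 10, size=64)
model.fit(x, y, epochs=1, verbose=0)
```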
1
vote
3 answers

Why is mini-batch gradient descent faster than gradient descent?

As I understand them: Mini-Batch Gradient Descent: take a specified batch size, say 32; evaluate the loss on those 32 examples; update the weights; repeat until every example has been used; repeat for a specified number of epochs. Gradient Descent: evaluate the loss…
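
A back-of-the-envelope sketch of the usual answer (illustrative numbers): per pass over the data, full-batch GD makes one weight update while mini-batch GD makes n / batch_size updates, each still cheap because a batch of 32 vectorizes well on modern hardware:

```python
n, batch = 100_000, 32
updates_gd_per_epoch = 1                 # one update per full pass
updates_mbgd_per_epoch = n // batch      # 3125 updates per full pass
print(updates_gd_per_epoch, updates_mbgd_per_epoch)
```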
1
vote
1 answer

Does using a different optimizer change the loss landscape?

I plot the landscape using this code, and I notice the landscape's shape has changed a lot. My understanding is that the optimizer does not change the loss landscape. Now I'm confused whether I made a mistake in generating the plot or…
user836026
  • 127
  • 1
  • 5
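
A sketch of one standard construction (random directions around the trained weights, in the spirit of Li et al.'s "Visualizing the Loss Landscape of Neural Nets"; loss_fn and w are hypothetical placeholders). Built this way, the surface depends only on the model, data, and loss, not on which optimizer produced w:

```python
import numpy as np

def landscape(loss_fn, w, n=25, scale=1.0, seed=0):
    """Evaluate loss_fn on a 2-D grid around weights w along two fixed
    random unit directions; reusing `seed` makes plots comparable."""
    rng = np.random.default_rng(seed)
    d1 = rng.normal(size=w.shape); d1 /= np.linalg.norm(d1)
    d2 = rng.normal(size=w.shape); d2 /= np.linalg.norm(d2)
    alphas = np.linspace(-scale, scale, n)
    return np.array([[loss_fn(w + a * d1 + b * d2) for b in alphas]
                     for a in alphas])
```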
1
vote
0 answers

How to calculate the gradient of MSE in backpropagation?

I want to implement a neural network from scratch to solve linear regression using backpropagation. I don't understand how to compute the gradient of the MSE cost function with respect to each weight. The formula I have says that: for each weight…
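
For the linear-regression case the excerpt describes, a worked sketch: with predictions y_hat = X @ w + b and L = mean((y_hat - y)**2), the chain rule gives dL/dw = (2/n) * X.T @ (y_hat - y) and dL/db = (2/n) * sum(y_hat - y):

```python
import numpy as np

def mse_gradients(X, y, w, b):
    """Gradient of mean squared error for a linear model y_hat = X @ w + b."""
    n = len(y)
    err = (X @ w + b) - y          # residual per example
    dw = (2.0 / n) * X.T @ err     # one entry per weight
    db = (2.0 / n) * err.sum()
    return dw, db
```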
1
vote
1 answer

Difference between sklearn's LogisticRegression and SGDClassifier?

What is the difference between sklearn's LogisticRegression classifier and its SGDClassifier? I understand that SGD is an optimization method, while Logistic Regression (LR) is a machine learning algorithm/model. I also understand that…
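
A minimal sketch of the relationship (toy data; both fit a linear logistic model and differ in how the log loss is minimized, and recent scikit-learn spells that loss "log_loss" for SGDClassifier):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Batch solver (lbfgs by default) on the logistic loss:
lr_model = LogisticRegression(max_iter=1000).fit(X, y)
# Same model family, minimized by stochastic gradient steps instead:
sgd_model = SGDClassifier(loss="log_loss", max_iter=1000,
                          random_state=0).fit(X, y)
print(lr_model.score(X, y), sgd_model.score(X, y))
```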
1
vote
1 answer

Understanding SGD for Binary Cross-Entropy loss

I'm trying to describe mathematically how stochastic gradient descent could be used to minimize the binary cross-entropy loss. The typical description of SGD that I can find online is: $\theta = \theta - \eta…
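
A sketch of the missing pieces, assuming a logistic model $\hat{y} = \sigma(\theta^{\top}x)$ (the model choice is an assumption, not stated in the excerpt): the per-example loss is

$$\ell(\theta) = -\,y \log \hat{y} - (1 - y)\log(1 - \hat{y}),$$

its gradient simplifies to $\nabla_\theta \ell = (\hat{y} - y)\,x$, so the SGD update for a randomly drawn example $(x, y)$ is

$$\theta \leftarrow \theta - \eta\,(\hat{y} - y)\,x.$$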
1
vote
1 answer

How does varying alpha change SGDRegressor behavior for outliers?

I am using SGDRegressor with a constant learning rate and default loss function. I am curious to know how changing the alpha parameter in the function from 0.0001 to 100 will change regressor behavior. Below is the sample code I have: from…
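
A runnable sketch of such an experiment (the synthetic data with one planted outlier is an assumption): alpha is the regularization strength, so larger values shrink the coefficients regardless of the outlier, while small values leave the fit more exposed to it:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=200)
y[0] = 100.0                                   # one gross outlier

for alpha in [0.0001, 0.01, 1.0, 100.0]:
    reg = SGDRegressor(alpha=alpha, learning_rate="constant", eta0=0.01,
                       max_iter=1000, random_state=0)
    reg.fit(X, y)
    print(alpha, reg.coef_)                    # watch the slope shrink
```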
1
vote
1 answer

Understanding Learning Rate in depth

I am trying to understand why the learning rate does not work universally. I have two different data sets and have tested three learning rates: 0.001, 0.01, and 0.1. For the first data set, I was able to achieve results for all learning rates at…
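
A one-dimensional sketch of why no single rate works everywhere: on a quadratic loss 0.5 * c * w**2, gradient descent multiplies w by (1 - lr * c) each step, so it diverges once lr > 2 / c; the usable range therefore depends on the problem's scale (the curvatures below are illustrative):

```python
def descend(c, lr, steps=50, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= lr * c * w        # gradient of 0.5 * c * w**2 is c * w
    return w

for c in [1.0, 30.0]:                  # two "data sets" of different scale
    for lr in [0.001, 0.01, 0.1]:
        print(c, lr, descend(c, lr))   # lr=0.1 diverges when c=30
```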