Questions tagged [sgd]

Stochastic Gradient Descent (SGD) is an iterative algorithm for optimizing an objective function, widely used to train machine learning models. It can be viewed as a stochastic approximation of the Gradient Descent (GD) algorithm commonly used in machine learning optimization problems.

21 questions
6
votes
1 answer

Changing the batch size during training

The choice of batch size is, in some sense, a measure of stochasticity: on one hand, smaller batch sizes make the descent more stochastic, so SGD can deviate significantly from exact GD on the whole data, but they allow for more…
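
A minimal sketch of one way to change the batch size mid-training, by resuming fit with a new value (the Keras model, data, and schedule here are illustrative assumptions, not taken from the question):

```python
import numpy as np
from tensorflow import keras

x = np.random.rand(1024, 20).astype("float32")   # toy features
y = np.random.rand(1024, 1).astype("float32")    # toy targets

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-2), loss="mse")

# Small batches early (noisier, more stochastic steps), larger batches
# later (closer to full-batch GD); each stage resumes from current weights.
for batch_size in [16, 64, 256]:
    model.fit(x, y, batch_size=batch_size, epochs=2, verbose=0)
```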
4
votes
2 answers

How is Stochastic Gradient Descent (SGD) used like Mini-Batch Gradient Descent (MBGD)?

As I understand it, Gradient Descent (GD) has three variants: 1. Batch Gradient Descent (BGD): processes all the training examples for each iteration of gradient descent. 2. Stochastic Gradient Descent (SGD): processes one training example per…
Hunar
  • 1,197
  • 2
  • 11
  • 33
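
A compact sketch of one update under each variant, on least-squares linear regression (toy data; the step size and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def grad(w, Xb, yb):
    # Gradient of 0.5 * ||Xb @ w - yb||^2 / n over the given batch.
    return Xb.T @ (Xb @ w - yb) / len(yb)

w, lr = np.zeros(3), 0.1
w -= lr * grad(w, X, y)                    # BGD: all examples per update
i = rng.integers(len(y))
w -= lr * grad(w, X[i:i+1], y[i:i+1])      # SGD: one example per update
idx = rng.choice(len(y), size=32, replace=False)
w -= lr * grad(w, X[idx], y[idx])          # MBGD: a small batch per update
```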
3
votes
2 answers

The central idea behind SGD

Prof. Hinton, in his popular course on Coursera, refers to the following fact: Rprop doesn't really work when we have very large datasets and need to perform mini-batch weight updates. Why doesn't it work with mini-batches? Well, people have tried it,…
Green Falcon
  • 14,308
  • 10
  • 59
  • 98
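
For context, the fix Hinton proposes in that lecture is RMSprop: keep Rprop's idea of normalizing away the gradient's magnitude, but do it with a running RMS so mini-batch gradients average correctly. A minimal sketch (names are illustrative):

```python
import numpy as np

def rmsprop_step(w, g, ms, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSprop update; ms is the running mean of squared gradients."""
    ms = decay * ms + (1 - decay) * g**2
    w = w - lr * g / (np.sqrt(ms) + eps)   # divide by RMS of recent gradients
    return w, ms
```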
3
votes
2 answers

Explanations about the Adam optimizer algorithm

I'm a beginner in machine learning and I'm looking into optimizers for gradient descent. I've read many posts about them and put together a survey of the state of the art of these optimizers. I have just one problem that I can't figure out. Don't judge…
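
For reference, a minimal sketch of a single Adam step following the original Kingma & Ba (2015) formulation (variable names are illustrative):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for weights w with gradient g at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * g        # running mean of gradients
    v = beta2 * v + (1 - beta2) * g**2     # running mean of squared gradients
    m_hat = m / (1 - beta1**t)             # bias-correct the warm-up phase
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```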
2
votes
0 answers

How is the tolerance check done in Mini-Batch Gradient Descent?

I'm trying to understand how the tolerance check is done in Mini-Batch Gradient Descent. Here are some methods, but I'm not sure which one is the most common approach: 1) Begin the epoch; shuffle the dataset; for each batch in the dataset, make predictions…
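
One common pattern (an assumption about convention, not the only one) is to average the per-batch losses over each epoch and stop when the epoch-level improvement falls below tol; step_fn and loss_fn below are hypothetical placeholders:

```python
import numpy as np

def train(X, y, w, step_fn, loss_fn, batch_size=32, tol=1e-4, max_epochs=100):
    prev = np.inf
    n = len(y)
    for epoch in range(max_epochs):
        order = np.random.permutation(n)          # shuffle each epoch
        losses = []
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]
            w = step_fn(w, X[b], y[b])            # one mini-batch update
            losses.append(loss_fn(w, X[b], y[b]))
        epoch_loss = float(np.mean(losses))
        if prev - epoch_loss < tol:               # epoch-level tolerance check
            return w
        prev = epoch_loss
    return w
```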
1
vote
2 answers

Why does using Gradient Descent over Stochastic Gradient Descent improve performance?

Currently, I'm running two types of logistic regression: logistic regression with SGD and logistic regression with GD, implemented as follows: SGD = SGDClassifier(loss="log", max_iter=1000, penalty='l1', alpha=0.001); logreg =…
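
A runnable sketch of the comparison described in the excerpt (the synthetic data is an assumption; note that LogisticRegression does not run plain GD but a batch solver such as liblinear, and recent scikit-learn spells the logistic loss "log_loss"):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stochastic updates on the logistic loss with an L1 penalty, as in the excerpt.
sgd = SGDClassifier(loss="log_loss", max_iter=1000, penalty="l1", alpha=0.001)
# Full-batch solver on the same loss (liblinear supports the L1 penalty).
logreg = LogisticRegression(penalty="l1", solver="liblinear")

for name, clf in [("SGD", sgd), ("batch solver", logreg)]:
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))
```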
1
vote
1 answer

Input shape of Keras Sequential model

I am new to neural networks using Keras. I have the following train samples: input shape (150528, 1235) and output shape (154457, 1235), where 1235 is the number of training examples. How should I set the input shape? I tried the code below but it gave me a ValueError: Data…
ammar
  • 23
  • 4
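
A sketch of the likely fix (an assumption based on the quoted shapes): Keras expects arrays shaped (samples, features), so data stored as (features, examples), e.g. (150528, 1235) with 1235 examples, must be transposed before fitting. Small stand-in shapes keep the sketch fast:

```python
import numpy as np
from tensorflow import keras

n_examples, n_in, n_out = 16, 48, 32   # stand-ins for 1235, 150528, 154457
X = np.random.rand(n_in, n_examples).astype("float32")  # (features, examples)
Y = np.random.rand(n_out, n_examples).astype("float32")

X, Y = X.T, Y.T                        # samples first: (16, 48) and (16, 32)

model = keras.Sequential([
    keras.layers.Input(shape=(n_in,)),  # per-sample feature count only
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(n_out),
])
model.compile(optimizer="sgd", loss="mse")
model.fit(X, Y, batch_size=8, epochs=1, verbose=0)
```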
1
vote
2 answers

Confused between optimizer and loss function

I always thought SGD was a loss function, then I read this in a notebook: model.compile(loss="sparse_categorical_crossentropy", optimizer=keras.optimizers.SGD(lr=1e-3), metrics=["accuracy"]). Now I am confused: what's the difference between loss…
Hanna polaskus
  • 91
  • 1
  • 2
  • 5
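
A minimal sketch of the distinction (toy model and data; note that recent Keras spells the argument learning_rate rather than lr):

```python
# The loss defines *what* is minimized; the optimizer (SGD here) defines
# *how* the weights are updated to minimize it; metrics are only reported.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(10, activation="softmax", input_shape=(4,)),
])
model.compile(
    loss="sparse_categorical_crossentropy",              # what to minimize
    optimizer=keras.optimizers.SGD(learning_rate=1e-3),  # how to minimize it
    metrics=["accuracy"],                                # reported, not optimized
)
x = np.random.rand(64, 4).astype("float32")
y = np.random.randint(0, 10, size=64)
model.fit(x, y, epochs=1, verbose=0)
```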
1
vote
3 answers

Why is mini-batch gradient descent faster than gradient descent?

As I understand them: Mini-Batch Gradient Descent: take a specified batch size, say 32; evaluate the loss on those 32 examples; update the weights; repeat until every example has been used; repeat for a specified number of epochs. Gradient Descent: evaluate the loss…
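
A back-of-the-envelope sketch of the usual answer (illustrative numbers): per pass over the data, full-batch GD makes one weight update while mini-batch GD makes n / batch_size updates, each still cheap because a batch of 32 vectorizes well on modern hardware:

```python
n, batch = 100_000, 32
updates_gd_per_epoch = 1                 # one update per full pass
updates_mbgd_per_epoch = n // batch      # 3125 updates per full pass
print(updates_gd_per_epoch, updates_mbgd_per_epoch)
```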
1
vote
1 answer

Does using a different optimizer change the loss landscape?

I plot the landscape using this code, and I notice the landscape's shape has changed a lot. My understanding is that the optimizer does not change the loss landscape. Now I'm confused whether I made a mistake in generating the plot or…
user836026
  • 127
  • 1
  • 5
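
A sketch of one standard construction (random directions around the trained weights, in the spirit of Li et al.'s "Visualizing the Loss Landscape of Neural Nets"; loss_fn and w are hypothetical placeholders). Built this way, the surface depends only on the model, data, and loss, not on which optimizer produced w:

```python
import numpy as np

def landscape(loss_fn, w, n=25, scale=1.0, seed=0):
    """Evaluate loss_fn on a 2-D grid around weights w along two fixed
    random unit directions; reusing `seed` makes plots comparable."""
    rng = np.random.default_rng(seed)
    d1 = rng.normal(size=w.shape); d1 /= np.linalg.norm(d1)
    d2 = rng.normal(size=w.shape); d2 /= np.linalg.norm(d2)
    alphas = np.linspace(-scale, scale, n)
    return np.array([[loss_fn(w + a * d1 + b * d2) for b in alphas]
                     for a in alphas])
```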
1
vote
0 answers

How to calculate the gradient of MSE in backpropagation?

I want to implement a neural network from scratch to solve linear regression using backpropagation. I don't understand how to compute the gradient of the MSE cost function with respect to each weight. The formula I have says that: for each weight…
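
For the linear-regression case the excerpt describes, a worked sketch: with predictions y_hat = X @ w + b and L = mean((y_hat - y)**2), the chain rule gives dL/dw = (2/n) * X.T @ (y_hat - y) and dL/db = (2/n) * sum(y_hat - y):

```python
import numpy as np

def mse_gradients(X, y, w, b):
    """Gradient of mean squared error for a linear model y_hat = X @ w + b."""
    n = len(y)
    err = (X @ w + b) - y          # residual per example
    dw = (2.0 / n) * X.T @ err     # one entry per weight
    db = (2.0 / n) * err.sum()
    return dw, db
```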
1
vote
1 answer

Difference between sklearn's LogisticRegression and SGDClassifier?

What is the difference between sklearn's LogisticRegression classifier and its SGDClassifier? I understand that SGD is an optimization method, while Logistic Regression (LR) is a machine learning algorithm/model. I also understand that…
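
A minimal sketch of the relationship (toy data; both fit a linear logistic model and differ in how the log loss is minimized, and recent scikit-learn spells that loss "log_loss" for SGDClassifier):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Batch solver (lbfgs by default) on the logistic loss:
lr_model = LogisticRegression(max_iter=1000).fit(X, y)
# Same model family, minimized by stochastic gradient steps instead:
sgd_model = SGDClassifier(loss="log_loss", max_iter=1000,
                          random_state=0).fit(X, y)
print(lr_model.score(X, y), sgd_model.score(X, y))
```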
1
vote
1 answer

Understanding SGD for Binary Cross-Entropy loss

I'm trying to describe mathematically how stochastic gradient descent could be used to minimize the binary cross-entropy loss. The typical description of SGD that I can find online is: $\theta = \theta - \eta…
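
A sketch of the missing pieces, assuming a logistic model $\hat{y} = \sigma(\theta^{\top}x)$ (the model choice is an assumption, not stated in the excerpt): the per-example loss is

$$\ell(\theta) = -\,y \log \hat{y} - (1 - y)\log(1 - \hat{y}),$$

its gradient simplifies to $\nabla_\theta \ell = (\hat{y} - y)\,x$, so the SGD update for a randomly drawn example $(x, y)$ is

$$\theta \leftarrow \theta - \eta\,(\hat{y} - y)\,x.$$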
1
vote
1 answer

How does varying alpha change SGDRegressor behavior for outliers?

I am using SGDRegressor with a constant learning rate and default loss function. I am curious to know how changing the alpha parameter in the function from 0.0001 to 100 will change regressor behavior. Below is the sample code I have: from…
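
A runnable sketch of such an experiment (the synthetic data with one planted outlier is an assumption): alpha is the regularization strength, so larger values shrink the coefficients regardless of the outlier, while small values leave the fit more exposed to it:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=200)
y[0] = 100.0                                   # one gross outlier

for alpha in [0.0001, 0.01, 1.0, 100.0]:
    reg = SGDRegressor(alpha=alpha, learning_rate="constant", eta0=0.01,
                       max_iter=1000, random_state=0)
    reg.fit(X, y)
    print(alpha, reg.coef_)                    # watch the slope shrink
```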
1
vote
1 answer

Understanding Learning Rate in depth

I am trying to understand why the learning rate does not work universally. I have two different data sets and have tested three learning rates: 0.001, 0.01, and 0.1. For the first data set, I was able to achieve results for all learning rates at…
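
A one-dimensional sketch of why no single rate works everywhere: on a quadratic loss 0.5 * c * w**2, gradient descent multiplies w by (1 - lr * c) each step, so it diverges once lr > 2 / c; the usable range therefore depends on the problem's scale (the curvatures below are illustrative):

```python
def descend(c, lr, steps=50, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= lr * c * w        # gradient of 0.5 * c * w**2 is c * w
    return w

for c in [1.0, 30.0]:                  # two "data sets" of different scale
    for lr in [0.001, 0.01, 0.1]:
        print(c, lr, descend(c, lr))   # lr=0.1 diverges when c=30
```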