
I have an ANN trained on the MNIST dataset. The input layer has 784 neurons and the hidden layer has 128 neurons. This gave me an accuracy of 94%. However, when I added one more hidden layer with 64 neurons, the accuracy dropped significantly to 35%. What could be the reason behind this?

Edit: Activation function: sigmoid. 521 epochs.
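For concreteness, a minimal Keras sketch of the two setups described above. The 10-unit softmax output layer, the cross-entropy loss, plain SGD, and the small epoch count are assumptions for illustration; they are not stated in the question.

```python
import tensorflow as tf

# Load and flatten MNIST (784 inputs per image), scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# Original network: 784 -> 128 -> 10, sigmoid hidden layer
model_1 = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Deeper network whose accuracy drops: 784 -> 128 -> 64 -> 10
model_2 = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation='sigmoid'),
    tf.keras.layers.Dense(64, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

for model in (model_1, model_2):
    model.compile(optimizer='sgd',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```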

Pink

2 Answers


The reason is that by adding more layers, you have added more trainable parameters to your model, so you have to train it for longer. Also consider that MNIST is a very easy-to-learn dataset: you can use two layers with far fewer neurons in each. Try $10$ neurons per layer to facilitate the learning process; you can get close to $100\%$ accuracy.
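A minimal sketch of that suggestion, assuming a softmax output layer and the Adam optimizer (neither is specified in the answer):

```python
import tensorflow as tf

# Load and flatten MNIST, scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# Two small hidden layers of 10 neurons each, as suggested above
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, validation_data=(x_test, y_test))
```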

Green Falcon

The problem in your case (as I thought previously) is the sigmoid activation function. It suffers from many problems; of these, your performance drop is likely due to two:

  • The vanishing gradient problem
  • A high learning rate

NOTE: The link provided for 'Vanishing Gradient' explains beautifully why increasing the number of layers makes your network more susceptible to saturation of learning.

The vanishing gradient problem ensures that your Neural Net gets trapped in a non-optimal solution, while the high learning rate ensures that it stays trapped there. In short, after a few oscillations the high learning rate will push your network into saturation.
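As a rough illustration of the vanishing gradient point (a back-of-the-envelope sketch, not part of the original answer): the derivative of the sigmoid is at most $0.25$, so every additional sigmoid layer can shrink the backpropagated gradient by at least a factor of four, even before the weight matrices are taken into account.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # peaks at 0.25 when z = 0

print(sigmoid_grad(0.0))          # 0.25, the best case
print(sigmoid_grad(5.0))          # ~0.0066, a saturated unit

# Upper bound on the gradient scale after n sigmoid layers
# (every unit at its best case z = 0, weights ignored):
for n in (1, 2, 3, 5, 10):
    print(n, 0.25 ** n)
```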

Solution:

  • The best solution is to use the ReLU activation function, with perhaps a sigmoid on the last layer (see the sketch after this list).
  • Use an adaptive optimizer like AdaGrad, Adam, or RMSProp.
  • Decrease the learning rate to $10^{-6}$ to $10^{-7}$, but to compensate, increase the number of epochs to $10^6$ to $10^7$.
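A minimal sketch combining the first two suggestions, carrying over the 128/64 layer sizes from the question; the softmax output layer and the specific Adam learning rate are assumptions for illustration:

```python
import tensorflow as tf

# Load and flatten MNIST, scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# ReLU hidden layers avoid the saturation that sigmoid suffers from;
# Adam adapts the step size per parameter.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```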
DuttaA