
I've built an artificial neural network in Python using the scipy.optimize.minimize (conjugate gradient) optimization function.
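Roughly, the training call follows the standard pattern below; the cost and gradient here are simple placeholders standing in for my actual forward propagation and backprop code:

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder cost and gradient over a flattened parameter vector;
# in the real network these come from forward propagation and backprop.
def cost(theta, X, y):
    return float(np.sum((X @ theta - y) ** 2))

def grad(theta, X, y):
    return 2.0 * X.T @ (X @ theta - y)

X = np.random.randn(50, 10)
y = np.random.randn(50)
theta0 = np.zeros(10)

res = minimize(cost, theta0, args=(X, y), jac=grad, method='CG',
               options={'maxiter': 400, 'disp': True})
print(res.message, res.fun)  # e.g. 'Optimization terminated successfully.'
```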

I've implemented gradient checking, double-checked everything, and I'm pretty certain it's working correctly.
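The gradient check follows the usual central-difference pattern, roughly along these lines (sketched with a stand-in cost that has a known analytic gradient, not my actual network):

```python
import numpy as np

def numerical_gradient(cost, theta, eps=1e-4):
    """Central-difference approximation of the gradient of `cost` at `theta`."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (cost(theta + step) - cost(theta - step)) / (2 * eps)
    return grad

# Stand-in cost with a known analytic gradient (the real one comes from backprop)
cost = lambda t: np.sum(t ** 2)
analytic_grad = lambda t: 2 * t

theta = np.random.randn(5)
diff = np.linalg.norm(numerical_gradient(cost, theta) - analytic_grad(theta))
print("gradient check difference:", diff)  # should be tiny, e.g. < 1e-8
```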

I've run it a few times and it reaches 'Optimization terminated successfully'. However, when I increase the number of hidden layers (everything else kept the same), the cost of the hypothesis at termination increases.

Intuitively it feels as if the cost should decrease when the number of hidden layers is increased, since the network can then represent a more complex hypothesis that fits the data better, but this appears not to be the case.

I'd be interested to understand what's going on here, or whether I've implemented the neural net incorrectly.

user3726050

2 Answers


There are so many ways to go wrong with a neural net that it's going to be difficult to debug. Also, to address your intuition, each additional hidden layer makes learning much harder. With that said, here are some possibilities:

  1. You have added weight decay. Adding more layers adds more weights, which increases your regularization cost (see the sketch after this list).
  2. The problem is simple enough that a model with a single hidden layer is sufficient. Adding more hidden layers makes it harder for the network to learn (harder optimization problem).
  3. The optimization method is not doing a great job (I prefer climin to scipy.optimize).
  4. You are using the sigmoid/tanh activation function. The sigmoid function causes the vanishing gradient problem, which makes learning hard with more layers. Try using the ReLU function.
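To make point 1 concrete, here is a minimal sketch of how the L2 weight-decay term grows as hidden layers (and hence weight matrices) are added; the layer sizes and lambda are made up purely for illustration:

```python
import numpy as np

lam = 1.0  # made-up regularization strength

def weight_decay(layer_sizes, scale=0.1):
    """L2 penalty (lambda / 2) * sum(W ** 2) over all weight matrices.
    Every weight is set to `scale` so the comparison is deterministic."""
    total = 0.0
    for m, n in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = np.full((m, n), scale)
        total += lam / 2 * np.sum(W ** 2)
    return total

print(weight_decay([400, 25, 10]))      # one hidden layer
print(weight_decay([400, 25, 25, 10]))  # two hidden layers -> larger penalty
```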

Training neural nets takes a lot of practice, luck, and patience. Good luck.

user1133029

Increasing the number of hidden layers for a standard neural network actually won't improve results in a majority of cases. Changing the size of the hidden layer will.

This fact (that the number of hidden layers does very little) was actually noted historically and is the motivation behind the field of deep learning. Deep learning is effectively a set of clever ways of training multilayer neural networks, for example by isolating subsets of features when training different layers.

There is a good introductory video on this topic on YouTube.

eric chiang