I've built an artificial neural network in Python and I'm training it with scipy.optimize.minimize using the conjugate gradient method (method='CG').
I've implemented gradient checking and double-checked everything, so I'm fairly certain the implementation is correct.
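For reference, here's a stripped-down sketch of the kind of setup I'm describing — not my actual code, just a single hidden layer with sigmoid activations and random placeholder data, with the weights flattened into one vector for the optimizer:

```python
# Minimal sketch: one-hidden-layer sigmoid net, cross-entropy cost,
# backprop gradient, trained via scipy's CG. Sizes and data are placeholders.
import numpy as np
from scipy.optimize import minimize, check_grad

rng = np.random.default_rng(0)
n_in, n_hid, n_out, m = 4, 5, 3, 50
X = rng.standard_normal((m, n_in))
Y = np.eye(n_out)[rng.integers(n_out, size=m)]  # one-hot targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta):
    """Split the flat parameter vector back into the two weight matrices."""
    s1 = n_hid * (n_in + 1)
    W1 = theta[:s1].reshape(n_hid, n_in + 1)
    W2 = theta[s1:].reshape(n_out, n_hid + 1)
    return W1, W2

def cost_grad(theta):
    """Cross-entropy cost and its backprop gradient over the flat vector."""
    W1, W2 = unpack(theta)
    A1 = np.hstack([np.ones((m, 1)), X])             # input + bias column
    Z2 = A1 @ W1.T
    A2 = np.hstack([np.ones((m, 1)), sigmoid(Z2)])   # hidden + bias column
    H = sigmoid(A2 @ W2.T)                           # hypothesis
    J = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    D3 = H - Y                                       # output-layer delta
    D2 = (D3 @ W2[:, 1:]) * sigmoid(Z2) * (1 - sigmoid(Z2))
    g1 = D2.T @ A1 / m
    g2 = D3.T @ A2 / m
    return J, np.concatenate([g1.ravel(), g2.ravel()])

theta0 = 0.1 * rng.standard_normal(n_hid * (n_in + 1) + n_out * (n_hid + 1))

# Gradient check: compare backprop against finite differences.
err = check_grad(lambda t: cost_grad(t)[0], lambda t: cost_grad(t)[1], theta0)
print("gradient check error:", err)  # should be small, e.g. ~1e-6

res = minimize(cost_grad, theta0, jac=True, method='CG')
print(res.message, "| final cost:", res.fun)
```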
I've run it a few times and it reaches 'Optimization terminated successfully'. However, when I increase the number of hidden layers, the final cost of the hypothesis (evaluated after successful termination) increases, with everything else kept the same.
Intuitively it feels as if the cost should decrease when the number of hidden layers is increased, since a deeper network can represent a more complex hypothesis and fit the training data at least as well. However, this appears not to be the case.
I'd be interested to understand what's going on here, or whether I've implemented the neural net incorrectly.