After studying some material (like this) related to neural networks, I've seen that they use backpropagation as an efficient way to implement gradient descent. So far so good.
However, what I fail to understand is the use of a "learning rate". Indeed, choosing the right learning rate is one of the main factors that determines whether training succeeds or fails. But when I first studied the gradient-descent optimization algorithm, it was sometimes possible to use optimal-step gradient descent (an exact line search). When that was out of the question, an inexact line search such as Wolfe's rule would come to the rescue.
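To make concrete what I mean, here is a minimal NumPy sketch (the function names are my own, purely for illustration): plain gradient descent on a quadratic, where each iteration picks its own step size via a backtracking search that enforces the Armijo sufficient-decrease condition (the first of the Wolfe conditions; the full Wolfe rule also adds a curvature condition, which I omit here).

```python
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])  # symmetric positive definite
b = np.array([1.0, -2.0])

def f(x):
    # Simple quadratic objective: f(x) = 1/2 x^T A x - b^T x
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    # Gradient of the quadratic: A x - b
    return A @ x - b

def backtracking_step(x, g, alpha0=1.0, c1=1e-4, rho=0.5):
    # Shrink alpha until the Armijo sufficient-decrease condition holds:
    #   f(x - alpha * g) <= f(x) - c1 * alpha * ||g||^2
    # (For this quadratic, the *optimal* step even has a closed form:
    #  alpha* = (g @ g) / (g @ A @ g).)
    alpha = alpha0
    fx = f(x)
    while f(x - alpha * g) > fx - c1 * alpha * (g @ g):
        alpha *= rho
    return alpha

x = np.zeros(2)
for i in range(50):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:
        break
    alpha = backtracking_step(x, g)  # step size chosen anew each iteration
    x = x - alpha * g

print(x, np.linalg.solve(A, b))  # x converges to the exact minimizer A^{-1} b
```

Note that each call to `backtracking_step` requires extra evaluations of the objective, on top of the gradient computation.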
So, one way or another, the "learning rate" for gradient descent can be chosen dynamically at each iteration. Why is this not commonly done in neural network training?