Confusion about Armijo's rule

Question

I am currently working on line search algorithms and read about Armijo's rule. The rule is as follows: for fixed $\beta \in (0,1)$ and $\eta > 1$, we choose $\alpha \geq 0$ such that

$$ f(x+\alpha\cdot d) \leq f(x) + \beta\cdot \nabla f(x)^t\cdot d\cdot \alpha, \tag{$*$}$$ so that $\alpha$ is not too big. To ensure that $\alpha$ is not too small, we also want $\eta\alpha$ to not fulfill $(*)$. I have two questions regarding this algorithm:

1. In the case that $f \in \mathcal C^1(\mathbb R, \mathbb R)$, do we still consider $f'(x)\cdot d$ or just $f'(x)$ in $(*)$? The book states the inequality without the direction variable $d$. I guess this is because, with only a one-dimensional direction, $d$ can be absorbed by rescaling $\alpha$, so it should make no difference.
2. When implementing backtracking, the given algorithm does not check whether $\alpha$ is too small, i.e. whether $\eta\alpha$ fails to satisfy $(*)$. Why is that? As far as I understand, since $d$ is already a descent direction, $\alpha$ being too small only affects the rate of convergence but not the algorithm per se. Is that true, and is that why we do not need to check the second property?

Regarding $t$, the one place I see it is in your $ f(x+\alpha\cdot d) \leq f(x) + \beta\cdot \nabla f(x)^{\color{red}{t}}\cdot d\cdot \alpha$. — John Omielan, Jun 28 '25 at 20:15
Usually folks write $a^Tb$ or $\langle a, b \rangle$ or some other physics perversion. — copper.hat, Jun 28 '25 at 20:43
The purpose of the Armijo rule is to (very loosely) tie the gradient to some local 'guaranteed' descent. It is a useful rule for the purposes of convergence proofs (usually accumulation points are stationary) and also a practical rule if you have no alternative. There are generalisations that work with trust region methods. — copper.hat, Jun 28 '25 at 20:47

score 1 · Accepted Answer · answered Jun 30 '25 at 06:54

For your first question, in dimension 1, $d$ is just a scalar like $\alpha$, and so you can subsume $d$ into $\alpha$ (define $\alpha'=d\alpha$) and work with $\alpha'$ instead of $\alpha$. Observe that $\alpha$ appears in the equation only multiplied by $d$. Thus WLOG, we can let $d=\pm 1$ or drop it. You cannot do this in higher dimensions because $\alpha$ is then a scalar while $d$ is a vector. There are many directions in space, while in one dimension there are only two directions (positive and negative) which can be accommodated with a minus sign.
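To spell out the substitution as a short worked step (using $\alpha' = d\alpha$ as defined above): in one dimension $\nabla f(x)^t\cdot d = f'(x)\cdot d$, so $(*)$ reads

$$ f(x+\alpha d) \leq f(x) + \beta\, f'(x)\, d\, \alpha \quad\Longleftrightarrow\quad f(x+\alpha') \leq f(x) + \beta\, f'(x)\, \alpha'. $$

Note that $\alpha'$ inherits the sign of $d$: since $d$ is a descent direction and $\alpha > 0$, we have $f'(x)\,\alpha' < 0$, so the right-hand side still lies below $f(x)$.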

For your second question, the usual way to use Armijo is to keep halving the step size, starting from a large initial $\alpha$ (in other words $\alpha_{k+1} = \frac{1}{2} \alpha_k$), until the inequality is satisfied. This means that when $\alpha_{k+1}$ is accepted, it is because its predecessor $\alpha_k=2\alpha_{k+1}$ did not satisfy the inequality. So, provided at least one halving took place, the "not too small" condition holds automatically with $\eta = 2$, and there is no need to check it separately.
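The halving scheme described above can be sketched in Python. This is only an illustrative sketch, not the algorithm from the book in the question (which is not quoted here): the function names `armijo_backtracking` and `satisfies_armijo`, the parameters `alpha0`, `shrink`, and `max_iter`, and the quadratic test function are all assumptions made for the example.

```python
import numpy as np

def satisfies_armijo(f, grad_f, x, d, alpha, beta):
    """Check the sufficient-decrease inequality (*) for a given step size."""
    slope = float(np.dot(grad_f(x), d))  # directional derivative grad f(x)^T d
    return f(x + alpha * d) <= f(x) + beta * slope * alpha

def armijo_backtracking(f, grad_f, x, d, alpha0=1.0, beta=1e-4,
                        shrink=0.5, max_iter=50):
    """Shrink alpha by `shrink` until (*) holds.

    Returns (alpha, number_of_shrinks). With shrink = 1/2 this matches
    eta = 2: every rejected step is exactly twice the next trial step.
    """
    if float(np.dot(grad_f(x), d)) >= 0:
        raise ValueError("d is not a descent direction")
    alpha = alpha0
    for k in range(max_iter):
        if satisfies_armijo(f, grad_f, x, d, alpha, beta):
            return alpha, k
        alpha *= shrink  # step too big: halve and try again
    raise RuntimeError("no acceptable step found within max_iter shrinks")

# Hypothetical example: f(x) = ||x||^2 with the steepest-descent direction.
f = lambda x: float(np.dot(x, x))
grad_f = lambda x: 2.0 * x
x = np.array([1.0, 1.0])
d = -grad_f(x)

alpha, n_shrinks = armijo_backtracking(f, grad_f, x, d, alpha0=4.0, beta=0.1)
# The accepted step satisfies (*); its predecessor 2 * alpha was rejected.
too_small_check = not satisfies_armijo(f, grad_f, x, d, 2 * alpha, beta=0.1)
```

In this example the loop never tests $\eta\alpha$ explicitly; `too_small_check` merely confirms after the fact that the doubled step fails $(*)$. If the initial `alpha0` were accepted immediately (`n_shrinks == 0`), nothing would be known about $2\alpha$, which is why the initial step is usually chosen large.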
