Bisection is a simple and wasteful method. All the better methods use the available data about function values and build a model from them; the root of the model is then taken as the next approximation. The cost for the better convergence is that the function has to be differentiable to some order and the root has to be simple.
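As an illustration of such a model-based step (my own sketch, not part of the text above; the secant method is just the simplest named example), one can fit a line through the two most recent function values and take the root of that line as the next iterate:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: model f by the straight line through the last two
    iterates and take the root of that line as the next iterate."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:                           # degenerate model: the line has no unique root
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # root of the secant line
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1
```

For $f(x)=x^2-2$ with starting points $1$ and $2$ this reaches machine-level accuracy in a handful of steps, whereas bisection needs about 40 halvings of the interval for a comparable tolerance.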
In the bisection method, the next root approximation is computed without reference to any previously computed function values; the next point is just the arithmetic midpoint of the interval. Only in the selection of the next search interval do the signs of the function values enter. This works for any continuous function, but with the slow speed of such a universal procedure. It is very hard to produce, as a computer function, a function that is continuous but not (piecewise) differentiable, which gives bisection a place similar to the one bubble sort has among the sorting algorithms: educational but not practical.
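A minimal sketch of the procedure just described (my own illustration, not code from the question): only the signs of the function values decide which half-interval is kept, their magnitudes are never used.

```python
def bisect(f, a, b, tol=1e-12):
    """Bisection: only the signs of f at the endpoints and midpoint are used."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a sign change on [a, b]"
    while b - a > tol:
        m = (a + b) / 2          # arithmetic midpoint, independent of f's values
        fm = f(m)
        if fa * fm <= 0:         # sign change in the left half
            b, fb = m, fm
        else:                    # sign change in the right half
            a, fa = m, fm
    return (a + b) / 2
```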
The order of convergence, although defined for the point sequence and its distance to the root, is in most proofs actually established via the progression of the function values. At simple roots these two measures are equivalent (via the mean value theorem). The success of such an analysis depends on the point sequence being constructed from the function values, which is not the case for the bisection method; this is why bisection is somewhat exceptional in that regard. As a substitute measure one therefore takes the progression of the lengths of the bounding/bracketing interval. This will always leave the feeling of not being true to the principles.
$$\lim_{k\to\infty} \frac{|x^{\ast} - x_k|}{|x^{\ast} - x_{k-1}|}$$ does not exist: although it "almost" equals $\frac{1}{2}$, it will drop to small values infinitely many times. You can replace the limit with $\limsup_{k\to\infty}\frac{|x^{\ast} - x_k|}{|x^{\ast} - x_{k-1}|}$, which at least tells you that the sequence converges with some minimum speed.
– Yimin Apr 08 '24 at 02:05
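A small numerical illustration of the two measures discussed above (my own sketch; the test function $f(x)=x^2-2$ with root $\sqrt{2}$ on $[1,2]$ is an arbitrary choice): the bracketing interval is halved exactly at every step, while the error ratio of the midpoints fluctuates and has no limit.

```python
import math

f = lambda x: x * x - 2.0            # arbitrary test function with simple root sqrt(2)
root = math.sqrt(2.0)

a, b = 1.0, 2.0
errs, widths = [], []
for k in range(25):
    m = (a + b) / 2                  # bisection iterate x_k
    errs.append(abs(root - m))
    widths.append(b - a)
    if f(a) * f(m) <= 0:             # keep the half-interval containing the sign change
        b = m
    else:
        a = m

for k in range(1, len(errs)):
    print(f"k={k:2d}  width ratio = {widths[k] / widths[k-1]:.3f}"
          f"  error ratio = {errs[k] / errs[k-1]:7.3f}")
```

The width ratio prints as exactly 0.500 in every row, while the error ratios scatter above and below $\frac{1}{2}$, which is the behaviour the comment describes.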