I want to use the gradient boosting algorithm with the exponential loss function, and I am struggling to understand how the Newton-Raphson step is used to update the predictions. In Python's scikit-learn GradientBoostingClassifier the leaf update is the following:
# y_ holds the class labels remapped to {-1, +1}; pred is the current raw prediction
numerator = np.sum(y_ * sample_weight * np.exp(-y_ * pred))
denominator = np.sum(sample_weight * np.exp(-y_ * pred))
# prevents overflow and division by zero
if abs(denominator) < 1e-150:
    tree.value[leaf, 0, 0] = 0.0
else:
    tree.value[leaf, 0, 0] = numerator / denominator
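For reference, here is a minimal, self-contained sketch of that leaf update on toy data (my own simplification: the leaf/tree bookkeeping is dropped, the labels y_ are already in {-1, +1}, and the variable names mirror the excerpt above):

import numpy as np

# toy leaf: labels in {-1, +1}, current raw predictions, uniform weights
y_ = np.array([1.0, -1.0, 1.0, 1.0])
pred = np.array([0.2, -0.1, 0.4, 0.0])
sample_weight = np.ones_like(y_)

numerator = np.sum(y_ * sample_weight * np.exp(-y_ * pred))
denominator = np.sum(sample_weight * np.exp(-y_ * pred))

# same guard as above: avoid division by (near) zero
leaf_value = 0.0 if abs(denominator) < 1e-150 else numerator / denominator
print(leaf_value)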
The numerator is the sum of the negative partial first derivatives of the exponential loss function (ignoring sample_weight for readability):
\begin{align}
L(pred) &= \sum_i e^{-y_i \, pred_i} \\
numerator &= \sum_i \left( -\frac{\partial}{\partial pred_i} L(pred) \right) = \sum_i y_i \, e^{-y_i \, pred_i}
\end{align}
The denominator is the sum of the partial second derivatives of the exponential loss function:
\begin{align}
denominator = \sum_i \frac{\partial^2}{\partial pred_i^2} L(pred) = \sum_i y_i^2 \, e^{-y_i \, pred_i} = \sum_i e^{-y_i \, pred_i}, \quad \text{since } y_i^2 = 1
\end{align}
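To double-check these derivatives, here is a quick finite-difference sketch (my own verification code, not part of sklearn; the step size eps and the tolerances are arbitrary choices):

import numpy as np

y = np.array([1.0, -1.0, 1.0])
pred = np.array([0.3, -0.2, 0.1])
eps = 1e-4

def loss(p):
    return np.sum(np.exp(-y * p))

grad = -y * np.exp(-y * pred)   # dL/dpred_i
hess = np.exp(-y * pred)        # d^2 L / dpred_i^2  (y_i^2 = 1)

for i in range(len(pred)):
    e = np.zeros_like(pred)
    e[i] = eps
    num_grad = (loss(pred + e) - loss(pred - e)) / (2 * eps)
    num_hess = (loss(pred + e) - 2 * loss(pred) + loss(pred - e)) / eps ** 2
    assert np.isclose(grad[i], num_grad, atol=1e-6)
    assert np.isclose(hess[i], num_hess, atol=1e-5)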
But according to the Newton-Raphson algorithm, the update of pred should be:
\begin{align}
pred \leftarrow pred - \left[ Hessian(L(pred)) \right]^{-1} Gradient(L(pred))
\end{align}
where the Hessian is the diagonal matrix with the partial second derivatives on its main diagonal (it is diagonal here because each pred_i appears in exactly one term of the sum).
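Concretely, with a diagonal Hessian, multiplying by the inverse Hessian is just an elementwise division, so the per-sample Newton-Raphson step I would expect looks like this (again my own sketch of the textbook update, not sklearn code):

import numpy as np

y = np.array([1.0, -1.0, 1.0])
pred = np.array([0.3, -0.2, 0.1])

grad = -y * np.exp(-y * pred)   # gradient of L, one entry per sample
hess = np.exp(-y * pred)        # diagonal of the Hessian (y_i^2 = 1)

# inverse(Hessian) * Gradient reduces to an elementwise ratio
pred = pred - grad / hess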
Why does scikit-learn sum over the gradient and over the Hessian diagonal, and then take the ratio of the two sums as the update of the predictions?
For the Newton-Raphson algorithm I follow this reference: https://www.stat.washington.edu/adobra/classes/536/Files/week1/newtonfull.pdf