$\DeclareMathOperator{\tr}{tr}$ Let $A,B$ be self-adjoint matrices and let $f$ be a real differentiable function on $\mathbb{R}$ with derivative $f'$. Why does the identity $$ \left.\frac{d}{dt}\right|_0 \tr f(A+tB)=\tr (f'(A)B) $$ hold?
This is used in the proof of Klein's inequality. However, I'm not sure why exactly it holds in general. It's pretty clear why it's true for polynomials, since we can use the cyclicity of the trace, but it's harder to justify for a general $f$. I also checked the linked reference (E. Carlen, Trace Inequalities and Quantum Entropy: An Introductory Course, Contemp. Math. 529 (2010) 73–140) with no luck, as the author didn't give much explanation.
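To spell out the polynomial case: here is the computation for a monomial $f(x)=x^n$ (the general polynomial follows by linearity). It only uses the cyclicity of the trace. Expanding to first order in $t$, $$ (A+tB)^n = A^n + t\sum_{k=0}^{n-1} A^{k} B A^{n-1-k} + O(t^2), $$ so that $$ \left.\frac{d}{dt}\right|_0 \tr (A+tB)^n = \sum_{k=0}^{n-1} \tr\left(A^{k} B A^{n-1-k}\right) = n\,\tr\left(A^{n-1}B\right) = \tr(f'(A)B). $$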
EDIT: After some further thought, here is an incomplete proof based on what I have so far. Hopefully someone with better knowledge can finish it.
For simplicity, let $\lambda_i(A)$ denote the eigenvalues of $A$ in descending order, i.e., $\lambda_1(A) \ge \cdots \ge \lambda_d (A)$. Then $$ \tr \left( \frac{f(A+tB)-f(A)}{t}\right) = \sum_i \frac{1}{t}\left[f(\lambda_i(A+tB))-f(\lambda_i(A))\right]. $$ By Weyl's inequality (stability of eigenvalues), $|\lambda_i(A+tB)-\lambda_i(A)|\le |t|\,\|B\|$. Hence, using an $\epsilon,\delta$ argument (differentiability of $f$ at $\lambda_i(A)$, together with the bound $|\lambda_i(A+tB)-\lambda_i(A)|/|t|\le \|B\|$), we can replace the above, up to an error that vanishes as $t\to 0$, with $$ \sum_i \frac{1}{t}(\lambda_i(A+tB)-\lambda_i(A)) f'(\lambda_i(A)). $$ Now first assume that $A$ has simple spectrum; then $A+tB$ also has simple spectrum for sufficiently small $t$. By Hadamard's variation formula, $$ \frac{1}{t}(\lambda_i(A+tB)-\lambda_i(A)) \to \langle i|B| i\rangle $$ where $|i\rangle$ is the normalized eigenvector corresponding to $\lambda_i(A)$ (unique up to phase, since we are assuming that $A$ is simple). Plugging all this back in, the limit is $\sum_i f'(\lambda_i(A))\langle i|B|i\rangle = \tr(f'(A)B)$, so the formula at least holds when $A$ is simple.
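As a numerical sanity check of the claimed identity (not a proof), here is a minimal NumPy sketch. The choice $f=\sin$, the random Hermitian matrices, the step size `t`, and the helper names `trace_fn`, `trace_fprime_B`, `random_hermitian` are all my own assumptions for illustration.

```python
import numpy as np

def trace_fn(f, M):
    """tr f(M) for Hermitian M, computed from the eigenvalues."""
    return float(np.sum(f(np.linalg.eigvalsh(M))))

def trace_fprime_B(fprime, A, B):
    """tr(f'(A) B) = sum_i f'(lambda_i) <i|B|i> in an eigenbasis of A."""
    w, V = np.linalg.eigh(A)
    # diag_B[i] = <i|B|i>, where |i> is the i-th column of V
    diag_B = np.real(np.einsum("ji,jk,ki->i", V.conj(), B, V))
    return float(np.sum(fprime(w) * diag_B))

def random_hermitian(d, rng):
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (X + X.conj().T) / 2

rng = np.random.default_rng(0)
A = random_hermitian(5, rng)
B = random_hermitian(5, rng)

f, fprime = np.sin, np.cos  # illustrative choice of f and f'
t = 1e-6

# Central finite difference of t -> tr f(A + tB) at t = 0
finite_diff = (trace_fn(f, A + t * B) - trace_fn(f, A - t * B)) / (2 * t)
exact = trace_fprime_B(fprime, A, B)

print(finite_diff, exact)
```

The two printed numbers should agree up to finite-difference and rounding error. A random Hermitian $A$ generically has simple spectrum, so this only probes the case treated above.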
EDIT 2. I think I now have a way of dealing with degenerate eigenvalues. I will provide a sketch and fill in the details later (if someone else doesn't point out an error).
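Let $\lambda_1 (A)=\cdots =\lambda_r(A)$ be a degenerate eigenvalue of multiplicity $r$. Then for sufficiently small $t$, the eigenvalues $\lambda_i (A+tB)$, $i=1,\dots,r$, stay separated from the other eigenvalues (Weyl's inequality again). Let us use the Riesz projector $$ P_A =\frac{1}{2\pi i} \oint_\Gamma \frac{dz}{z-A} $$ where $\Gamma$ is a smooth closed contour around $\lambda_1 (A)=\cdots =\lambda_r(A)$ whose interior contains no other eigenvalue of $A$. By Weyl's inequality, we can assume that for sufficiently small $t$ the eigenvalues $\lambda_i(A+tB)$, $i=1,\dots,r$, are still in the interior of $\Gamma$ and the remaining ones are still outside, so that $P_{A+tB}$ is defined with the same contour and $\tr((A+tB)P_{A+tB})=\sum_{i=1}^r \lambda_i(A+tB)$. Notice that $$ \frac{d}{dt} \Big|_0 \tr {((A+tB)P_{A+tB})} = \tr(BP_A) $$ where I got some inspiration from @Ruy's comment and used the product rule together with the fact that \begin{align} \frac{d}{dt}\Big|_0 \tr{(A(P_{A+tB}-P_A))}&=\frac{1}{2\pi i}\tr\left( A\oint_\Gamma \frac{dz}{(z-A)^2}\,B\right) \\ &= \frac{1}{2\pi i}\sum_{i=1}^d \lambda_i(A)\,\langle i|B|i\rangle \oint_\Gamma \frac{dz}{(z-\lambda_i(A))^2} \\ &=0 \end{align} Here $|i\rangle$ is an orthonormal eigenbasis of $A$, and each contour integral vanishes whether or not $\lambda_i(A)$ lies inside $\Gamma$, because $(z-\lambda)^{-2}$ has zero residue at $z=\lambda$. Since $f'(\lambda_i(A))$ takes the same value for $i=1,\dots,r$, the argument in the previous part only needs the limit of the sum $\sum_{i=1}^r \frac{1}{t}(\lambda_i(A+tB)-\lambda_i(A))$, which is $\tr(BP_A)$ by the above. Hence, if we combine this with the previous part, we see that the equality holds.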
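The degenerate step can also be checked numerically. The sketch below approximates the Riesz projector, with the convention $P_M=\frac{1}{2\pi i}\oint_\Gamma (z-M)^{-1}\,dz$, by the trapezoid rule on a circle, and compares a finite difference of $t\mapsto \tr((A+tB)P_{A+tB})$ with $\tr(BP_A)$. The test spectrum $(2,2,2,0,-1)$, the circle (center 2, radius 1), the number of quadrature nodes, and the helper names `riesz_projector` and `cluster_trace` are assumptions I made for illustration.

```python
import numpy as np

def riesz_projector(M, center, radius, n_nodes=200):
    """Trapezoid-rule approximation of (1/(2 pi i)) * oint (z - M)^{-1} dz
    over the circle |z - center| = radius."""
    d = M.shape[0]
    P = np.zeros((d, d), dtype=complex)
    for th in 2 * np.pi * np.arange(n_nodes) / n_nodes:
        z = center + radius * np.exp(1j * th)
        # dz = i * radius * e^{i th} d(th); the factor i cancels with 1/(2 pi i)
        P += radius * np.exp(1j * th) * np.linalg.inv(z * np.eye(d) - M)
    return P / n_nodes

rng = np.random.default_rng(1)
d, r = 5, 3

# Hermitian A with a threefold degenerate eigenvalue 2 (illustrative spectrum)
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
U, _ = np.linalg.qr(X)
A = U @ np.diag([2.0, 2.0, 2.0, 0.0, -1.0]) @ U.conj().T
A = (A + A.conj().T) / 2  # remove rounding asymmetry

Y = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
B = (Y + Y.conj().T) / 2
B = B / np.linalg.norm(B, 2)  # operator norm 1, so eigenvalues move by at most |t|

def cluster_trace(t):
    """tr((A + tB) P_{A+tB}): the sum of the eigenvalues of A + tB inside the circle."""
    M = A + t * B
    return np.trace(M @ riesz_projector(M, 2.0, 1.0)).real

t = 1e-5
finite_diff = (cluster_trace(t) - cluster_trace(-t)) / (2 * t)

P_A = riesz_projector(A, 2.0, 1.0)
exact = np.trace(B @ P_A).real

print(finite_diff, exact)
```

The individual eigenvalues in the cluster need not be differentiable in an eigenvalue-ordered labelling, but their sum is, which is why the check is done on the trace over the cluster rather than eigenvalue by eigenvalue.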
My proof is a little convoluted, so I would still like to see a more straightforward approach.