Suppose $f(A)=\operatorname{tr} (AB)$. Then
$f(A+H)-f(A) = \operatorname{tr} (HB)$, so we have
$Df(A)(H) = \operatorname{tr} (HB)$.
(Not surprisingly: $f$ is linear in $A$, so its derivative at every point is $f$ itself.)
In a Hilbert space, the gradient of a functional is an element $\nabla f(A)$ such
that $Df(A)(H) = \langle \nabla f(A), H \rangle$ for all $H$.
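Since $\langle X, Y \rangle = \operatorname{tr} (X^T Y)$ and $\operatorname{tr} (HB) = \operatorname{tr} (BH) = \operatorname{tr} ((B^T)^T H) = \langle B^T, H \rangle$, we see that $\nabla f(A) = B^T$.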
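As a sanity check, one can compare $B^T$ with a finite-difference estimate of the gradient. The sketch below uses plain Python lists to stay dependency-free; the helper names (`matmul`, `numerical_gradient`, etc.) and the random sizes are illustrative choices, not part of the argument. It relies on the fact that the $(i,j)$ entry of $\nabla f(A)$ is $\langle \nabla f(A), E_{ij} \rangle = Df(A)(E_{ij})$, where $E_{ij}$ is the matrix unit.

```python
import random

def matmul(X, Y):
    """Matrix product of X and Y, both stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def numerical_gradient(f, A, eps=1e-6):
    """Central-difference estimate of the gradient of the scalar function f at A.
    Entry (i, j) approximates Df(A)(E_ij) = <grad f(A), E_ij>."""
    G = []
    for i in range(len(A)):
        row = []
        for j in range(len(A[0])):
            A_plus = [r[:] for r in A]
            A_minus = [r[:] for r in A]
            A_plus[i][j] += eps
            A_minus[i][j] -= eps
            row.append((f(A_plus) - f(A_minus)) / (2 * eps))
        G.append(row)
    return G

random.seed(0)
m, n = 3, 4
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]  # m x n
B = [[random.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]  # n x m, so AB is square

def f(X):
    return trace(matmul(X, B))

G = numerical_gradient(f, A)
B_T = transpose(B)  # m x n, the same shape as A
err = max(abs(G[i][j] - B_T[i][j]) for i in range(m) for j in range(n))
print(f"max |finite difference - B^T| = {err:.1e}")
```

Since $f$ is linear, the central difference is exact up to rounding, so the printed error should be tiny.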
This is entirely analogous to the case of a function $g : \mathbb{R}^n \to \mathbb{R}$:
the derivative $Dg(x)$ is usually written as a row vector, while the gradient
$\nabla g(x) = Dg(x)^T$ is a column vector.
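For a concrete instance, take $g(x) = b^T x$ for a fixed $b \in \mathbb{R}^n$ (an illustrative choice, playing the role of $\operatorname{tr}(AB)$ above):
$$
Dg(x)(h) = b^T h = \operatorname{tr}(b^T h) = \langle b, h \rangle,
\qquad Dg(x) = b^T \ (\text{row}),
\qquad \nabla g(x) = b \ (\text{column}).
$$
Comparing with $Df(A)(H) = \operatorname{tr}(BH)$: in both cases the gradient is the transpose of the factor that multiplies the increment, which is why $\nabla f(A) = B^T$ rather than $B$.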
Addendum:
Let $f(A) = \operatorname{tr} (A B A^T C)$. Then we have
$f(A+H)-f(A) = \operatorname{tr} (H B A^T C)+\operatorname{tr} (A B H^T C)+\operatorname{tr} (H B H^T C)$. The last term is of order $O(\|H\|^2)$, so we see that
$Df(A)(H) = \operatorname{tr} (H B A^T C)+\operatorname{tr} (A B H^T C) $.
The relevant properties of trace are (i) transpose invariance, $\operatorname{tr} X = \operatorname{tr} X^T$, and (ii) shift (cyclic) invariance, $\operatorname{tr} (X_1 X_2 \cdots X_n) = \operatorname{tr} (X_2 \cdots X_n X_1)$.
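Applying these (cycling $H$ to the end of each product, then transposing where needed) gives
\begin{eqnarray}
Df(A)(H) &=& \operatorname{tr} (B A^T C H)+\operatorname{tr} (C A B H^T) \\
&=& \operatorname{tr} ((C^T A B^T)^T H)+\operatorname{tr} ((CAB)^TH) \\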
&=& \langle C^T A B^T + CAB, H \rangle
\end{eqnarray}
from which we get the gradient to be $\nabla f(A) = C^T A B^T + CAB$.
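The same finite-difference idea can be used to check this formula. Below is a self-contained sketch in plain Python, assuming $A$ is $m \times n$, $B$ is $n \times n$ and $C$ is $m \times m$ so that $ABA^TC$ is square; the helper names and the sizes are illustrative choices.

```python
import random

def matmul(X, Y):
    """Matrix product of X and Y, both stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def numerical_gradient(f, A, eps=1e-5):
    """Central-difference estimate of the gradient of the scalar function f at A.
    Entry (i, j) approximates Df(A)(E_ij) = <grad f(A), E_ij>."""
    G = []
    for i in range(len(A)):
        row = []
        for j in range(len(A[0])):
            A_plus = [r[:] for r in A]
            A_minus = [r[:] for r in A]
            A_plus[i][j] += eps
            A_minus[i][j] -= eps
            row.append((f(A_plus) - f(A_minus)) / (2 * eps))
        G.append(row)
    return G

random.seed(1)
m, n = 3, 2
A, B, C = rand_matrix(m, n), rand_matrix(n, n), rand_matrix(m, m)

def f(X):
    # f(X) = tr(X B X^T C)
    return trace(matmul(matmul(matmul(X, B), transpose(X)), C))

G_fd = numerical_gradient(f, A)
# Closed form from the derivation: C^T A B^T + C A B
G_formula = add(matmul(matmul(transpose(C), A), transpose(B)),
                matmul(matmul(C, A), B))
max_err = max(abs(G_fd[i][j] - G_formula[i][j]) for i in range(m) for j in range(n))
print(f"max |finite difference - formula| = {max_err:.1e}")
```

Because $B$ and $C$ are random (in particular not symmetric), both terms of the formula matter here; when $B$ and $C$ are symmetric, it reduces to $2CAB$. Since $f$ is quadratic in $A$, the central difference has no truncation error, so any discrepancy is rounding.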