There are many ways to prove compactness, e.g., via density of the finite-rank operators or via the singular value decomposition. A more elementary route, which avoids working with the definition of compactness directly, uses the following characterization of compact self-adjoint operators, taken from "Introduction to Functional Analysis" by Meise & Vogt (1997):
Lemma 16.17. The following are equivalent for every self-adjoint operator $T \in B(H)$:
- $T$ is compact.
- For every countable orthonormal system $(e_j)_{j \in \mathbb{N}}$ in $H$ we have $ \lim_{j \to \infty} \langle e_j,T e_j\rangle = 0$.
(Note that the cited book does not assume $H$ to be separable.)
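To get a feel for what the criterion tests, here is a small numerical illustration on a large finite section of $\ell^2$. This is an assumption made purely for illustration: every operator on a finite-dimensional space is compact, so the sketch illustrates the lemma rather than proving anything. It compares the compact operator $\operatorname{diag}(1,\frac12,\frac13,\dots)$ with the (non-compact, in infinite dimensions) projection onto the even coordinates, evaluated along the orthonormal system $f_j=(e_{2j}+e_{2j+1})/\sqrt2$. The helper names `quad_form_diag` and `paired_unit_vector` are hypothetical.

```python
import math

N = 1000  # dimension of the finite section (assumption: stands in for l^2)


def quad_form_diag(lam, v):
    """<v, T v> for the diagonal operator T = diag(lam) and a real vector v."""
    return sum(l * x * x for l, x in zip(lam, v))


def paired_unit_vector(j, n):
    """f_j = (e_{2j} + e_{2j+1}) / sqrt(2) in R^n, with 0-based coordinates.

    Distinct f_j have disjoint supports, so (f_j) is an orthonormal system."""
    v = [0.0] * n
    v[2 * j] = v[2 * j + 1] = 1 / math.sqrt(2)
    return v


# diag(1, 1/2, 1/3, ...) is compact; the projection onto the even coordinates is not
# (in infinite dimensions), since its range is infinite-dimensional.
compact_diag = [1 / (k + 1) for k in range(N)]
projection_diag = [1.0 if k % 2 == 0 else 0.0 for k in range(N)]

system = [paired_unit_vector(j, N) for j in range(N // 2)]
compact_vals = [quad_form_diag(compact_diag, f) for f in system]
projection_vals = [quad_form_diag(projection_diag, f) for f in system]

print(compact_vals[:3], compact_vals[-1])        # decreases towards 0
print(projection_vals[:3], projection_vals[-1])  # stays (up to rounding) at 1/2
```

For the compact operator the diagonal values are $\frac12\big(\frac1{2j+1}+\frac1{2j+2}\big)\to0$, while for the projection they remain equal to $\frac12$, which is exactly the failure the lemma detects.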
With this lemma in mind, one can argue as follows. Given a Hilbert-Schmidt operator $T$, decompose it as $T=T_1+iT_2$, where $T_1=\frac12(T+T^*)$ and $T_2=\frac1{2i}(T-T^*)$ are self-adjoint. Both are Hilbert-Schmidt, since $T^*$ is Hilbert-Schmidt whenever $T$ is (with $\|T^*\|_{\rm HS}=\|T\|_{\rm HS}$) and the Hilbert-Schmidt operators form a linear subspace of $B(H)$. It therefore suffices to prove that $T_1$ and $T_2$ are compact, because compact operators are closed under linear combinations. Now fix $k\in\{1,2\}$ and let $(e_j)_{j \in \mathbb{N}}$ be an arbitrary orthonormal system in $H$. Completing it to an orthonormal basis $(e_j)_{j\in J}$ of $H$ (with $\mathbb N\subseteq J$), we find
$$
\sum_{j\in\mathbb N}\|T_ke_j\|^2\leq\sum_{j\in J}\|T_ke_j\|^2={\rm tr}(T_k^*T_k)<\infty
$$
meaning that, necessarily, $\lim_{j\to\infty}\|T_ke_j\|=0$, since the terms of a convergent series tend to zero. The Cauchy-Schwarz inequality now implies
$$
|\langle e_j,T_ke_j\rangle|\leq\|e_j\|\,\|T_ke_j\|=\|T_ke_j\|\to 0\qquad\text{as }j\to\infty\,.
$$
Since $(e_j)_{j\in\mathbb N}$ was arbitrary, the lemma above implies that $T_1$ and $T_2$ are compact, hence so is $T$. $\square$
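As a sanity check (again not a proof, since every finite matrix is compact), the individual steps of the argument can be traced numerically on a finite $N\times N$ section of a hypothetical Hilbert-Schmidt operator with square-summable, deliberately non-self-adjoint entries. The sketch checks the decomposition $T=T_1+iT_2$, the self-adjointness of $T_1,T_2$, that $\sum_j\|T_ke_j\|^2$ does not depend on the orthonormal basis used (standard basis vs. rows of the unitary DFT matrix), and the Cauchy-Schwarz bound $|\langle e_j,T_ke_j\rangle|\le\|T_ke_j\|$. The matrix entries, the size `N` and all helper names are assumptions chosen for illustration.

```python
import cmath
import math

N = 40  # size of the finite section (assumption: stands in for an operator on l^2)


def mat_vec(A, v):
    """Apply the N x N matrix A to the vector v."""
    return [sum(A[m][n] * v[n] for n in range(len(v))) for m in range(len(A))]


def inner(x, y):
    """<x, y> = sum_k conj(x_k) * y_k."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm ||x||."""
    return math.sqrt(sum(abs(a) ** 2 for a in x))


def adjoint(A):
    """Conjugate transpose A*."""
    return [[A[n][m].conjugate() for n in range(N)] for m in range(N)]


def hs_sum(A, basis):
    """sum_j ||A u_j||^2 over the orthonormal vectors u_j in `basis`."""
    return sum(norm(mat_vec(A, u)) ** 2 for u in basis)


# Hypothetical example: square-summable entries, not self-adjoint.
T = [[1 / ((m + 1) * (n + 2)) + 1j / ((m + 2) * (n + 1) ** 2) for n in range(N)]
     for m in range(N)]
T_star = adjoint(T)

# Cartesian decomposition T = T1 + i*T2 with T1 = (T + T*)/2, T2 = (T - T*)/(2i).
T1 = [[(T[m][n] + T_star[m][n]) / 2 for n in range(N)] for m in range(N)]
T2 = [[(T[m][n] - T_star[m][n]) / 2j for n in range(N)] for m in range(N)]

# Two orthonormal bases: the standard basis and the rows of the unitary DFT matrix.
std_basis = [[1.0 if k == j else 0.0 for k in range(N)] for j in range(N)]
dft_basis = [[cmath.exp(-2j * math.pi * j * k / N) / math.sqrt(N) for k in range(N)]
             for j in range(N)]

# Largest deviations from T = T1 + i*T2 and from T_k = T_k* (rounding level expected).
decomp_err = max(abs(T[m][n] - (T1[m][n] + 1j * T2[m][n]))
                 for m in range(N) for n in range(N))
herm_err = max(abs(A[m][n] - A[n][m].conjugate())
               for A in (T1, T2) for m in range(N) for n in range(N))

results = {}
for k, A in ((1, T1), (2, T2)):
    col_norms = [norm(mat_vec(A, e)) for e in std_basis]       # ||T_k e_j||
    diag = [abs(inner(e, mat_vec(A, e))) for e in std_basis]  # |<e_j, T_k e_j>|
    results[k] = {
        "hs_std": hs_sum(A, std_basis),
        "hs_dft": hs_sum(A, dft_basis),
        "col_norms": col_norms,
        "cauchy_schwarz": all(d <= c + 1e-12 for d, c in zip(diag, col_norms)),
    }
    print(f"T{k}: HS sum standard={results[k]['hs_std']:.10f} "
          f"DFT={results[k]['hs_dft']:.10f}, ||T{k} e_0||={col_norms[0]:.4f}, "
          f"||T{k} e_{N - 1}||={col_norms[-1]:.4f}")
print("decomposition error:", decomp_err, "self-adjointness error:", herm_err)
```

The two Hilbert-Schmidt sums agree up to rounding, which is the finite-dimensional shadow of the basis independence of ${\rm tr}(T_k^*T_k)$ used in the displayed inequality, and the column norms $\|T_ke_j\|$ shrink as $j$ grows.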