Understanding the proof of Schwartz Kernel Theorem

Question

I have seen several proofs of the Schwartz Kernel Theorem, using different techniques. Some (such as Melrose's proof in his notes on microlocal analysis) use the representations of $\mathcal{S}(\mathbb{R}^n)$ and $\mathcal{S}'(\mathbb{R}^n)$ in terms of weighted Sobolev spaces, others (such as the proof in Duistermaat and Kolk) use the Fourier transform, and others (such as the one in Friedlander and Joshi) use Fourier series.

I can follow these proofs, but I feel I don't really understand them, in that I don't understand what fundamental properties of the space of distributions make them work.

I see that there are similarities: for example, the last two approaches use some sort of representation of test functions on $X\times Y$ as sums of tensor products of test functions on $X$ and $Y$.

I found this remark in an old paper of Ehrenpreis (On the Theory of Kernels of Schwartz, Proceedings of the American Mathematical Society, Vol. 7, No. 4 (Aug., 1956), pp. 713-718):

Lemma 1 is the only part of the proof of Theorem 1 [the kernel theorem] that uses special properties of the space $\mathcal{D}$ and, in fact, the analog of Theorem 1 [the kernel theorem] holds for (essentially) all function spaces for which an analog of Lemma 1 can be found.

Lemma 1 is the following

Let $B$ be a bounded set in $\mathcal{D}(\mathbb{R}^n\times\mathbb{R}^n)$. Then we can find a bounded set $B'\subset\mathcal{D}(\mathbb{R}^n)$ and a $b>0$ so that every $f\in B$ can be written in the form $\sum_i \lambda_ig_i\otimes h_i$ where $\sum_i|\lambda_i|<b$, and $g_i, h_i\in B'$, and where the series converges in $\mathcal{D}(\mathbb{R}^n\times\mathbb{R}^n)$.
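A finite-dimensional illustration of the decomposition in Lemma 1 (a sketch of our own, not from Ehrenpreis's paper): sample a radial bump on a grid, so that it is not a single tensor product, and take its singular value decomposition. Each SVD term is a discrete $g_i\otimes h_i$, and the fast decay of the singular values mirrors writing $f=\sum_i\lambda_i g_i\otimes h_i$ with summable $\lambda_i$. The bump, the grid and the use of the SVD are our own choices.

```python
import numpy as np

def bump(t):
    """Smooth function, exp(-1/(1 - t^2)) for |t| < 1 and 0 otherwise."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

# Sample a radial test function on R^2; it is not of the form g(x) h(y).
n = 200
x = np.linspace(-1.2, 1.2, n)
X, Y = np.meshgrid(x, x, indexing="ij")
F = bump(np.sqrt(X**2 + Y**2))

# SVD: F = sum_i s[i] * outer(U[:, i], Vt[i, :]), each term a discrete g_i ⊗ h_i.
U, s, Vt = np.linalg.svd(F)

def truncation_error(r):
    """Frobenius-norm error of the rank-r sum of tensor products."""
    F_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
    return np.linalg.norm(F - F_r)

for r in (1, 2, 4, 8, 16):
    print(f"rank {r:2d}: error {truncation_error(r):.2e}")
print(f"sum of singular values (analogue of sum |lambda_i|): {s.sum():.3f}")
```

The sum of the singular values plays the role of $\sum_i|\lambda_i|$; of course, in finite dimensions every such sum is finite, so this only shows the mechanism, not the lemma.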

The remark would suggest that the key point really is being able to decompose test functions on $X\times Y$ into sums of tensor products of test functions on $X$ and $Y$, but I still don't see why this should be the case.

I also read that the theory of nuclear spaces proves an abstract kernel theorem, generalising the usual statement for distributions. I assume this makes it possible to isolate the fundamental properties that make the kernel theorem work, but I found no short, self-contained exposition of the theory that does not require extensive prerequisites.

So, my questions are:

How do people who understand the kernel theorem think about its proof?
What are the fundamental ingredients that make it work?
I understand why it is important, but why is it so surprising that every continuous linear map $\mathcal{D}\rightarrow\mathcal{D}'$ is given by a kernel?

You said you understand the Fourier series proof... which immediately gives the decomposition of a test function $\psi\in C^\infty_c([-T,T]^n\times[-T,T]^n)$ as $\psi(x,y)=\sum_{m,k} c_{m,k}\, \phi(x)e^{2i\pi \langle m,x\rangle/4T}\, \phi(y)e^{2i\pi \langle k,y\rangle/4T}$, where $\phi \in C^\infty_c([-2T,2T]^n)$ is $1$ on $[-T,T]^n$. Those things, as well as the Schwartz Kernel Theorem, aren't surprising at all: they follow from the standard tool of restricting to a dense subspace where everything is easier. — reuns, Jan 17 '20 at 14:08
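A numerical sketch of the decomposition in the comment, for $n=1$ and $T=1$ (the cutoff $\phi$, the radial bump $\psi$ and the grid sizes are our own assumptions). The coefficients $c_{m,k}$ are computed by the trapezoidal rule on one period $[-2T,2T]^2$, and the truncated sum of products $\phi(x)e^{2i\pi mx/4T}\,\phi(y)e^{2i\pi ky/4T}$ is compared with $\psi$ on $[-3T,3T]^2$, including points where $\phi$ vanishes.

```python
import numpy as np

T = 1.0
P = 4 * T  # period of the Fourier series, i.e. the box [-2T, 2T]

def bump(t):
    """Smooth function, exp(-1/(1 - t^2)) for |t| < 1 and 0 otherwise."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

def smooth_step(t):
    """C^infinity function equal to 0 for t <= 0 and to 1 for t >= 1."""
    def e(s):
        out = np.zeros_like(s)
        pos = s > 0
        out[pos] = np.exp(-1.0 / s[pos])
        return out
    return e(t) / (e(t) + e(1.0 - t))

def phi(x):
    """Cutoff equal to 1 on [-T, T] and to 0 outside (-2T, 2T)."""
    return smooth_step((2 * T - np.abs(x)) / T)

def psi(x, y):
    """Radial test function supported in the disc of radius T (inside [-T, T]^2)."""
    return bump(np.sqrt(x**2 + y**2) / T)

def max_error(M, N=256):
    """Max error of the sum over |m|, |k| <= M, on a grid of [-3T, 3T]^2."""
    x = -2 * T + P * np.arange(N) / N                  # one period, N quadrature nodes
    Psi = psi(*np.meshgrid(x, x, indexing="ij"))
    m = np.arange(-M, M + 1)
    E = np.exp(-2j * np.pi * np.outer(m, x) / P)       # E[m, j] = e^{-2 pi i m x_j / P}
    C = E @ Psi @ E.T / N**2                            # c_{m,k} by the trapezoidal rule
    t = np.linspace(-3 * T, 3 * T, 301)
    G = phi(t)[None, :] * np.exp(2j * np.pi * np.outer(m, t) / P)  # phi(t) e^{2 pi i m t / P}
    approx = (G.T @ C @ G).real                         # sum_{m,k} c_{m,k} G[m, i] G[k, l]
    return np.max(np.abs(approx - psi(*np.meshgrid(t, t, indexing="ij"))))

for M in (10, 20, 40, 60):
    print(M, max_error(M))
```

Each term of the sum is a product of a test function in $x$ and a test function in $y$, which is exactly the tensor-product structure the question asks about.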

Abdelmalek Abdesselam · Answer 1 · 2020-01-31T23:14:02.430

It depends on what you call the Kernel Theorem. The full version is that the map $$ \mathcal{D}'(\mathbb{R}^{m+n}) \rightarrow {\rm Hom}(\mathcal{D}(\mathbb{R}^m),\mathcal{D}'(\mathbb{R}^n)) $$ $$ T\mapsto(f\mapsto (g\mapsto T(f\otimes g)) ) $$ is an isomorphism of topological vector spaces. Here $f(x)$ is a test function in $\mathcal{D}(\mathbb{R}^m)$, $g(y)$ is a test function in $\mathcal{D}(\mathbb{R}^n)$ and $f\otimes g$ denotes the test function in $\mathcal{D}(\mathbb{R}^{m+n})$ given by $(x,y)\mapsto f(x)g(y)$. The spaces of distributions $\mathcal{D}'(\mathbb{R}^{m+n})$ and $\mathcal{D}'(\mathbb{R}^n)$ must be given the proper topology, i.e., the strong topology and not the weak-star topology. The space ${\rm Hom}(\mathcal{D}(\mathbb{R}^m),\mathcal{D}'(\mathbb{R}^n))$ is the space of continuous (in the usual point-set topology sense, not in the sense of sequential continuity) linear maps from $\mathcal{D}(\mathbb{R}^m)$ to $\mathcal{D}'(\mathbb{R}^n)$. The topology on this $\rm Hom$ is the one defined by the seminorms $$ \|\varphi\|=\sup_{f\in A}\rho(\varphi(f)) $$ where $A$ ranges over bounded sets in $\mathcal{D}(\mathbb{R}^m)$ and $\rho$ over continuous seminorms of $\mathcal{D}'(\mathbb{R}^n)$. Equivalently, you can take the seminorms $$ \|\varphi\|=\sup_{f\in A,\, g\in B}|\varphi(f)(g)| $$ where $A$ ranges over bounded sets in $\mathcal{D}(\mathbb{R}^m)$ and $B$ ranges over bounded sets in $\mathcal{D}(\mathbb{R}^n)$.
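In finite dimensions the statement reduces to linear algebra, which may help to see what it says. In this toy sketch (our own example, not part of the answer), a bilinear form on $\mathbb{R}^m\times\mathbb{R}^n$ is recovered from its values on tensor products of basis vectors, and the seminorm $\sup_{f\in A,\, g\in B}|\varphi(f)(g)|$, with $A,B$ taken to be the Euclidean unit balls, is the largest singular value of the kernel matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 4
K = rng.standard_normal((m, n))          # kernel of a bilinear form on R^m x R^n

def T(f, g):
    """The form evaluated on f ⊗ g: sum_{i,j} K[i, j] f[i] g[j]."""
    return f @ K @ g

# Recover the kernel from values on tensor products of basis vectors.
I_m, I_n = np.eye(m), np.eye(n)
K_rec = np.array([[T(I_m[i], I_n[j]) for j in range(n)] for i in range(m)])

# sup over unit vectors f, g of |T(f ⊗ g)| is the largest singular value of K,
# attained at the top singular vectors; random unit vectors stay below it.
U, s, Vt = np.linalg.svd(K)
attained = abs(T(U[:, 0], Vt[0]))
fs = rng.standard_normal((1000, m))
gs = rng.standard_normal((1000, n))
sampled = max(abs(T(f / np.linalg.norm(f), g / np.linalg.norm(g))) for f, g in zip(fs, gs))
print(s[0], attained, sampled)
```

In infinite dimensions both halves become nontrivial: one needs enough tensor products to determine $T$, and the right bounded sets to make the correspondence a homeomorphism.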

To truly understand the theorem, you need to first consider the simpler case with $\mathcal{S},\mathcal{S}'$ instead of $\mathcal{D},\mathcal{D}'$. This in turn requires understanding the discrete toy model given by spaces of sequences.

Let $\mathbb{N}=\{0,1,2,\ldots\}$. We denote by $s(\mathbb{N}^m)$ the space of (multi)sequences $u=(u_{\alpha})$ indexed by multiindices $\alpha\in\mathbb{N}^m$ for which the following quantities are finite $$ ||u||_k=\sup_\alpha \langle\alpha\rangle^k|u_{\alpha}| $$ for all $k\in\mathbb{N}$. Here I used the Japanese bracket $\langle\alpha\rangle=\sqrt{1+\alpha_1^2+\cdots+\alpha_m^2}$. We use the above seminorms to define the topology of this space of rapidly decaying multisequences.
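A small check of the definition (plain Python; the two sample sequences and the truncation are our own choices), with $m=2$ and $|\alpha|=\alpha_1+\alpha_2$. For $u_\alpha=e^{-|\alpha|}$ the truncated supremum defining $\|u\|_5$ no longer changes as the truncation grows, while for $u_\alpha=\langle\alpha\rangle^{-3}$ the seminorm $\|u\|_4$ keeps growing, so that sequence is not in $s(\mathbb{N}^2)$.

```python
import itertools
import math

def bracket(alpha):
    """Japanese bracket <alpha> = sqrt(1 + alpha_1^2 + ... + alpha_m^2)."""
    return math.sqrt(1 + sum(a * a for a in alpha))

def seminorm(u, k, m, cutoff):
    """||u||_k = sup_alpha <alpha>^k |u_alpha|, sup truncated to {0..cutoff-1}^m."""
    return max(bracket(a) ** k * abs(u(a))
               for a in itertools.product(range(cutoff), repeat=m))

def rapid(alpha):
    """Rapidly decaying multisequence: an element of s(N^2)."""
    return math.exp(-sum(alpha))

def slow(alpha):
    """Only polynomial decay <alpha>^{-3}: not an element of s(N^2)."""
    return bracket(alpha) ** -3

for cutoff in (20, 40):
    print(cutoff, seminorm(rapid, 5, 2, cutoff), seminorm(slow, 4, 2, cutoff))
```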

Then we define the space $s'(\mathbb{N}^m)$ of multisequences of moderate growth, i.e., multisequences $v=(v_{\alpha})_{\alpha\in\mathbb{N}^m}$ for which there exist $k\in\mathbb{N}$ and $C\ge 0$ such that for all $\alpha$ $$ |v_{\alpha}|\le C\langle\alpha\rangle^k\ . $$ It can be identified with the topological dual of $s(\mathbb{N}^m)$ via the obvious pairing $$ (v,u)\mapsto \sum_{\alpha\in\mathbb{N}^m}v_{\alpha} u_{\alpha}\ . $$ The correct (strong) topology on this topological dual becomes, at the level of its concrete representation $s'(\mathbb{N}^m)$, the topology generated by the seminorms $$ ||v||_u=\sup_{\alpha\in\mathbb{N}^m} u_{\alpha} |v_{\alpha}| $$ indexed by elements $u$ of $s(\mathbb{N}^m)$ with nonnegative entries.
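A one-dimensional ($m=1$) sketch of the pairing and of one strong seminorm, with sample sequences of our own choosing: $v_j=1+j^2$ (moderate growth) and $u_j=e^{-j}$ (rapid decay). The truncated pairing is compared with the closed form $\sum_j(1+j^2)x^j=\frac{1}{1-x}+\frac{x(1+x)}{(1-x)^3}$ at $x=e^{-1}$, and $\|v\|_u=\sup_j u_j|v_j|$ is evaluated.

```python
import math

def u(j):
    """An element of s(N) with nonnegative entries."""
    return math.exp(-j)

def v(j):
    """Moderate growth, |v_j| <= <j>^2, so v is in s'(N)."""
    return 1 + j * j

def pairing(v, u, J):
    """Truncated pairing sum_{j<J} v_j u_j."""
    return math.fsum(v(j) * u(j) for j in range(J))

def strong_seminorm(v, w, J):
    """||v||_w = sup_j w_j |v_j| for a nonnegative w in s(N), truncated to j < J."""
    return max(w(j) * abs(v(j)) for j in range(J))

x = math.exp(-1)
exact = 1 / (1 - x) + x * (1 + x) / (1 - x) ** 3
print(pairing(v, u, 200), exact, strong_seminorm(v, u, 200))
```

Here the supremum $\|v\|_u$ is attained at $j=0$, where $u_0|v_0|=1$.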

One can now state the toy kernel theorem in exactly the same way as before. Namely, the map $$ s'(\mathbb{N}^{m+n}) \rightarrow {\rm Hom}(s(\mathbb{N}^m),s'(\mathbb{N}^n)) $$ $$ v\mapsto(u\mapsto (\sum_{\alpha\in\mathbb{N}^m} v_{\alpha,\beta}u_\alpha)_{\beta\in\mathbb{N}^n} ) $$ is a topological vector space isomorphism. The proof is a bit long but elementary. If you work it out by yourself, you will have understood the kernel theorem. Indeed, using Hermite functions and the resulting isomorphisms with multisequence spaces, the above toy model kernel theorem implies the one for $\mathcal{S},\mathcal{S}'$.
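A truncated sketch of the toy map with $m=n=1$ (NumPy; the truncation $N=50$ and the sample kernel are our own assumptions). On a finite window it shows both directions of the correspondence: the operator is built from $v$, and $v$ is read back by applying the operator to the unit sequences $e_\alpha$. It also checks that a rapidly decaying input gives an output of moderate growth, as $|v_{\alpha,\beta}|\le 1+\alpha^2+\beta^2\le(1+\alpha^2)(1+\beta^2)$ predicts. A finite truncation says nothing about the topological part, which is where the real work lies.

```python
import numpy as np

N = 50                                    # truncate indices to 0..N-1 (an assumption)
a = np.arange(N)

# A kernel of moderate growth on N^(1+1): |v[alpha, beta]| <= 1 + alpha^2 + beta^2.
v = np.cos(np.add.outer(a, 2 * a)) * (1.0 + np.add.outer(a**2, a**2))

def apply_kernel(v, u):
    """u -> (sum_alpha v[alpha, beta] * u[alpha])_beta, truncated."""
    return u @ v

# The kernel is recovered by applying the operator to the unit sequences e_alpha.
E = np.eye(N)
v_rec = np.array([apply_kernel(v, E[alpha]) for alpha in range(N)])

# A rapidly decaying input gives an output of moderate growth in beta.
u = np.exp(-a)
w = apply_kernel(v, u)
print(np.max(np.abs(w) / (1 + a**2)))
```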

The key facts needed for the toy theorem are:

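If $(v_{\alpha,\beta})$ is in $s_+'(\mathbb{N}^{m+n})$ ("plus" means only multisequences with nonnegative entries), then there exist $c\in s_+'(\mathbb{N}^{m})$ and $d\in s_+'(\mathbb{N}^{n})$ such that $v_{\alpha,\beta}\le c_{\alpha}d_{\beta}$ for all multiindices $\alpha,\beta$. (This is trivial: if $v_{\alpha,\beta}\le C\langle(\alpha,\beta)\rangle^k$, take $c_\alpha=C\langle\alpha\rangle^k$ and $d_\beta=\langle\beta\rangle^k$, since $\langle(\alpha,\beta)\rangle\le\langle\alpha\rangle\langle\beta\rangle$.)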
A multisequence $(v_{\alpha})$ belongs to $s'(\mathbb{N}^{m})$, i.e., is of temperate growth, if and only if $$ \forall u\in s_{+}(\mathbb{N}^{m}), \sup_{\alpha\in\mathbb{N}^m}u_\alpha |v_\alpha|<\infty\ . $$
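The first fact rests on $1+|\alpha|^2+|\beta|^2\le(1+|\alpha|^2)(1+|\beta|^2)$. A brute-force check of the resulting factorization $C\langle(\alpha,\beta)\rangle^k\le c_\alpha d_\beta$ on a finite box (plain Python; the constants, dimensions and box size are arbitrary choices of ours):

```python
import itertools
import math

def bracket(alpha):
    """Japanese bracket of a multiindex."""
    return math.sqrt(1 + sum(a * a for a in alpha))

def factor_bound_holds(C, k, m, n, cutoff):
    """Check C <(alpha, beta)>^k <= c_alpha d_beta with c_alpha = C <alpha>^k and
    d_beta = <beta>^k, over alpha in {0..cutoff-1}^m and beta in {0..cutoff-1}^n."""
    for alpha in itertools.product(range(cutoff), repeat=m):
        for beta in itertools.product(range(cutoff), repeat=n):
            lhs = C * bracket(alpha + beta) ** k        # alpha + beta concatenates tuples
            rhs = (C * bracket(alpha) ** k) * bracket(beta) ** k
            if lhs > rhs * (1 + 1e-12):                 # tolerance for rounding only
                return False
    return True

print(factor_bound_holds(C=2.0, k=3, m=2, n=1, cutoff=10))
```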

If $\mathcal{S},\mathcal{S}'$ is not enough for you and you insist on $\mathcal{D},\mathcal{D}'$, you can also do it with multimatrices (instead of multisequences), but that's quite a bit more work since you will need the results of this article by Bargetz.

score 4 · Answer 2 · 2020-02-10T17:50:43.600

I like to approach the Schwartz Kernel Theorem via Hilbert-Schmidt (HS) operators. A bounded linear operator $A:L^2(Y)\to L^2(X)$ is HS iff it is an integral operator with kernel $K_A\in L^2(X\times Y)$. The definition of HS operators between general Hilbert spaces refers to orthonormal bases: $A$ is HS if $\sum_j\|Ae_j\|^2<\infty$ for some (equivalently, every) orthonormal basis $(e_j)$. The composition of an HS operator with a bounded linear operator, on either side, is again an HS operator.
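A finite-dimensional sketch of the two facts used below (NumPy; random matrices stand in for operators, which is our simplification): the quantity $\sqrt{\sum_j\|Ae_j\|^2}$ does not depend on the orthonormal basis, and $\|BA\|_{HS}\le\|B\|_{op}\|A\|_{HS}$, which is why composing with a bounded operator preserves the HS property.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 8))      # finite-dimensional stand-in for A: L^2(Y) -> L^2(X)

def hs_norm(A, Q):
    """sqrt(sum_j ||A q_j||^2) over the orthonormal basis formed by the columns of Q."""
    return np.sqrt(sum(np.linalg.norm(A @ Q[:, j]) ** 2 for j in range(Q.shape[1])))

Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # a random orthonormal basis of R^8
hs_standard = hs_norm(A, np.eye(8))
hs_random = hs_norm(A, Q)

# Composition with a bounded operator B: ||B A||_HS <= ||B||_op ||A||_HS.
B = rng.standard_normal((5, 6))
hs_BA = hs_norm(B @ A, np.eye(8))
bound = np.linalg.norm(B, 2) * hs_standard
print(hs_standard, hs_random, hs_BA, bound)
```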

Now let $A$ be a continuous linear operator, $A:\mathcal{S}(Y)\to\mathcal{S}'(X)$, from Schwartz space into temperate distributions. Here $X$ and $Y$ are Euclidean spaces. The Kernel Theorem states that there exists $K_A\in\mathcal{S}'(X\times Y)$ such that $\langle Au,v\rangle =\langle K_A,v\otimes u\rangle$ holds for $u\in\mathcal{S}(Y)$ and $v\in\mathcal{S}(X)$. A proof follows from the following claim: there exist an HS operator $H:L^2(Y)\to L^2(X)$ and linear differential operators $L$ and $R$ with polynomial coefficients such that $A=LHR$. Then, using function notation, the kernel of $A$ is a derivative of the kernel $K_H$ of $H$: $$K_A(x,y)=L(x,D_x)R^t(y,D_y)K_H(x,y).$$ Here $R^t$ is the transpose of $R$. More precisely, using duality brackets and Schwartz functions $u$ and $v$, this proof of the Kernel Theorem reads: $$ \langle Au,v\rangle = \langle HRu,L^t v\rangle = \langle K_H, L^tv\otimes Ru\rangle = \langle K_H, L^t R(v\otimes u)\rangle =\langle K_A,v\otimes u\rangle. $$ In the second to last equality, $L^t$ and $R$ are regarded as differential operators on $X\times Y$ in the obvious way.

It remains to prove the claim. Operators of the form $\langle x\rangle^k \langle D_x\rangle^n$ are isomorphisms of Schwartz space; for even $k$ and $n$ they are differential operators with polynomial coefficients. Furthermore, the seminorms $u\mapsto \|Lu\|_{L^2}$, where $L$ runs through a countable set of differential operators which are isomorphisms, define the topology of Schwartz space; see the first chapter of the notes of Melrose on microlocal analysis. The bilinear form $(u,v)\mapsto \langle Au,v\rangle$ is separately continuous by hypothesis, hence continuous by a corollary to the Banach-Steinhaus Theorem (Schwartz space being a Fréchet space). Therefore there exist invertible differential operators $L_1$ and $R_1$ such that $$ |\langle Au,v\rangle|\leq \|R_1u\|_{L^2}\|L_1^tv\|_{L^2}$$ holds for all $u,v$. It follows that $B=L_1^{-1}AR_1^{-1}$ extends to a bounded operator $L^2(Y)\to L^2(X)$. Choose an invertible differential operator $L_2$ with $L_2^{-1}$ HS on $L^2(X)$, and set $H=L_2^{-1}B$. Then $H$ is HS, being the composition of the HS operator $L_2^{-1}$ with the bounded operator $B$, and $A=L_1BR_1=(L_1L_2)HR_1$, so the claim holds with $L=L_1L_2$ and $R=R_1$. (I haven't found exactly this proof of the kernel theorem in the literature, but I assume that it is known to experts.)
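One concrete choice of $L_2$ (our assumption; the answer leaves it open) is $L_2=(1+|x|^2-\Delta)^N$ on $X=\mathbb{R}^d$. The Hermite functions $h_\alpha$ are eigenfunctions of $|x|^2-\Delta$ with eigenvalue $2|\alpha|+d$, so $\|L_2^{-1}\|_{HS}^2=\sum_\alpha(1+d+2|\alpha|)^{-2N}$, which is finite exactly when $2N>d$. The sketch sums this series by grouping the $\binom{j+d-1}{d-1}$ multiindices with $|\alpha|=j$.

```python
from math import comb

def hs_norm_sq_partial(d, N, J):
    """Partial sum over |alpha| < J of (1 + d + 2|alpha|)^(-2N), i.e. of the squared
    HS norm of (1 + |x|^2 - Laplacian)^(-N) on L^2(R^d) in the Hermite basis;
    comb(j + d - 1, d - 1) counts the multiindices alpha in N^d with |alpha| = j."""
    return sum(comb(j + d - 1, d - 1) * (1 + d + 2 * j) ** (-2 * N) for j in range(J))

# d = 3: the series converges for N = 2 (2N > d) but diverges for N = 1 (2N < d).
for N in (1, 2):
    print(N, hs_norm_sq_partial(3, N, 10**4), hs_norm_sq_partial(3, N, 10**5))
```

Any $N>d/2$ works, and the choice does not depend on $A$: only the HS property of $L_2^{-1}$ and the boundedness of $B$ are used.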

The proof of the kernel theorem for operators on sections of vector bundles over manifolds can be reduced to the special case treated above by local trivialization. One has to be careful when giving an invariant statement, however: the Schwartz kernel is a distributional section of an exterior tensor product bundle over $X\times Y$.

In his 1953 thesis, Grothendieck proved a general kernel theorem for linear continuous operators $A:E\to F$ between nuclear locally convex spaces. He studied topologies on tensor products $F\otimes E$ and their completions. For the proof of the kernel theorem two topologies are relevant: the projective topology, $F\otimes_{\pi}E$, and the $\varepsilon$ topology, $F\otimes_{\varepsilon}E$. For general locally convex spaces $E$ and $F$ these topologies differ, but if $E$ or $F$ is nuclear, then these topologies are the same. The significance of the projective topology is that the dual space of $F\otimes_{\pi} E$ is the space of continuous bilinear forms $(v,u)\mapsto\langle Au,v\rangle$. The $\varepsilon$ topology, on the other hand, is designed to provide a subspace topology. In the case of Schwartz space, $$\mathcal{S}(X)\tilde\otimes_{\varepsilon}\mathcal{S}(Y)=\mathcal{S}(X\times Y)$$ where the tilde denotes completion. This proves the kernel theorem the Grothendieck way. The proof of nuclearity of Schwartz space usually employs its representation as a projective limit with Hilbert-Schmidt connecting maps.
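To see where Hilbert-Schmidt connecting maps come from, take the sequence model $s(\mathbb{N})$ (the Hermite-coefficient picture) with the Hilbert norms $\|u\|_{(k)}^2=\sum_j\langle j\rangle^{2k}|u_j|^2$, which define its topology (our choice of norms for this sketch). The vectors $e_j/\langle j\rangle^{k+1}$ form an orthonormal basis of the completion $H_{k+1}$, and each has $H_k$-norm squared $\langle j\rangle^{-2}$. So the inclusion $H_{k+1}\to H_k$ has squared HS norm $\sum_{j\ge0}(1+j^2)^{-1}=\frac{1+\pi\coth\pi}{2}$, finite and independent of $k$.

```python
import math

def hs_norm_sq_inclusion(J, shift=1):
    """Partial sum of ||iota||_HS^2 for the inclusion iota: H_{k+shift} -> H_k on
    sequences indexed by N. The basis vectors e_j / <j>^{k+shift} are orthonormal
    in H_{k+shift} and have H_k-norm squared <j>^{-2*shift}, independently of k."""
    return math.fsum((1 + j * j) ** (-shift) for j in range(J))

closed_form = (1 + math.pi / math.tanh(math.pi)) / 2   # sum over j >= 0 of 1/(1 + j^2)
print(hs_norm_sq_inclusion(10**6), closed_form)
```

For sequences indexed by $\mathbb{N}^d$ the same computation needs a jump of more than $d/2$ in the weight exponent, since $\sum_\alpha\langle\alpha\rangle^{-2s}<\infty$ iff $2s>d$.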

How do you know that $L_2^{-1}B$ is Hilbert-Schmidt? The definition of $L_2$ seems too arbitrary and unrelated to B. — Overflowian, Feb 06 '25 at 08:40
