
In Stein's Harmonic Analysis: Real-Variable Methods, Orthogonality and Oscillatory Integrals, tempered distributions ($\mathscr S'$) are defined as continuous linear functionals on the Schwartz class. Here, the continuity is given with respect to the family of seminorms \begin{equation*}\|\Phi\|_{\alpha,\beta}=\sup_{x\in\mathbb R^N}|x^\alpha\partial^\beta_x\Phi(x)|.\end{equation*} Then, without any further clarification, he proceeds to discuss convolutions between a tempered distribution and functions in the Schwartz class. This is where I get puzzled. If tempered distributions are functionals, what is this convolution supposed to mean? It seems as if he (and every other source, for that matter) were assuming these functionals can be clearly identified with some functions. My question is: how do you make this identification?

I was thinking, as Schwartz functions are in $L^2$, maybe the identification is the one given by Riesz representation theorem. However, I think this is not possible as the topology we are considering in the Schwartz class is different from that of $L^2$. Moreover, while discussing $H^p$ spaces, he claims that, for $p>1$, $L^p$ is the same as $H^p$. Here, he is using again this identification that I don't quite get and, if my hypothesis was correct that the identification is made through Riesz representation theorem, this should mean $H^2=L^2=\mathscr S'$. This seems a bit strange to me. Another thing that's worrying me as well is the fact that he is discussing, without a prior definition, bounded distributions. Of course, if these were elements of $L^2$'s dual, they would be automatically bounded, so this is another hint that my original assumption about the identification is wrong.

I think this is a very basic question, but I can't find any source in which this is specifically discussed and clarified. How can we talk about an object as both a tempered distribution (an element of $\mathscr S'$) and a function defined on $\mathbb R^N$?

  • There is a discussion of the convolution of a temperate distribution with a Schwartz function, and the result being a smooth temperate function, in my previous answer https://mathoverflow.net/questions/72450/can-distribution-theory-be-developed-riemann-free/351028#351028 – Abdelmalek Abdesselam Dec 29 '23 at 14:39

4 Answers


We don’t need to identify distributions with functions to meaningfully discuss convolution with distributions. Most distributions can’t be identified with functions. Any operation normally defined between functions we define for distributions using duality, by determining how it acts on Schwartz functions. Let $\langle g,\phi\rangle$ be the evaluation pairing between a distribution $g$ and a Schwartz function $\phi$, i.e. $g: \phi \mapsto \langle g,\phi\rangle$. Notice that if $g$ is a locally integrable function, this pairing is just the $L^2$ inner product.

The strategy for finding the proper dual formulation of a statement about distributions is to assume all objects involved are nice functions, then move things around until we have an evaluation pairing between a distribution and a Schwartz function. For example, to make sense of the action of the distribution $g*\psi$ on a Schwartz function $\phi$, where $g$ is a distribution and $\psi$ is Schwartz, note that if $g$ were a nice function we could use Fubini to get

$$\begin{align} \langle g*\psi, \phi \rangle &= \int \phi(x) \int g(y)\psi(x-y) dydx \\ &= \int g(y) \int \phi(x)\psi(x-y)dx dy \\ &= \langle g, \phi*R\psi\rangle \end{align}$$

where $R$ is the reflection operator $R:\psi(x)\mapsto \psi(-x)$. With this in mind, we define by duality $\langle g*\psi, \phi \rangle = \langle g, \phi*R\psi\rangle$. This determines the distribution $g*\psi$ uniquely, since we know how $g$ acts on Schwartz functions. Of course, there are some details here you can check, like the fact that $g*\psi$ is indeed a tempered distribution, that the convolution of Schwartz functions is Schwartz, etc.
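As a quick sanity check (my own, not from the book): when $g$ happens to be a nice function, the two sides of the duality identity can be compared numerically by approximating the integrals with Riemann sums on a grid. The grid, the Gaussian choices, and the discretization below are all illustrative assumptions:

```python
import numpy as np

# Numerical check of the duality identity  <g * psi, phi> = <g, phi * R psi>
# for a concrete (nice) g, with integrals approximated by Riemann sums.
x = np.linspace(-10, 10, 2001)   # grid wide enough that the Gaussians vanish at the ends
dx = x[1] - x[0]

g   = np.exp(-x**2)              # stand-in for the distribution g (a nice function here)
psi = np.exp(-(x - 1)**2)        # Schwartz function psi
phi = np.exp(-(x + 2)**2)        # Schwartz test function phi

def conv(f, h):
    """Grid approximation of (f*h)(x) = int f(y) h(x - y) dy."""
    return np.convolve(f, h, mode="same") * dx

def pair(u, v):
    """Grid approximation of the evaluation pairing <u, v> = int u v."""
    return np.sum(u * v) * dx

Rpsi = psi[::-1]                 # (R psi)(x) = psi(-x); valid because the grid is symmetric

lhs = pair(conv(g, psi), phi)    # <g * psi, phi>
rhs = pair(g, conv(phi, Rpsi))   # <g, phi * R psi>

print(lhs, rhs)                  # the two pairings agree to grid accuracy
```

Since both sides discretize the same double sum, the agreement here is essentially exact; the point is only to see the Fubini manipulation in action.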

Another example: to define derivatives of a distribution, we determine its action on smooth functions by pretending the distribution is smooth and integrating by parts until we have something meaningful:

$$ \langle \partial^\alpha g,\phi\rangle = (-1)^{|\alpha|} \langle g, \partial^\alpha \phi\rangle$$

This equality ultimately defines $\partial^\alpha g$.
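As a standard one-dimensional illustration (not spelled out in the book): take $g = H$, the Heaviside step function. Then

$$\langle H', \phi\rangle = -\langle H, \phi'\rangle = -\int_0^\infty \phi'(x)\,dx = \phi(0) = \langle \delta, \phi\rangle,$$

so $H' = \delta$, even though $H$ has no classical derivative at the origin.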

Once you’ve established these duality-based definitions, you can informally treat distributions as if they’re functions, even though to be precise you need to work with their actions on Schwartz functions. This is what’s going on behind the scenes in Stein.

  • This is very much the answer I was working on (and now that it is posted, I've kind of wasted the last half hour of my life). The important thing is that the convolution is defined via the duality pairing: it is appropriate to not think of distributions as functions, and to work more abstractly through that pairing. For what it's worth, I think that Folland actually does a semi-reasonable job of going over this, though a student reading Stein might find Folland a bit of a stretch. (+1) – Xander Henderson Dec 26 '23 at 15:34
  • I think I kinda see it now, thank you. However, there still seems to be some sort of identification that I don't really grasp. I see how you may define the convolution, but when he writes $L^p=H^p$, what does he mean? Is it because these distributions are in $H^p$ and have good properties that we can identify them with functions? And how is such an identification made? – confusedTurtle Dec 27 '23 at 10:07
  • What is $H^p$ in this context? The $L^2$ based Sobolev space? Hardy spaces? What page of Stein is this on? – kieransquared Dec 27 '23 at 12:18
  • If what you’re describing is about Hardy spaces, Stein gives a proof of $L^p = H^p$ for $p>1$ right after the statement Theorem 1 in Chapter 3. – kieransquared Dec 27 '23 at 12:32
  • @kieransquared this is precisely the bit of the book I'm talking about, and what I am not understanding. He takes a function $f\in L^p$ and proves $M_\Phi f\in L^p$, which by definition means $f\in H^p$. However, he is just treating $f$ as a function; I don't see what $f$ is supposed to be as a distribution, i.e., what it does to Schwartz functions. Given a Schwartz function $\Phi$, is $f(\Phi)$ the $L^2$ inner product of $f$ and $\Phi$? And in such case, why is it well defined? Because $\Phi$ is Schwartz? I'm sorry, I have so many questions... – confusedTurtle Dec 27 '23 at 14:02
  • Any $L^p$ function (or any locally integrable function with suitable growth conditions) induces a tempered distribution by acting on Schwartz functions through the $L^2$ inner product: $f(\phi) = \int f\phi$. This is in fact the motivation for using the notation above. As for why $f\in L^p$ induces a well-defined tempered distribution, you should do that as an exercise (Hint: use Hölder's inequality, then determine how $L^p$ norms interact with Schwartz seminorms). – kieransquared Dec 27 '23 at 16:13
  • By the way, if this is your first exposure to tempered distributions, you probably should first read something easier than Stein, since it’s more of a reference text. Stein and Shakarchi have a book (Book IV in their princeton lectures in analysis series) that covers distributions and other analytic background that might be helpful. – kieransquared Dec 27 '23 at 16:15

A distribution is defined as a linear functional on a space of test functions, but it can (should?) be thought of almost as a function on $\mathbb R^N.$ That's because distributions have local behavior and identity: a distribution can be restricted to open sets, and it is completely determined by its behavior on the members of an open cover.

If $u$ is a distribution and $\varphi$ a test function then a common way to write the action of $u$ on $\varphi$ is $\langle u, \varphi \rangle.$ This is because there are similarities with inner products. For example, the action of an ordinary function $f$ on a test function $\varphi$ is $$ \langle f, \varphi \rangle = \int_{\mathbb R^N} f(x) \, \varphi(x) \, d^Nx. $$

A distribution can be translated. The definition is modeled on how translation acts on ordinary functions: $$ \langle \tau_a f, \varphi \rangle = \int (\tau_a f)(x) \, \varphi(x) \, d^Nx = \int f(x-a) \, \varphi(x) \, d^Nx = \int f(x) \, \varphi(x+a) \, d^Nx = \langle f, \tau_{-a}\varphi \rangle . $$ We therefore define the translation $\tau_a u$ by $\langle \tau_a u, \varphi \rangle = \langle u, \tau_{-a}\varphi \rangle$ for all test functions $\varphi.$
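For example, applying this to the delta distribution $\langle \delta, \varphi \rangle = \varphi(0)$:

$$ \langle \tau_a \delta, \varphi \rangle = \langle \delta, \tau_{-a}\varphi \rangle = (\tau_{-a}\varphi)(0) = \varphi(a), $$

so $\tau_a \delta = \delta_a$, the delta distribution based at $a$, exactly as the heuristic picture of a translated spike suggests.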

I'm not sure that this answers your questions, but hopefully it helps a little bit in understanding distributions.

--

EDIT: Convolution of two distributions

If $f$ and $g$ are ordinary functions for which the convolution $f*g$ is defined then $$ \langle f*g, \varphi \rangle = \int (f*g)(x) \, \varphi(x) \, dx = \int \left( \int f(x-y) \, g(y) \, dy \right) \varphi(x) \, dx \\ = \iint f(x-y) \, g(y) \, \varphi(x) \, dy \, dx = \iint f(z) \, g(y) \, \varphi(z+y) \, dy \, dz \\ = \langle (f\otimes g)(z,y), \varphi(z+y) \rangle = \langle f\otimes g, \varphi\circ+ \rangle. $$

Therefore the convolution of two distributions is defined by $$ \langle u*v, \varphi \rangle = \langle u\otimes v, \varphi\circ+ \rangle, $$ where the tensor product of two distributions is defined by $$ \langle (u\otimes v)(x,y), \varphi(x,y) \rangle = \langle u(x), \langle v(y), \varphi(x,y) \rangle\rangle. $$ Here the variables $x$ and $y$ are formal, especially after $u$ and $v$; they only serve to make clear how things are paired. (Some care is needed: $\varphi\circ+$ is not compactly supported, so $u*v$ only makes sense under support or growth conditions, e.g. when at least one of $u, v$ has compact support.)
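A quick consistency check with this definition: for the delta distributions $\delta_a, \delta_b$ based at $a$ and $b$,

$$ \langle \delta_a * \delta_b, \varphi \rangle = \langle \delta_a \otimes \delta_b, \varphi\circ+ \rangle = \varphi(a+b), $$

so $\delta_a * \delta_b = \delta_{a+b}$, matching the heuristic of convolving two spikes.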

md2perpe
  • This answer is, overall, fine, but I disagree with the introductory paragraph. There are distributions which are "ordinary" functions on $\mathbb{R}^n$, but there are also wilder distributions (e.g. the delta distribution, which causes no end of confusion when students try to treat it like an ordinary function). Pedagogically, I think that it is good to treat distributions as a different class of objects and to do everything possible to distinguish them from ordinary functions. I think that this answer would be a lot stronger if the first paragraph were dropped. – Xander Henderson Dec 26 '23 at 15:29
  • @XanderHenderson. That's why I wrote "thought of almost as functions". That's how I think of them myself. It's important to know that they are not ordinary functions so you must be careful with how you handle them. But I have also seen confusions when people just learn that they are functionals and don't understand that they have a local behavior similar to functions (and can often be identified with functions at most locations). – md2perpe Dec 26 '23 at 17:21
  • @XanderHenderson Physicists and engineers use distributions pervasively and most often do not invoke rigor. In fact, Paul Dirac, who was awarded a Nobel prize in physics, introduced the Dirac Delta in his 1927 paper The Physical Interpretation of the Quantum Dynamics and used in his textbook The Principles of Quantum Mechanics. It was only later that (1945) that Schwartz formalized the Dirac Delta. I DO like applying rigor. But, if I want a quick result, I rely on heuristics. And in my experience, application of heuristics to distributions has never returned an incorrect result. – Mark Viola Dec 27 '23 at 17:36
  • @md2perpe Happy Holidays my friend. And a (+1) for the nicely written and concise answer – Mark Viola Dec 27 '23 at 17:38
  • @MarkViola I am very aware of Dirac and Schwartz. I am also aware of the ways in which physicists abuse distributions. I have also seen how much confusion distributions cause for students. It is one thing for a Nobel Prize winning intellect to be imprecise in exposition which is meant to communicate with other experts. It is another thing to be imprecise in answer to a question which fundamentally seems to be understanding part of a definition which is dependent upon understanding things a little more precisely. – Xander Henderson Dec 27 '23 at 17:54
  • I am also somewhat puzzled by the amount of text being spent to refute me---I started by asserting that I think that this answer is fine, but would be a lot stronger if the first paragraph were omitted. I have no problem with the answer, I just think it could be better. – Xander Henderson Dec 27 '23 at 17:55
  • @XanderHenderson I agree with you; teaching distributions in a rigorous sense is a good thing. It is not a necessary thing for practioners in physics and engineering. And for those well-versed, applying rigor can be quite inefficient. Happy Holidays. – Mark Viola Dec 27 '23 at 18:38
  • @MarkViola. Here in Sweden, the country of the Nobel Prize, we now say "God fortsättning på julen!". ('Jul' is the Nordic word for Christmas; you have probably heard the cognate 'Yule' in English.) – md2perpe Dec 28 '23 at 07:16

The idea would be to define the tempered distribution $T_{\phi}$ for a Schwartz function $\phi$ analogously to what is done for test functions, by $$T_{\phi}(\psi)=\int \phi\psi \,d\lambda.$$ This means that certain tempered distributions can be represented in integral form by Schwartz functions, but in contrast to the Hilbert space $L^2$, not all elements of the dual of the Schwartz space have such a representation (the topology plays an important role here: $\mathcal{S}$ is a Fréchet space, not a Hilbert space, so there is no Riesz representation theorem to invoke).

Since test functions are dense in the Schwartz functions, most arguments which you can make on test functions translate directly to Schwartz functions as well. But it turns out that the dual of the test functions is not the "correct" way of looking at distributions when it comes to generalizing the Fourier transform.

The convolution of a distribution $T$ with a suitable function $f$ is then defined pointwise by $$(T\ast f)(x):=T(f(x-\cdot)).$$ With the notion above, you can in this way convolve tempered distributions with Schwartz functions.
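For example, taking $T = \delta$ in this definition gives

$$(\delta \ast f)(x) = \delta(f(x-\cdot)) = f(x-0) = f(x),$$

so $\delta$ acts as the identity for convolution, as expected.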

Jfischer

The other answers have already answered the questions about convolution, but I'd like to say something on the titular question of "How can tempered distributions be identified with functions?" Throughout this answer, $\eta$ will denote an arbitrary "test" Schwartz function on $\mathbb R^d$.

For any bounded continuous $u \in C_b(\mathbb R^d; \mathbb C)$, we define $$T_u (\eta) := \int u(x) \eta(x) \text{ Leb}_{\mathbb R^d}(dx),$$ which is a tempered distribution $T_u \in \mathcal S(\mathbb R^d)^*$.

The Schwartz representation theorem (see https://math.mit.edu/~rbm/iml/Chapter1.pdf pg. 17) states that:

For any tempered distribution $T \in \mathcal S(\mathbb R^d)^*$,

there exists a finite collection $u_{\alpha\beta}\in C_b(\mathbb R^d; \mathbb C)$ (indexed by a finite collection of multi-indices $\alpha,\beta$, with "finiteness" quantified as $|\alpha|+|\beta|\leq k$ for some natural number $k$)

such that $T$ can be written/represented as $$T = \sum_{|\alpha|+|\beta|\leq k} x^\beta D^\alpha_x T_{u_{\alpha\beta}},$$ where on the RHS we have the operators $x^\beta, D_x^\alpha: \mathcal S(\mathbb R^d)^* \to \mathcal S(\mathbb R^d)^*$ (defined in the obvious way by duality; see above link pg. 17) applied to the tempered distributions $T_{u_{\alpha\beta}} \in \mathcal S(\mathbb R^d)^*$.

Completely concretely, we have that $$T(\eta) = \sum_{|\alpha|+|\beta|\leq k} \int u_{\alpha\beta}(x)\, (-1)^{|\alpha|}\, D_x^\alpha\big[x^\beta \eta\big](x) \text{ Leb}_{\mathbb R^d}(dx).$$

(Yes, we can also write the representation as $T=\sum_{|\alpha|+|\beta|\leq k} D^\alpha_x T_{x^\beta \cdot u_{\alpha\beta}}$, at the cost of the represented functions $x^\beta u_{\alpha\beta}$ now being merely of polynomial growth rather than bounded.)
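As a concrete one-dimensional instance (my own example, not from the linked notes; $\partial$ denotes the ordinary distributional derivative), take $u(x)=\frac12 e^{-|x|}$, which is bounded and continuous. For $x\neq 0$ we have $u''=u$ classically, and $u'$ jumps by $-1$ at the origin, so distributionally

$$\partial^2 T_u = T_u - \delta_0, \qquad\text{i.e.}\qquad \delta_0 = T_u - \partial^2 T_u,$$

which exhibits the delta distribution in the form of the theorem, with $k=2$ and all $\beta=0$.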

The slogan in the linked notes is:

Thus tempered distributions are just products of polynomials and derivatives of bounded continuous functions. This is important because it says that distributions are “not too bad”.

This representation is not widely useful in practice, but perhaps it reduces some of the feeling of abstraction.

D.R.