
I have a notation question here.

In the simplest form of Ito's lemma, we have this

$ df(Y_t) = f'(Y_t) dY_t + \frac{1}{2} f''(Y_t) d\langle Y \rangle_t$

I know how to calculate the $ d\langle Y \rangle_t $ term, but I have always wanted to ask:

  • what is the name of this term, and what exactly does it mean?
  • why is it written in this special way, rather than using $ \operatorname{Cov}() $ or $ \operatorname{Var}() $?

Conceptually, to me that looks like the variance of the process, but I just don't understand the notation. Why is the subscript $ t $ placed outside the $ \langle \cdot \rangle $?

Can I write it like any of these below?

$ \langle dY_t \rangle $

$ d \langle Y_t \rangle $

If there are two processes involved, then following the pattern I guess it should be written as $ d\langle X, Y \rangle_t $, but can I write it in either of the ways below instead?

$ \langle dX_t, dY_t \rangle $

$ d\langle X_t, Y_t \rangle $

Also, can I write it in integral form? If so, where should I put the $ t $?

Thanks a lot

Paul
  • Think of it this way: if you had some weird symbol like $AQZ$ for the name of a stochastic process, you would write the differential of that process as $dAQZ_t$. That is you write $d$, the name of the process, and then the subscript $t$. In this context here, $\langle Y \rangle$ is the name of the quadratic variation process, and $\langle X,Y \rangle$ is the name of the quadratic covariation process. Of course you can adjust notation however you need, but this (or the same thing but with square brackets) is the usual way people write things. – Ian Dec 28 '20 at 16:09
  • Note that the quadratic variation is in general not the same thing as the variance. As a trivial example, let $X$ be any random variable with finite variance and let $B_t$ be a Brownian motion independent of $X$ and $Y_t=X+B_t$, then $\operatorname{Var}(Y_t)=\operatorname{Var}(X)+t$ but the quadratic variation of $Y_t$ is still $t$. Thus it would not be correct to write the quadratic variation with Var() notation. – Ian Dec 28 '20 at 16:13
  • Thanks a lot Ian. Finally I see the term being called quadratic variation process and quadratic covariation process. Fantastic explanation of the difference between variance and quadratic variation. Thanks!! – Paul Dec 28 '20 at 16:16
  • @Ian I have also wondered about it. I have a follow-up question. It is true that $\langle Y\rangle_t\neq \operatorname{Var}(Y_t)$; however, we do have $d\langle Y\rangle_t=d\operatorname{Var}(Y_t)$. Therefore, I don't see why one can't replace $d\langle Y\rangle_t$ by $d\operatorname{Var}(Y_t)$. Or am I missing something? – Raghav Dec 28 '20 at 16:46
  • @WhoKnowsWho That's true in my previous example, but an example without that property would be an SDE with random drift and no volatility, e.g. $y'=X$ where $X$ is a single N(0,1) random variable. Then the variance is of course $t^2$ but the quadratic variation is zero. – Ian Dec 28 '20 at 16:56
  • @WhoKnowsWho A more direct example would be to just take almost any particular BM path, in which case the quadratic variation is still $t$ but the variance is now zero (since we've frozen which path we're talking about). – Ian Dec 28 '20 at 17:09

1 Answer


Long-hand / Short-hand notation:

I personally have always found the short-hand notation confusing and to this day try to avoid it whenever possible. Below, I will try to demonstrate why it is confusing and leads to commonly made mistakes.

In the "long-hand" notation, an Ito process $X_t$ is defined as follows:

$$X_t:=X_0+\int_{h=0}^{h=t}a(X_h,h) dh + \int_{h=0}^{h=t}b(X_h,h) dW_h $$

Above, $a(X_t,t)$ and $b(X_t,t)$ are some square-integrable processes.

It is worth noting that the Quadratic variation of $X_t$ would then be:

$$\left<X\right>_t=\int_{h=0}^{h=t}b(X_h,h)^2dh $$

(this follows from the definition of Quadratic variation for Stochastic Processes, see edit at the end of this post)
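
If it helps to see this numerically, here is a minimal simulation sketch (Python/NumPy; the drift and diffusion coefficients are arbitrary examples chosen purely for illustration): the sum of squared increments of a simulated path should be close to the Riemann sum approximating $\int_{h=0}^{h=t} b(X_h,h)^2\,dh$.

```python
import numpy as np

# Illustrative sketch: for an Ito process dX_t = a(X_t,t) dt + b(X_t,t) dW_t,
# the sum of squared increments of a path approximates <X>_t = int_0^t b(X_h,h)^2 dh.
# The drift/diffusion below are arbitrary example choices.

rng = np.random.default_rng(0)
a = lambda x, t: 0.05 * x          # example drift coefficient
b = lambda x, t: 0.20 * x          # example diffusion coefficient

T, n = 1.0, 100_000
dt = T / n
t = np.linspace(0.0, T, n + 1)

X = np.empty(n + 1)
X[0] = 1.0
dW = rng.normal(0.0, np.sqrt(dt), n)
for i in range(n):                                  # Euler-Maruyama discretisation
    X[i + 1] = X[i] + a(X[i], t[i]) * dt + b(X[i], t[i]) * dW[i]

qv_from_increments = np.sum(np.diff(X) ** 2)            # sum of (X_{t_i} - X_{t_{i-1}})^2
qv_from_formula = np.sum(b(X[:-1], t[:-1]) ** 2 * dt)   # Riemann sum of b^2 dh

print(qv_from_increments, qv_from_formula)  # the two numbers should be close
```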

Now, in short-hand notation, we can write the equation for $X_t$ above as:

$$dX_t=a(X_t,t) dt + b(X_t,t) dW_t$$

Firstly, what does the short-hand notation really mean? We could define $\delta X_t$ as follows:

$$\delta X_t:=X_{t+\delta t}-X_t=\int_{h=t}^{h=t+\delta t}a(X_h,h)\, dh + \int_{h=t}^{h=t+\delta t}b(X_h,h)\, dW_h$$

And then $dX_t$ could be (intuitively, not rigorously) understood along the lines of:

$$\lim_{\delta t \to 0} \delta X_t = dX_t$$

But I think it's best to just understand the short-hand notation for what it really is: i.e. a short-hand for the stochastic integrals.

Ito's Lemma:

Now Ito's Lemma states that for any such Ito process $X_t$, any function $F(x,t)$ that is twice continuously differentiable in $x$ and once in $t$ obeys the following equation:

$$F(X_t,t)=F(X_0,t_0)+\int_{h=0}^{h=t} \left( \frac{\partial F}{\partial t}+\frac{\partial F}{\partial X}\cdot a(X_h,h) + 0.5\frac{\partial^2 F}{\partial X^2}\cdot b(X_h,h)^2 \right)dh+\int_{h=0}^{h=t}\left(\frac{\partial F}{\partial X}b(X_h,h)\right)dW_h$$

Above, you can spot the "quadratic variation" term:

$$\int_{h=0}^{h=t}0.5\frac{\partial^2 F}{\partial X^2}b(X_h,h)^2 dh$$

(which, in "short-hand" notation, could be written as $0.5F''(X_t)\,d\left<X\right>_t$, i.e. exactly the same as your $0.5f''(Y_t)\, d\langle Y \rangle_t$; I just use $F$ instead of $f$ and $X_t$ instead of $Y_t$. Again, I find the short-hand much less intuitive than the long-hand notation, even after years of playing around with Ito processes).
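
As a purely illustrative sanity check (not needed for the argument): take $X_t=W_t$, so $a=0$ and $b=1$, and $F(x,t)=x^2$. Ito's lemma then reduces to $W_T^2 = 2\int_0^T W_h\,dW_h + \left<W\right>_T$, and the quadratic-variation term is exactly the extra "$+T$" in the sketch below.

```python
import numpy as np

# Illustrative check of Ito's lemma for X_t = W_t and F(x,t) = x^2:
#   W_T^2 = 2 * int_0^T W_h dW_h + <W>_T,   with <W>_T = T.
rng = np.random.default_rng(1)
T, n = 1.0, 500_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))   # Brownian path on the grid

lhs = W[-1] ** 2                             # F(W_T, T)
ito_integral = np.sum(W[:-1] * dW)           # left-point (Ito) sum for int W dW
rhs = 2.0 * ito_integral + T                 # 2 * int W dW + quadratic variation

print(lhs, rhs)  # should agree up to discretisation error
```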

Why not use the short-hand notation:

Now I would like to show an example of why I think the short-hand notation can be super-confusing: Let's go with the Ornstein-Uhlenbeck process (below, $\mu$, $\theta$ and $\sigma$ are constant parameters):

$$X_t:=X_0+\int_{h=0}^{h=t}\theta(\mu- X_h)dh + \int_{h=0}^{h=t}\sigma dW_h $$

We have $a(X_t,t)=\theta(\mu- X_t)$ and $b(X_t,t) = \sigma$.

The trick to solving the above is to apply Ito's lemma to $F(X_t,t):=X_t e^{\theta t}$, which gives:

$$X_te^{\theta t}=\underbrace{F(X_0,t_0)}_{=X_0}+\int_{h=0}^{h=t} \left( \underbrace{\frac{\partial F}{\partial t}}_{=\theta X_h e^{\theta h}}+\underbrace{\frac{\partial F}{\partial X}}_{=e^{\theta h}}\cdot a(X_h,h) + 0.5\underbrace{\frac{\partial^2 F}{\partial X^2}}_{=0}\cdot b(X_h,h)^2 \right)dh+\int_{h=0}^{h=t}\underbrace{\frac{\partial F}{\partial X}}_{=e^{\theta h}}\, b(X_h,h)\,dW_h=\\=X_0+\int_{h=0}^{h=t}\left(\theta X_h e^{\theta h}+e^{\theta h}\theta(\mu- X_h)\right)dh+\int_{h=0}^{h=t}e^{\theta h} \sigma\,dW_h=\\=X_0+\int_{h=0}^{h=t}e^{\theta h}\theta\mu\,dh+\int_{h=0}^{h=t}e^{\theta h} \sigma\,dW_h$$

Now, to get the solution for $X_t$, the final step is simply to divide both sides by $e^{\theta t}$, to isolate the $X_t$ term on the LHS, which gives:

$$X_t=X_0e^{-\theta t}+\int_{h=0}^{h=t}\left(e^{\theta(h-t)}\theta\mu\right)dh+\int_{h=0}^{h=t}\sigma e^{\theta(h-t)} dW_h$$
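
If you want to convince yourself of this closed form numerically: the deterministic integral evaluates to $\mu(1-e^{-\theta t})$, and the sketch below (purely illustrative, with arbitrary parameter values) simulates the SDE with an Euler-Maruyama scheme and compares the result against the closed-form expression driven by the same Brownian increments.

```python
import numpy as np

# Illustrative sketch: Euler-Maruyama for dX = theta*(mu - X) dt + sigma dW,
# compared with the closed form
#   X_T = X_0 e^{-theta T} + mu (1 - e^{-theta T}) + sigma * int_0^T e^{theta(h-T)} dW_h,
# using the same Brownian increments for both. Parameter values are arbitrary.
rng = np.random.default_rng(2)
theta, mu, sigma, X0 = 1.5, 0.7, 0.3, 0.0

T, n = 2.0, 100_000
dt = T / n
t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(dt), n)

X = np.empty(n + 1)
X[0] = X0
for i in range(n):                            # Euler-Maruyama step
    X[i + 1] = X[i] + theta * (mu - X[i]) * dt + sigma * dW[i]

stoch_int = np.sum(np.exp(theta * (t[:-1] - T)) * dW)   # Riemann sum of the dW integral
X_closed = X0 * np.exp(-theta * T) + mu * (1.0 - np.exp(-theta * T)) + sigma * stoch_int

print(X[-1], X_closed)  # should agree up to discretisation error
```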

I have seen many people try to solve the Ornstein-Uhlenbeck SDE writing everything out in the "short-hand" notation, and in the last step, when dividing through by $e^{\theta t}$, "cancel out" the terms that should remain as $e^{\theta h}$ inside the integrals: the short-hand notation fails to distinguish between the integration dummy variable (i.e. "$h$") and what has already been integrated up to "$t$".

In conclusion, I wouldn't recommend using the short-hand notation for SDEs, and if you come across it, I would encourage "translating" it into what it really means (i.e. the "long-hand" notation): at least for me, this has made things a lot easier to comprehend.

Edit on Quadratic Variation: Quadratic variation for stochastic processes is defined as a limit in probability of the sum of squared increments over a partition $0=t_0<t_1<\dots<t_n=t$, as the mesh size gets finer and finer. For a Brownian motion this limit turns out to be $t$, i.e. $\left<W\right>_t=t$, in the sense that $\forall \epsilon > 0$:

$$\lim_{n \to \infty} \mathbb{P}\left(\left|\sum_{i=1}^{i=n}\left(W_{t_i}-W_{t_{i-1}}\right)^2-t\right|>\epsilon\right)=0$$

I.e. the probability that the sum of squared increments differs from $t$ by more than $\epsilon$ goes to $0$ as the mesh size gets infinitely fine (the proof is rather technical, see for example here, where they actually seem to prove convergence almost surely (which implies convergence in probability)).

Notice that we can then simply write:

$$t=\int_{h=0}^{h=t}dh$$ and thereby obtain the well-known formula:

$$ \left< W \right>_t=\int_{h=0}^{h=t}dh=t$$
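
A purely illustrative numerical sketch of this convergence (each line draws a fresh Brownian path on an increasingly fine grid over $[0,1]$, so it illustrates the concentration of the sum of squared increments around $t=1$ rather than refining a single fixed path):

```python
import numpy as np

# Illustrative sketch: the sum of squared Brownian increments over [0, T]
# concentrates around T as the partition gets finer, i.e. <W>_T = T.
rng = np.random.default_rng(3)
T = 1.0

for n in (10, 100, 10_000, 1_000_000):        # increasingly fine partitions
    dW = rng.normal(0.0, np.sqrt(T / n), n)   # increments W_{t_i} - W_{t_{i-1}}
    print(n, np.sum(dW ** 2))                 # -> gets closer to T = 1.0
```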

Jan Stuller
  • I would probably not say that the quadratic variation is defined that way... quadratic variation is already a concept that makes sense for just functions, and happens to simplify to the expression you gave in the context of an Ito process. I also would nitpick that you should probably stick with the OP's notation for the quadratic variation process i.e. $\langle X \rangle_t$ rather than $\langle X_t \rangle$ (not necessarily because it's better or worse, just because it's what they already introduced). Otherwise this is a great answer. – Ian Dec 28 '20 at 22:07
  • Thank you @Ian, I've amended the notation for $\langle X \rangle_t$ as suggested. Let me have another look at the definition of quadratic variation and make an amendment accordingly, it's a good point you raise. PS: if you find the topic of Stochastic Integration interesting, I have put a bounty on this question here, if you take interest. – Jan Stuller Dec 28 '20 at 22:29
  • @Ian: I've added a section at the end on Quadratic Variation for Stochastic Processes, actually served as a good refresher, thank you for pointing out earlier that indeed the definition is not what I had written, but rather follows from the actual definition. I went with a definition specifically for Stochastic processes, rather than with the more general definition for Deterministic functions: hopefully that's ok in terms of rigour. – Jan Stuller Dec 29 '20 at 11:56
  • What you actually wrote there is convergence almost surely, which does indeed hold for BM. But yes, that helps for sure. – Ian Dec 29 '20 at 15:25
  • @Ian: thanks. I believe that for A.S. convergence, the limit would have to be inside the probability. When the limit is outside the probability, it happens to be convergence in probability (by the continuity property of the probability measure, the limit can be taken inside the probability for an increasing or decreasing sequence of events, and I believe that for the sequence defined as $$A_n:=\sum_{i=1}^{i=n}\left(W_{t_i}-W_{t_{i-1}}\right)^2,$$ the continuity applies, therefore A.S. convergence and convergence in probability both hold for the Quadratic Variation of $W_t$). – Jan Stuller Dec 29 '20 at 15:40
  • Well for convergence in probability, the thing inside the $\mathbb{P}$ would not be just equality. – Ian Dec 29 '20 at 15:44
  • @Ian: True! I've amended it. – Jan Stuller Dec 29 '20 at 15:49
  • @Ian: for my own education, I always thought that the following two statements are equivalent formulations of "convergence in probability":

    $$\left<W\right>_t:=\lim_{n \to \infty} \mathbb{P}\left(\left|\sum_{i=1}^{i=n}\left(W_{t_i}-W_{t_{i-1}}\right)^2-t\right|>0\right)=0$$

    And

    $$\left<W\right>_t:=\lim_{n \to \infty} \mathbb{P}\left(\sum_{i=1}^{i=n}\left(W_{t_i}-W_{t_{i-1}}\right)^2-t=1\right)=1$$

    – Jan Stuller Dec 29 '20 at 15:50
  • The right definition of "$X_n$ converges in probability to $X$" is for every $\epsilon > 0$, $\lim_{n \to \infty} \mathbb{P}(|X_n-X|>\epsilon)=0$. There is no getting rid of that quantifier; the funny non-uniformity of the expression (where convergence in probability basically means send the probability to zero before sending $\epsilon$ to zero) is part of how the distinction between convergence in probability and convergence a.s. actually works. – Ian Dec 29 '20 at 16:31