Introduction
Most of the material here is taken quite liberally from Shaked and Shanthikumar's "Stochastic Orders", though I have added some things of my own that I thought would be relevant.
What is the idea behind ordering random variables?
Well, we order random variables for the same reason we order numbers: comparison is a natural thing to do. But the nice thing in terms of applications, as I see it, is that stochastic dominance allows us to prove monotonicity for a large class of functionals of random variables. Deeper notions of stochastic monotonicity constrain randomness even further, in the sense that the presence of a stochastic ordering interacts very well with concentration phenomena, large deviations and many other parts of probability.
To give one example, let me attach here my own answer about stochastic domination involving triangles. There I prove, using stochastic domination, the monotonicity of triangle (or any shape!) counts in a graph model. As it turns out, triangle and other subgraph counts have been well approximated using counting lemmas that exploit weak dependence.
To give another: stochastic dominance in controlled Markov chains (Markov chains whose transition function depends on a "control" variable) is very important in proving the existence of monotone optimal controls, which are controls that do the best "job" and are "monotone" in a suitable sense. This is usually proved by showing that the value function is monotone, and that monotonicity argument very commonly exploits stochastic dominance, especially in arguments from queueing theory and birth-death processes.
There are also links between stochastic orders and order-preserving functions and matrices. I'll explain at the end.
First-order stochastic dominance
Definition: If $X, Y$ are random variables, then $X \leq_{st} Y$ if $P[X > x] \leq P[Y > x]$ for all $x \in \mathbb R$. When this holds, $X$ is said to be "first-order stochastically dominated" by $Y$.
Two key rewrites of this property instantly suggest potential generalizations, which we may then use for other purposes.
Call $U$ an "upper" set if $x \in U$ and $y>x$ implies $y \in U$. Then, $X \leq_{st} Y$ if and only if $$E[1_U(X)] \leq E[1_U(Y)] \text{ for all upper sets } U$$ The proof of this is quite clear once you see that any upper set is either an open or closed right half-line.
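As a quick sanity check of the definition (a minimal sketch; the Binomial pair is just my choice of example), one can compare survival functions numerically:

```python
import numpy as np
from scipy.stats import binom

# X ~ Binomial(10, 0.3), Y ~ Binomial(10, 0.5): increasing p pushes mass
# to the right, so we expect X <=_st Y.
n = 10
X, Y = binom(n, 0.3), binom(n, 0.5)

# Check P[X > x] <= P[Y > x] on a grid covering the support.
xs = np.arange(-1, n + 1)
assert np.all(X.sf(xs) <= Y.sf(xs) + 1e-12)  # sf(x) = P[. > x]
print("Binomial(10, 0.3) <=_st Binomial(10, 0.5) verified on the support.")
```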
Another amazing rewrite is this: $X \leq_{st} Y$ if and only if $$
E[\phi(X)] \leq E[\phi(Y)] \text{ for all increasing functions } \phi
$$ (whenever both expectations exist). The proof of this is a little more subtle: you approximate any increasing function by nonnegative linear combinations of indicators of upper sets and pass to the limit.
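Here is a small Monte Carlo illustration of this rewrite (a sketch; the distributions and the increasing test functions are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6

# Exp(1) vs Exp(1/2): the survival functions are e^{-x} <= e^{-x/2},
# so Exp(1) <=_st Exp(1/2).
x = rng.exponential(scale=1.0, size=N)
y = rng.exponential(scale=2.0, size=N)

# E[phi(X)] <= E[phi(Y)] should hold for every increasing phi.
for phi in (lambda t: t, np.sqrt, np.tanh, lambda t: np.minimum(t, 3.0)):
    assert phi(x).mean() <= phi(y).mean()
print("E[phi(X)] <= E[phi(Y)] held for all the test functions.")
```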
We can also prove the following amazing result, which shows precisely why first-order stochastic dominance is such a coveted order.
- Strassen's theorem: $X \leq_{st} Y$ if and only if there exist a random variable $Z$ and two functions $\phi_1, \phi_2$ with $\phi_1 \leq \phi_2$ pointwise, such that $X \sim \phi_1(Z)$ and $Y \sim \phi_2(Z)$.
So basically, the stochastic order is equivalent to a "pushforward" of the pointwise order on functions by a common random variable $Z$. Couplings of this kind are a hallmark of the subject in themselves.
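On the real line one can always realize Strassen's coupling with $Z$ uniform on $(0,1)$ and $\phi_1, \phi_2$ the respective quantile functions; here is a sketch of that canonical construction (the exponential pair is again my own choice):

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(1)
Z = rng.uniform(size=10**5)

# phi_1, phi_2 are the quantile functions of Exp(1) and Exp(1/2).
# X <=_st Y forces the quantile functions to be pointwise ordered,
# so phi_1(Z) <= phi_2(Z) holds sample by sample.
phi1 = expon(scale=1.0).ppf
phi2 = expon(scale=2.0).ppf

X, Y = phi1(Z), phi2(Z)
assert np.all(X <= Y)      # the coupling is pointwise ordered...
print(X.mean(), Y.mean())  # ...while each marginal is the right one.
```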
Properties
Our properties will consist of closure conditions. Necessity and sufficiency conditions are usually obtained on a case-by-case basis in papers, so it is closure that is important here.
If $X \leq_{st} Y$ and $g$ is any increasing function, then $g(X) \leq_{st} g(Y)$. This covers $g(x) = cx$ (for $c > 0$), $g(x) = x + c$, $g(x) = e^x$ and many other examples.
If $X_i \leq_{st} Y_i$ for $i = 1, \dots, m$, if $\psi: \mathbb R^m \to \mathbb R$ is any increasing function (where increasing means that $\psi(x_1,\dots,x_m) \leq \psi(y_1,\dots,y_m)$ whenever $x_1 \leq y_1, \dots, x_m \leq y_m$), and if the $X_i$ are independent of each other and the $Y_i$ are independent of each other (so an $X_i$ can depend on some of the $Y_i$, but not on the other $X_i$), then $\psi(X_1,\dots,X_m) \leq_{st} \psi(Y_1,\dots,Y_m)$. When $\psi$ is taken to be the sum of its inputs, this is also called closure under convolution.
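For instance (a sketch exploiting the fact that the Poisson family is stochastically increasing in its mean and closed under convolution), the closure under sums can be checked exactly:

```python
import numpy as np
from scipy.stats import poisson

# X1 ~ Poisson(1) <=_st Y1 ~ Poisson(2), X2 ~ Poisson(2) <=_st Y2 ~ Poisson(3).
# Sums of independent Poissons are Poisson, so the two sums are
# Poisson(3) and Poisson(5) and can be compared exactly.
SX = poisson(1 + 2)   # X1 + X2
SY = poisson(2 + 3)   # Y1 + Y2

ks = np.arange(0, 40)
assert np.all(SX.sf(ks) <= SY.sf(ks) + 1e-15)
print("Poisson(3) <=_st Poisson(5): convolution closure verified.")
```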
In the above setup, if additionally the $X_i, Y_i$ are nonnegative and $M, N$ are integer-valued random variables, independent of the summands, such that $M \leq_{st} N$, then $\sum_{i=1}^M X_i \leq_{st} \sum_{i=1}^N Y_i$.
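The same kind of experiment works for random sums; here is a Monte Carlo sketch with parameters I made up, comparing empirical survival functions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 10**4

def random_sum(counts, scale):
    # For each count m, sum m i.i.d. Exponential(scale) variables.
    return np.array([rng.exponential(scale, size=m).sum() for m in counts])

M = rng.poisson(2, size=n_samples)  # M <=_st N since Poisson(2) <=_st Poisson(4)
N = rng.poisson(4, size=n_samples)
S_X = random_sum(M, scale=1.0)      # X_i ~ Exp(1)
S_Y = random_sum(N, scale=2.0)      # Y_i ~ Exp(1/2), so X_i <=_st Y_i

# Compare empirical survival functions on a grid; dominance should hold
# up to Monte Carlo noise.
ts = np.linspace(0, 20, 41)
sf = lambda s, t: (s[:, None] > t[None, :]).mean(axis=0)
print(np.all(sf(S_X, ts) <= sf(S_Y, ts) + 0.02))
```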
Note that these properties are super-duper general, to the extent that some of them yield reverse characterizations as well.
Here is a sufficiency criterion:
Let $X,Y$ be random variables having densities $f,g$. If there exists a $t$ such that $f(x)>g(x)$ for all $x<t$ and $f(x)<g(x)$ for all $x>t$ then $X \leq_{st} Y$.
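For instance (a sketch), two normal densities with the same variance and shifted means cross exactly once, so the criterion applies:

```python
import numpy as np
from scipy.stats import norm

# f = density of N(0,1), g = density of N(1,1): they cross once at t = 1/2,
# with f > g to the left and f < g to the right, so N(0,1) <=_st N(1,1).
X, Y = norm(0, 1), norm(1, 1)

xs = np.linspace(-6, 7, 1000)
crossings = np.sum(np.diff(np.sign(X.pdf(xs) - Y.pdf(xs))) != 0)
print("density crossings:", crossings)       # 1
assert np.all(X.sf(xs) <= Y.sf(xs) + 1e-12)  # survival functions ordered
```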
Nevertheless, I haven't even scratched the surface of the book.
So what can we expect from a stochastic ordering?
Looking at stochastic orderings $\leq$ in general, what we ideally wish for is exactly the following:
A nice class of functions $\mathcal F$ such that $X \leq Y$ holds if and only if $E[f(X)] \leq E[f(Y)]$ for every $f \in \mathcal F$. Ideally, $\mathcal F$ should be a (closed) vector space containing the indicators of some nice sets (in the above case, upper sets). If we take a nice class of sets to begin with, we can insist that $\mathcal F$ contains all the obvious functions: $f(x) = x$, $x + c$, $cx$, increasing functions, convex functions etc.
For example, we can let $\mathcal F$ be the set of all convex functions, and we get the convex order $\leq_{cx}$. If we allow just the increasing convex functions, we get a different order, $\leq_{icx}$. Now most of these orders will admit various properties based on the class $\mathcal F$ (for example, the product of two nonnegative increasing functions is increasing, but the product of convex functions need not be convex, so the convex order loses some multiplicative properties).
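To make the convex order concrete (a sketch; the test functions are arbitrary choices of mine): adding independent noise is a mean-preserving spread, so a standard normal is $\leq_{cx}$ a centered normal with variance $2$, even though the two are not $\leq_{st}$-comparable:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10**6

# X ~ N(0,1); Y = X + independent N(0,1) noise, so Y ~ N(0, 2): same mean,
# more spread. Then E[phi(X)] <= E[phi(Y)] for all convex phi (X <=_cx Y),
# but X and Y are not comparable in <=_st since their CDFs cross.
x = rng.normal(0, 1, size=N)
y = x + rng.normal(0, 1, size=N)

for phi in (np.abs, np.square, lambda t: np.maximum(t - 1, 0.0)):
    assert phi(x).mean() <= phi(y).mean()
print("convex test functions ordered; means:", x.mean(), y.mean())
```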
But essentially, there's no list of properties that a stochastic ordering MUST have. You can define whatever order you want, based on the class $\mathcal F$. Then you can decide, based on other properties of $\mathcal F$, what properties this stochastic order inherits.
Basic properties that govern such common classes of functions $\mathcal F$, would then govern stochastic orders, but it's not necessary that every stochastic order must have any of these properties. For example :
- Closure under summation and non-negative scaling (this is almost always true, to be fair).
- Closure under scaling or composition by a suitable function (ideally an increasing or convex function).
- Closure under multiplication.
Admittedly, Shaked and Shanthikumar go a step further, and introduce bivariate characterizations of stochastic orders, which provide possibly the ultimate link between properties of function classes and those of random variables.
At the heart of the matter, stochastic orders serve to "push" the structure of the function class onto the space of random variables. I hope I have been able to give a gist of properties that stochastic orders ideally follow.
Oof, I just forgot about matrices!
Well, roughly speaking a discrete probability distribution (let's say on finitely many points for now) is a vector, for all it matters. For example, a Bernoulli$(p)$ random variable is basically representable by $[p,1-p]$.
Therefore, matrices come in as natural transformations that preserve and play around with probability distributions; such matrices are referred to as "stochastic" matrices. The classical link with orders goes through majorization. Let $x, y \in \mathbb R^n$ with $\sum_i x_i = \sum_i y_i$, and let $X, Y$ be uniformly distributed on the entries of $x$ and $y$ respectively, so that $E[X] = E[Y]$. The Hardy-Littlewood-Pólya theorem says that $E[\phi(X)] \leq E[\phi(Y)]$ for all convex $\phi$ (that is, $X \leq_{cx} Y$ in the convex order) if and only if there is a doubly stochastic matrix $M$ such that $x = My$, i.e. $x$ is majorized by $y$. This is exactly the statement that $Y$ is a "mean preserving spread" of $X$. (Note that $\leq_{st}$ itself cannot appear here: $X \leq_{st} Y$ together with $E[X] = E[Y]$ already forces $X$ and $Y$ to have the same distribution.)
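In practice the doubly stochastic matrix never needs to be exhibited: by Hardy-Littlewood-Pólya, majorization can be checked via sorted partial sums. A sketch:

```python
import numpy as np

def majorized(x, y, tol=1e-12):
    """Check x = My for some doubly stochastic M, i.e. x is majorized by y.
    By Hardy-Littlewood-Polya this reduces to comparing sorted partial sums."""
    x, y = np.sort(x)[::-1], np.sort(y)[::-1]          # decreasing order
    if abs(x.sum() - y.sum()) > tol:                   # equal totals required
        return False
    return bool(np.all(np.cumsum(x) <= np.cumsum(y) + tol))  # top-k sums dominated

# {1,2,3,4} vs. the more "spread out" {0.5, 0.5, 4.5, 4.5}: same total,
# the second is more extreme, so the first is majorized by the second.
print(majorized([1, 2, 3, 4], [0.5, 0.5, 4.5, 4.5]))   # True
print(majorized([0.5, 0.5, 4.5, 4.5], [1, 2, 3, 4]))   # False
```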