Introduction
Most of the material here is taken quite liberally from Shaked and Shanthikumar's "Stochastic Orders", though I have added some things of my own that I thought would be relevant.
What is the idea behind ordering random variables?
Well, we order random variables for the same reason we order numbers: comparison is a natural thing to do. But the nice thing in terms of applications, as I see it, is that stochastic dominance allows us to prove monotonicity for a large class of functionals of random variables. Deeper notions of stochastic monotonicity constrain randomness even further, in the sense that the presence of a stochastic ordering interacts very well with concentration phenomena, large deviations and many other parts of probability.
To give one example, let me attach here my own answer about stochastic domination involving triangles. There I prove, using stochastic domination, the monotonicity of triangle (or any shape!) counts in a graph model. As it turns out, triangle and other subgraph counts have been well approximated using counting lemmas that exploit weak dependence.
To give another: stochastic dominance in controlled Markov chains (Markov chains whose transition function depends on a "control" variable) is very important in proving the existence of monotone optimal controls, which are controls that do the best "job" and are "monotone" in a suitable sense. This is usually proved by showing that the value function is monotone, and that monotonicity argument very commonly exploits stochastic dominance, especially in arguments from queueing theory and birth-death processes.
There are also links between stochastic orders and order-preserving functions and matrices. I'll explain at the end.
First-order stochastic dominance
Definition: If $X, Y$ are random variables, then $X \leq_{st} Y$ if $P[X > x] \leq P[Y > x]$ for all $x \in \mathbb R$. When this holds, $X$ is said to be "first-order stochastically dominated" by $Y$.
Two key rewrites of this property instantly suggest potential generalizations, which we may then use for other purposes.
Call $U$ an "upper" set if $x \in U$ and $y>x$ implies $y \in U$. Then, $X \leq_{st} Y$ if and only if $$E[1_U(X)] \leq E[1_U(Y)] \text{ for all upper sets } U$$ The proof of this is quite clear once you see that any upper set is either an open or closed right half-line.
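As a quick sanity check of the definition (a minimal sketch; the Binomial pair is just my choice of example), one can compare survival functions numerically:

```python
import numpy as np
from scipy.stats import binom

# X ~ Binomial(10, 0.3), Y ~ Binomial(10, 0.5): increasing p pushes mass
# to the right, so we expect X <=_st Y.
n = 10
X, Y = binom(n, 0.3), binom(n, 0.5)

# Check P[X > x] <= P[Y > x] on a grid covering the support.
xs = np.arange(-1, n + 1)
assert np.all(X.sf(xs) <= Y.sf(xs) + 1e-12)  # sf(x) = P[. > x]
print("Binomial(10, 0.3) <=_st Binomial(10, 0.5) verified on the support.")
```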
Another amazing rewrite is this: $X \leq_{st} Y$ if and only if $$
E[\phi(X)] \leq E[\phi(Y)] \text{ for all increasing functions } \phi
$$ (whenever both expectations exist). The proof of this is a little more subtle: you approximate any increasing function by nonnegative linear combinations of indicators of upper sets and pass to the limit.
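Here is a small Monte Carlo illustration of this rewrite (a sketch; the distributions and the increasing test functions are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6

# Exp(1) vs Exp(1/2): the survival functions are e^{-x} <= e^{-x/2},
# so Exp(1) <=_st Exp(1/2).
x = rng.exponential(scale=1.0, size=N)
y = rng.exponential(scale=2.0, size=N)

# E[phi(X)] <= E[phi(Y)] should hold for every increasing phi.
for phi in (lambda t: t, np.sqrt, np.tanh, lambda t: np.minimum(t, 3.0)):
    assert phi(x).mean() <= phi(y).mean()
print("E[phi(X)] <= E[phi(Y)] held for all the test functions.")
```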
We can also prove the following amazing result, which shows precisely why first-order stochastic dominance is such a coveted order.
- Strassen's theorem: $X \leq_{st} Y$ if and only if there exist a random variable $Z$ and two functions $\phi_1, \phi_2$ with $\phi_1 \leq \phi_2$ pointwise, such that $X \sim \phi_1(Z)$ and $Y \sim \phi_2(Z)$.
So basically, the stochastic order is equivalent to a "pushforward" of the pointwise order on functions by a common random variable $Z$. Couplings of this kind are a hallmark of the subject in themselves.
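On the real line one can always realize Strassen's coupling with $Z$ uniform on $(0,1)$ and $\phi_1, \phi_2$ the respective quantile functions; here is a sketch of that canonical construction (the exponential pair is again my own choice):

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(1)
Z = rng.uniform(size=10**5)

# phi_1, phi_2 are the quantile functions of Exp(1) and Exp(1/2).
# X <=_st Y forces the quantile functions to be pointwise ordered,
# so phi_1(Z) <= phi_2(Z) holds sample by sample.
phi1 = expon(scale=1.0).ppf
phi2 = expon(scale=2.0).ppf

X, Y = phi1(Z), phi2(Z)
assert np.all(X <= Y)      # the coupling is pointwise ordered...
print(X.mean(), Y.mean())  # ...while each marginal is the right one.
```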
Properties
Our properties will consist of closure conditions. Necessity and sufficiency conditions are usually obtained on a case-by-case basis in papers, so it is closure that is important here.
If $X \leq_{st} Y$ and $g$ is any increasing function, then $g(X) \leq_{st} g(Y)$. This covers $g(x) = cx$ (for $c > 0$), $g(x) = x + c$, $g(x) = e^x$ and many other examples.
If $X_i \leq_{st} Y_i$ for $i = 1, \dots, m$, if $\psi: \mathbb R^m \to \mathbb R$ is any increasing function (where increasing means that $\psi(x_1,\dots,x_m) \leq \psi(y_1,\dots,y_m)$ whenever $x_1 \leq y_1, \dots, x_m \leq y_m$), and if the $X_i$ are independent of each other and the $Y_i$ are independent of each other (so an $X_i$ can depend on some of the $Y_i$, but not on the other $X_i$), then $\psi(X_1,\dots,X_m) \leq_{st} \psi(Y_1,\dots,Y_m)$. When $\psi$ is taken to be the sum of its inputs, this is also called closure under convolution.
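For instance (a sketch exploiting the fact that the Poisson family is stochastically increasing in its mean and closed under convolution), the closure under sums can be checked exactly:

```python
import numpy as np
from scipy.stats import poisson

# X1 ~ Poisson(1) <=_st Y1 ~ Poisson(2), X2 ~ Poisson(2) <=_st Y2 ~ Poisson(3).
# Sums of independent Poissons are Poisson, so the two sums are
# Poisson(3) and Poisson(5) and can be compared exactly.
SX = poisson(1 + 2)   # X1 + X2
SY = poisson(2 + 3)   # Y1 + Y2

ks = np.arange(0, 40)
assert np.all(SX.sf(ks) <= SY.sf(ks) + 1e-15)
print("Poisson(3) <=_st Poisson(5): convolution closure verified.")
```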
In the above setup, if additionally the $X_i, Y_i$ are nonnegative and $M, N$ are integer-valued random variables, independent of the summands, such that $M \leq_{st} N$, then $\sum_{i=1}^M X_i \leq_{st} \sum_{i=1}^N Y_i$.
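The same kind of experiment works for random sums; here is a Monte Carlo sketch with parameters I made up, comparing empirical survival functions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 10**4

def random_sum(counts, scale):
    # For each count m, sum m i.i.d. Exponential(scale) variables.
    return np.array([rng.exponential(scale, size=m).sum() for m in counts])

M = rng.poisson(2, size=n_samples)  # M <=_st N since Poisson(2) <=_st Poisson(4)
N = rng.poisson(4, size=n_samples)
S_X = random_sum(M, scale=1.0)      # X_i ~ Exp(1)
S_Y = random_sum(N, scale=2.0)      # Y_i ~ Exp(1/2), so X_i <=_st Y_i

# Compare empirical survival functions on a grid; dominance should hold
# up to Monte Carlo noise.
ts = np.linspace(0, 20, 41)
sf = lambda s, t: (s[:, None] > t[None, :]).mean(axis=0)
print(np.all(sf(S_X, ts) <= sf(S_Y, ts) + 0.02))
```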
Note that these properties are super-duper general, to the extent that some of them yield reverse characterizations as well.
Here is a sufficiency criterion:
Let $X,Y$ be random variables having densities $f,g$. If there exists a $t$ such that $f(x)>g(x)$ for all $x<t$ and $f(x)<g(x)$ for all $x>t$ then $X \leq_{st} Y$.
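For instance (a sketch), two normal densities with the same variance and shifted means cross exactly once, so the criterion applies:

```python
import numpy as np
from scipy.stats import norm

# f = density of N(0,1), g = density of N(1,1): they cross once at t = 1/2,
# with f > g to the left and f < g to the right, so N(0,1) <=_st N(1,1).
X, Y = norm(0, 1), norm(1, 1)

xs = np.linspace(-6, 7, 1000)
crossings = np.sum(np.diff(np.sign(X.pdf(xs) - Y.pdf(xs))) != 0)
print("density crossings:", crossings)       # 1
assert np.all(X.sf(xs) <= Y.sf(xs) + 1e-12)  # survival functions ordered
```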
Nevertheless, I haven't even scratched the surface of the book.
So what can we expect from a stochastic ordering?
Looking at stochastic orderings $\leq$ in general, what we ideally wish for is exactly the following:
A nice class of functions $\mathcal F$ such that $X \leq Y$ holds if and only if $E[f(X)] \leq E[f(Y)]$ for every $f \in \mathcal F$. Ideally, $\mathcal F$ should be a (closed) vector space containing the indicators of some nice sets (in the above case, upper sets). If we take a nice class of sets to begin with, we can insist that $\mathcal F$ contains all the obvious functions: $f(x) = x$, $x + c$, $cx$, increasing functions, convex functions etc.
For example, we can let $\mathcal F$ be the set of all convex functions, and we get the convex order $\leq_{cx}$. If we allow just the increasing convex functions, we get a different order, $\leq_{icx}$. Now most of these orders will admit various properties based on the class $\mathcal F$ (for example, the product of two nonnegative increasing functions is increasing, but the product of convex functions need not be convex, so the convex order loses some multiplicative properties).
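To make the convex order concrete (a sketch; the test functions are arbitrary choices of mine): adding independent noise is a mean-preserving spread, so a standard normal is $\leq_{cx}$ a centered normal with variance $2$, even though the two are not $\leq_{st}$-comparable:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10**6

# X ~ N(0,1); Y = X + independent N(0,1) noise, so Y ~ N(0, 2): same mean,
# more spread. Then E[phi(X)] <= E[phi(Y)] for all convex phi (X <=_cx Y),
# but X and Y are not comparable in <=_st since their CDFs cross.
x = rng.normal(0, 1, size=N)
y = x + rng.normal(0, 1, size=N)

for phi in (np.abs, np.square, lambda t: np.maximum(t - 1, 0.0)):
    assert phi(x).mean() <= phi(y).mean()
print("convex test functions ordered; means:", x.mean(), y.mean())
```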
But essentially, there's no list of properties that a stochastic ordering MUST have. You can define whatever order you want, based on the class $\mathcal F$. Then you can decide, based on other properties of $\mathcal F$, what properties this stochastic order inherits.
Basic properties that govern such common classes of functions $\mathcal F$, would then govern stochastic orders, but it's not necessary that every stochastic order must have any of these properties. For example :
- Closure under summation and non-negative scaling (this is almost always true, to be fair).
- Closure under scaling or composition by a suitable function (ideally an increasing or convex function).
- Closure under multiplication.
Admittedly, Shaked and Shanthikumar go a step further, and introduce bivariate characterizations of stochastic orders, which provide possibly the ultimate link between properties of function classes and those of random variables.
At the heart of the matter, stochastic orders serve to "push" the structure of the function class onto the space of random variables. I hope I have been able to give a gist of properties that stochastic orders ideally follow.
Oof, I just forgot about matrices!
Well, roughly speaking a discrete probability distribution (let's say on finitely many points for now) is a vector, for all it matters. For example, a Bernoulli$(p)$ random variable is basically representable by $[p,1-p]$.
Therefore, matrices come in as natural transformations that preserve and play around with probability distributions; such matrices are referred to as "stochastic" matrices. The classical link with orders goes through majorization. Let $x, y \in \mathbb R^n$ with $\sum_i x_i = \sum_i y_i$, and let $X, Y$ be uniformly distributed on the entries of $x$ and $y$ respectively, so that $E[X] = E[Y]$. The Hardy-Littlewood-Pólya theorem says that $E[\phi(X)] \leq E[\phi(Y)]$ for all convex $\phi$ (that is, $X \leq_{cx} Y$ in the convex order) if and only if there is a doubly stochastic matrix $M$ such that $x = My$, i.e. $x$ is majorized by $y$. This is exactly the statement that $Y$ is a "mean preserving spread" of $X$. (Note that $\leq_{st}$ itself cannot appear here: $X \leq_{st} Y$ together with $E[X] = E[Y]$ already forces $X$ and $Y$ to have the same distribution.)
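In practice the doubly stochastic matrix never needs to be exhibited: by Hardy-Littlewood-Pólya, majorization can be checked via sorted partial sums. A sketch:

```python
import numpy as np

def majorized(x, y, tol=1e-12):
    """Check x = My for some doubly stochastic M, i.e. x is majorized by y.
    By Hardy-Littlewood-Polya this reduces to comparing sorted partial sums."""
    x, y = np.sort(x)[::-1], np.sort(y)[::-1]          # decreasing order
    if abs(x.sum() - y.sum()) > tol:                   # equal totals required
        return False
    return bool(np.all(np.cumsum(x) <= np.cumsum(y) + tol))  # top-k sums dominated

# {1,2,3,4} vs. the more "spread out" {0.5, 0.5, 4.5, 4.5}: same total,
# the second is more extreme, so the first is majorized by the second.
print(majorized([1, 2, 3, 4], [0.5, 0.5, 4.5, 4.5]))   # True
print(majorized([0.5, 0.5, 4.5, 4.5], [1, 2, 3, 4]))   # False
```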