
I've tried to learn probability, one way or another, many times in the last 50 years, and finally settled on the Kolmogorov approach, where a random variable isn't described as something like "a roll of a die," but rather as "a measurable function from a measure space $X$ to the reals, where the (fixed) measure on $X$ has the property that $\mu(X) = 1$."

That's something concrete and mathematical that I can work with, even if my students hate it. :)

Now I'm finding that I don't really know what a "sample" is. I'm reading a paper that says "Let us consider some dataset $X = \{ x^{(i)}\}_{i=1}^N$ consisting of $N$ i.i.d. samples of some continuous or discrete random variable $x$." And I realized that I didn't know what that meant.

I went down the hall and asked a guy who does lots of probability stuff, and usually helps me clarify things, and he said "a sample...well, sampling is a process that produces outputs distributed ... well, ... it's a single value, not a whole function like the random variable $x$." (I'm being a little unfair here; I think what he said was more coherent than this, but I couldn't make sense of it.)

My question: Can someone please tell me what a sample is, in formal terms? (By that I mean that it should be some familiar mathematical object -- a set, or a function on some well-described domain, or a relation, or ... you get the idea. It should not be something like "an algorithm" or "a process" or anything like that.)

Perhaps a followup question is this: Can someone recommend a book or other source from which I can read things like this? I thought that Feller might help, but didn't find it very satisfactory the last time I looked.

John Hughes
  • The sample is... One roll of the die! – N74 Apr 26 '16 at 19:55
  • For continuous random variables I think it only makes sense if you discretise them according to some measurement process. A sample from $X$ is $X(\omega)$ given a sample of $\omega$ in $\Omega$ according to the measure $\mathbb{P}$. But every element $\omega$ may have measure zero. – zenna May 23 '16 at 10:39
  • Also see http://math.stackexchange.com/questions/816494/what-does-it-mean-to-sample-in-measure-theoretic-terms – zenna May 23 '16 at 10:49
  • @N74 Fair point, but certainly the mathematical world has no notion of dice, or rolls, or the like! It only knows of sets, and functions, and numbers, and the like. So, what does it mean to sample, in terms of pure mathematical primitives? =) – étale-cohomology Jul 19 '17 at 10:07
  • @étale-cohomology: Did you read the accepted answer? It seems to me to adequately answer. Of course mathematically you cannot have "dice", etc., but you can model these with mathematical objects, and the accepted answer exhibits a model whose properties match those we expect a "sample" to have. – John Hughes Jul 19 '17 at 11:16
  • @JohnHughes I'm very glad I found this question because I'm studying probability and having the exact same struggles you pointed out in this question. I'm just wondering, have you found a book or something that does NOT gloss over these kinds of things and explains them clearly? – Blue Jan 14 '25 at 20:31
  • I have not found such a book (nor looked hard). Chill2Macht's answer suggests that a book that proves the "Ionescu Tulcea extension theorem" might be a candidate. Chasing down references to that theorem led me to Kallenberg's "Foundations of Modern Probability", which looks promising, although only for folks with a quite solid background in real analysis. – John Hughes Jan 15 '25 at 10:57

2 Answers


I think @adjan's answer and @Ian's answer are other correct interpretations.

Generally, "sample" is more of a concept from statistics then probability.

If you have a "population" in which some attribute is distributed according to a distribution $P$, then a sample is a vector (of arbitrary, but ideally pre-specified, length) of independent copies of a random variable $X$ distributed according to $P$.

To be technical, two independent copies $X_1, X_2$ of a random variable $X$ are actually two random variables on the product space $\Omega \times \Omega$ endowed with the product measure (so that, in particular, $\mathbb{E}X_1X_2 = \mathbb{E}X_1 \mathbb{E}X_2$). If $\pi_1, \pi_2$ are the canonical projections and $i_1, i_2: \Omega \to \Omega \times \Omega$ are any two injections such that $\pi_1 \circ i_1 = id_{\Omega}$ and $\pi_2 \circ i_2 = id_{\Omega}$, then $X_1 \circ i_1 \sim X$ and $X_2 \circ i_2 \sim X$.
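
For concreteness, here is a minimal Python sketch of this construction on a finite space (a fair coin; the toy example and the names are mine, not part of the original construction):

```python
# Two independent copies of a coin-flip random variable, built explicitly
# on the product space Omega x Omega with the product measure.
from itertools import product

Omega = ["H", "T"]                       # base sample space
P = {w: 0.5 for w in Omega}              # probability measure on Omega
X = {"H": 0.0, "T": 1.0}                 # the random variable X: Omega -> R

Omega2 = list(product(Omega, Omega))     # product space Omega x Omega
P2 = {(w1, w2): P[w1] * P[w2] for (w1, w2) in Omega2}   # product measure

X1 = lambda w: X[w[0]]                   # X1 = X o pi_1
X2 = lambda w: X[w[1]]                   # X2 = X o pi_2

E = lambda f, mu: sum(f(w) * mu[w] for w in mu)   # expectation on a finite space

print(E(X1, P2), E(X2, P2))              # 0.5 0.5 -- both distributed like X
print(E(lambda w: X1(w) * X2(w), P2),    # 0.25
      E(X1, P2) * E(X2, P2))             # 0.25 -- E[X1 X2] = E[X1] E[X2]
```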

Sampling a predetermined, fixed number $n$ of independent "samples" from the total population is really just considering the product space $\Omega^n$ with random variables $X_1, \dots, X_n$, endowed with the product measure, so that $X_1 \circ i_1 \sim \dots \sim X_n \circ i_n \sim X$.
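
Read this way, "drawing a sample of size $n$" amounts to evaluating the $n$ coordinate random variables at a single point of $\Omega^n$. A small illustrative sketch (my own toy die example, with hypothetical names):

```python
# A sample of size n as coordinate random variables on Omega^n,
# evaluated at one point omega of the product space.
import random
from itertools import product

Omega = [1, 2, 3, 4, 5, 6]                    # one die roll
n = 3
Omega_n = list(product(Omega, repeat=n))      # the product space Omega^n
P_n = {w: (1 / 6) ** n for w in Omega_n}      # product measure (uniform here)
assert abs(sum(P_n.values()) - 1) < 1e-9      # P_n is a probability measure

def X_i(i):
    """Coordinate random variable X_i = X o pi_i (X is the identity here)."""
    return lambda w: w[i]

w = random.choice(Omega_n)                    # one point of Omega^n, law P_n
sample = tuple(X_i(i)(w) for i in range(n))   # the realized sample of size n
print(sample)                                 # e.g. (4, 1, 6)
```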

If we don't want to fix in advance the number of samples we take, then sampling is just considering the product space $\Omega^{\mathbb{N}}$ with random variables $X_1, \dots, X_n, \dots$ endowed with the product measure, thus $X_1 \circ i_1 \sim \dots \sim X_n \circ i_n \sim \dots \sim X$. Such a construction is possible in the countably infinite case by the Ionescu Tulcea extension theorem, for example, even in many cases when the conditions of Kolmogorov's extension theorem do not apply.
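
In the i.i.d. case, the measure that such an extension theorem produces on $\Omega^{\mathbb{N}}$ can be described (sketching here in my own notation, with $\mathbb{P}$ the measure on $\Omega$) by its values on cylinder sets:

$$\mu\bigl(\pi_1^{-1}(A_1) \cap \dots \cap \pi_n^{-1}(A_n)\bigr) = \prod_{k=1}^{n} \mathbb{P}(A_k) \qquad \text{for all } n \text{ and all measurable } A_1, \dots, A_n \subseteq \Omega,$$

and the theorem guarantees that exactly one probability measure $\mu$ on the product $\sigma$-algebra satisfies this condition.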

It is also worth noting that we don't necessarily need to assume that the "copies" of $X$ are independently distributed; using an entirely analogous definition, it would still be possible for $X_1 \circ i_1 \sim \dots \sim X_n \circ i_n \sim \dots \sim X$ without having the product measure on $\Omega^{\mathbb{N}}$ (here I am drawing an implicit distinction between "the" product measure on the product of measure spaces and an arbitrary measure on such a space, which need not equal the canonical such measure.)

Then a "statistic" is any (measurable) function of this sample vector in the finite case or sample sequence in the infinite case.

Because in practice every experiment conducted in real life involves only finitely many variables, references to sample vectors (of an arbitrary, unspecified size $n$) rather than sample sequences are more common in the statistics-oriented literature, although the greater generality afforded (in some sense) by the limiting case $n \to \infty$ is used very frequently in the pure probability literature (e.g. when discussing random walks or results like the CLT or the Law of the Iterated Logarithm).

However, in the pure probability literature such infinite samples are usually termed "a sequence of independent, identically distributed random variables" rather than a "sample", even though such a sequence is the same type of mathematical object as the unbounded version of the theoretical model of "samples" used by statisticians. Indeed, this correspondence is why the terms "sampling with replacement" and "sampling without replacement" turn up in the pure probability literature. Since the central objects of study in statistics are vectors of random variables, this also explains why the theory of vector-valued random variables is more prominent in the statistical literature than in other fields.

For example, we could have a population of particles with velocities distributed as $N(0,1)$. Then a sample would correspond to a vector $(X_1, \dots, X_n)$ with each $X_i$ having the distribution $N(0,1)$.

"In practice" this is the theoretical model for choosing $n$ particles from the population and noting what their velocity is.

Given this sample, we can calculate statistics, for example the "sample mean" $(X_1 + \dots + X_n)/n$.

However, this is in general different from the "mean of the population" (which is $0$, the expectation of the normal distribution), since the sample mean is a random variable and hence assumes different values randomly depending on which particles we choose. In this case, however, the statistic has the desirable property that its expectation is equal to the population mean (i.e. $E[(X_1 + \dots + X_n)/n] = E[X]$); it is an unbiased estimator.
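
As a quick numerical sketch of this last point (my own illustration, assuming NumPy is available): different samples give different values of the sample mean, all hovering around the population mean $0$.

```python
# Sample mean of n i.i.d. N(0,1) velocities: a random quantity near,
# but generally not equal to, the population mean 0.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

for trial in range(3):
    sample = rng.standard_normal(n)   # one realization of (X_1, ..., X_n)
    print(sample.mean())              # close to 0, but varies from sample to sample
```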

Chill2Macht
  • A random variable (in my "Kolmogorov" world) is a function on some measure space. How can you have multiple copies of it that are "independent"? Two copies of a function are in fact equal (or they're not "copies"). – John Hughes Apr 26 '16 at 22:13
  • If you want to be technical, two copies $X_1, X_2$ of a random variable $X$ are actually two random variables on the product space $\Omega \times \Omega$ which have the product measure (i.e. $\mathbb{E}X_1 X_2 = \mathbb{E}X_1 \mathbb{E}X_2$). It follows, when $\pi_1, \pi_2$ are the canonical projections, that $X_1 (\pi_1) \sim X_2(\pi_2) \sim X$. Then sampling of independent copies of a random variable is really just considering the product space $\Omega^{\mathbb{N}}$ with random variables $X_1, \dots, X_n, \dots$ endowed with the product measure, i.e. $X_1(\pi_1) \sim \dots X_n(\pi_n)$ – Chill2Macht Aug 09 '16 at 14:00
  • $\sim \dots \sim X$. This is possible by the Ionescu Tulcea extension theorem, for example. – Chill2Macht Aug 09 '16 at 14:02
  • "Technical" is exactly what I wanted to be, and your comment above is just the kind of answer I was looking for. It seems to me that perhaps what you really should have said was that "if $i_1: \Omega \to \Omega \times \Omega$ is any injection with $\pi_1 \circ i_1 = id_\Omega$, then $X_1 \circ i_1 \sim X$", since $X_1(\pi_1)$ doesn't seem to make sense to me. Does this slight modification sound right? If so, and if you submit just this part as an answer, I'll happily accept it. – John Hughes Aug 10 '16 at 17:44
  • Yeah I think I made a mistake -- thank you for correcting it! – Chill2Macht Aug 10 '16 at 20:13
  • @JohnHughes I may have accidentally added more mistakes in trying to modify my answer -- let me know again please if you find something else off. – Chill2Macht Aug 11 '16 at 17:54
  • Lovely answer, thanks. My only change would be to say that $X_1 \circ i_1 = X$, since they really are the same function rather than merely having the same distribution. Yeah, the only difference is on a negligible set, but why not make the stronger statement when you can? Anyhow, I really appreciate this. One more thing... it seems to me that to deal with finitely many samples, we only need products that are cofinitely trivial... which I guess only makes sense when there's a 'zero' element. But the full $\Omega^{\mathbb{N}}$ formulation seems as if it could be overkill. Hmm. Something to ponder. – John Hughes Aug 12 '16 at 18:32
  • This is all true. For the cofinitely trivial condition, I think this can be accomplished by just returning the probability of $\Omega$ (which is 1) for cofinitely many indices. One always has independence for these indices, since we still have for any subset $A$ of $\Omega$ that $\mathbb{P}(A \cap \Omega) = \mathbb{P}(A)= \mathbb{P}(A)\cdot 1 = \mathbb{P}(A)\mathbb{P}(\Omega)$. – Chill2Macht Aug 12 '16 at 18:49

When discussing samples, there is a vector-valued random variable $Y = (Y_1, \dots, Y_n)$ whose real-valued components $Y_i$ are independent and have the same distribution as $X$. Then $\{ x_i \}_{i=1}^n$ is $Y(\omega)$ for some particular $\omega$ in the sample space $\Omega$. Sometimes when we refer to a sample we actually mean $Y$ itself rather than a particular value; it depends on the exact context.
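
A minimal sketch of this distinction (my own toy coin example, not part of the answer): $Y$ is a function on the sample space, while a realized sample is its value at one particular point.

```python
# Y as a function on the sample space vs. Y(omega) as one realized sample.
from itertools import product

base = ["H", "T"]
n = 2
Omega = list(product(base, repeat=n))    # sample space carrying the vector Y

def Y(omega):
    """Vector-valued random variable; each coordinate is one coin indicator."""
    return tuple(1 if c == "T" else 0 for c in omega)

omega = ("H", "T")                       # one particular point of Omega
print(Y(omega))                          # the realized sample {x_i}: (0, 1)
print({w: Y(w) for w in Omega})          # Y "itself", viewed as a function
```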

Ian
  • I like this idea, but s'pose $X: \{H,T\} \to \{0, 1\} \subset \mathbb R$ has $X(H) = 0$ and $X(T) = 1$, and the probability mass function on the domain assigns mass $0.5$ to each element. The distribution of $X$ on $\mathbb R$ assigns mass $0.5$ to each of $0$ and $1$, and $0$ elsewhere. To have "the same distribution" as $X$, each $Y_i$ must have the same image; there are only 2 such functions (namely $X$ and $U$, where $U(H) = 1$ and $U(T) = 0$). So how can you have more than two iid samples of this distribution? The $Y_i$ clearly will not be independent, since some pairs'll be identical! – John Hughes Apr 26 '16 at 22:09
  • @JohnHughes You are correct: you need a larger probability space to make sense of such things. For example you can take the space of all sequences of $H,T$ of length $n$ and give it the uniform distribution. This is one reason that we often fix a single huge probability space at the beginning of doing anything and then just work inside of that. – Ian Apr 26 '16 at 22:39
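
To make the suggestion in the last comment concrete, here is a small sketch (my own, with hypothetical names) of the enlarged space $\{H,T\}^n$ with the uniform distribution, on which the coordinate variables really are i.i.d. copies of the original coin variable.

```python
# Coordinate variables on {H,T}^n with the uniform measure are i.i.d.
from itertools import product

n = 3
Omega = list(product("HT", repeat=n))      # the "larger" space {H,T}^n
P = {w: 1 / len(Omega) for w in Omega}     # uniform measure

def X_i(i):
    return lambda w: 1 if w[i] == "T" else 0

prob = lambda event: sum(P[w] for w in Omega if event(w))

# each coordinate variable has the distribution of the original coin flip
print([prob(lambda w, i=i: X_i(i)(w) == 1) for i in range(n)])   # [0.5, 0.5, 0.5]

# and a pair of cylinder events factors, as independence requires
print(prob(lambda w: X_i(0)(w) == 1 and X_i(1)(w) == 0),         # 0.25
      prob(lambda w: X_i(0)(w) == 1) * prob(lambda w: X_i(1)(w) == 0))  # 0.25
```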