
Given a distribution A and a subset B of that distribution, if we only have the mean, variance, and size of both A and B, is there a way to find the variance of A - B? If not, are there other ways to summarize the distribution that allow this kind of calculation?

Edit: Clarification of the problem. I am taking in a very large stream of numbers. The main distribution A contains all of these values. However, each value is tagged with an attribute denoting a separate dataset that it should be added to, so A is the union of all of these smaller datasets/partitions (distribution B above is an example of one). Using the update procedure from Iteratively Updating a Normal Distribution, is there a way to find the variance of distribution A with all the values of distribution B removed, without storing the distributions themselves?
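Not part of the original post, but to make the intended calculation concrete: if every value counted in B is also counted in A, then each summary (size, mean, variance) determines that partition's sum and sum of squares, and subtracting B's from A's gives the exact statistics of the remaining values. A minimal Python sketch under that assumption (the function name `remove_partition`, the `ddof` convention, and the demo data are mine, not from the thread):

```python
import numpy as np


def remove_partition(n_a, mean_a, var_a, n_b, mean_b, var_b, ddof=0):
    """Size, mean, and variance of A with the values of B removed,
    assuming B is a sub-multiset of A.  Variances use denominator
    (n - ddof); pass the same ddof used to compute var_a and var_b."""
    # Recover sums and sums of squares from each (n, mean, var) summary.
    sum_a = n_a * mean_a
    sum_b = n_b * mean_b
    sumsq_a = var_a * (n_a - ddof) + n_a * mean_a ** 2
    sumsq_b = var_b * (n_b - ddof) + n_b * mean_b ** 2

    # Subtract B's contribution.  Note: subtracting large sums of squares
    # can lose precision when B makes up most of A (catastrophic cancellation).
    n = n_a - n_b
    mean = (sum_a - sum_b) / n
    var = (sumsq_a - sumsq_b - n * mean ** 2) / (n - ddof)
    return n, mean, var


# Demo: B is literally the first 200 values of A.
a = np.random.default_rng(0).normal(size=1000)
b, rest = a[:200], a[200:]
n, mean, var = remove_partition(len(a), a.mean(), a.var(),
                                len(b), b.mean(), b.var())
print(np.isclose(mean, rest.mean()), np.isclose(var, rest.var()))  # True True
```

The same idea works with a Welford-style running summary (count, mean, M2) per partition: keep one summary per tag plus one for A, and invert the usual pairwise-combination formula to remove a partition.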

georgeyiu

1 Answer


So what you are saying is that A is a random variable taking values in some set X, and B is the random variable A conditioned on A taking a value in $Y \subset X$. The answer to your question is yes: $\operatorname{Var}(A-B) = \operatorname{Var}(A) + \operatorname{Var}(B) - 2\operatorname{Cov}(A,B)$.
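A short numerical sanity check of that identity (my own illustration, not part of the original answer): sample variances and covariances computed with a common ddof satisfy it exactly, so it can be verified on any paired sample.

```python
import numpy as np

# Draw a correlated pair (A, B) and check the variance identity numerically.
rng = np.random.default_rng(0)
a = rng.exponential(scale=2.0, size=10_000)
b = a + rng.normal(size=a.shape)              # B deliberately correlated with A

lhs = np.var(a - b, ddof=1)
rhs = np.var(a, ddof=1) + np.var(b, ddof=1) - 2 * np.cov(a, b, ddof=1)[0, 1]
print(lhs, rhs)                               # the two numbers agree up to rounding
```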

So, do you know how to calculate $\operatorname{Var}(B)$?

The answer to your question depends on how A and B are related. At the moment you have only given their distributions, which tells us nothing about their relationship.

For example, let $X_1$ be an exponential variable with parameter $\lambda$, and let $X_2$ be distributed either as $X_2 = X_1 \mid X_1 > 1$ or as $X_2 = X_1 \mid Y_1 > 1$, where $Y_1$ is an independent copy of $X_1$. In these two cases, $\operatorname{Cov}(X_1, X_2)$ is likely to be drastically different.

Lost1