
I am reading a paper in the genomics field, "Adjusting batch effects in microarray expression data using empirical Bayes methods" by W. Evan Johnson and Cheng Li, in which the authors try to correct for noise introduced by the experimental procedure. They use empirical Bayes to estimate batch-effect parameters from the data, modelling the data as follows:

$Y_{ijg} = \alpha_{g} + X\beta_{g} + \gamma_{ig} + \delta_{ig}\epsilon_{ijg}$

where $\alpha_{g} + X\beta_{g}$ is the true (batch-free) gene expression, and $\gamma_{ig}$, $\delta_{ig}$ and $\epsilon_{ijg}$ are the additive batch effect, the multiplicative batch effect and the error term, respectively. To estimate the additive and multiplicative batch effects, they assume $\gamma_{ig} \sim N(\gamma_{i}, \tau^{2}_{i})$ and $\delta^{2}_{ig} \sim \text{Inverse Gamma}(\lambda_{i}, \theta_{i})$.
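To make the roles of the terms concrete, here is a small simulation of this model for a single gene in two batches. All parameter values are hypothetical, $X$ is assumed to be a binary covariate, and $\epsilon_{ijg}$ is assumed to be normal with standard deviation $\sigma_g = 1$. After subtracting $\alpha_{g} + X\beta_{g}$, the residuals in each batch should have a mean close to $\gamma_{ig}$ (the additive effect) and a standard deviation close to $\delta_{ig}\sigma_g$ (the multiplicative effect).

```python
import random
import statistics

# Hypothetical parameters for a single gene g (illustrative values, not from the paper).
ALPHA_G = 5.0   # baseline expression alpha_g
BETA_G = 1.5    # effect of an assumed binary covariate X (e.g. treated vs control)
SIGMA_G = 1.0   # assumed standard deviation of the error term epsilon_ijg
BATCHES = {     # batch i -> (gamma_ig, delta_ig)
    "batch1": (0.0, 1.0),
    "batch2": (2.0, 3.0),  # shifted up by 2 and three times as noisy
}


def simulate(n_per_batch=20_000, seed=1):
    """Draw (x, y) pairs from Y_ijg = alpha_g + X*beta_g + gamma_ig + delta_ig*eps_ijg."""
    rng = random.Random(seed)
    data = {}
    for name, (gamma_ig, delta_ig) in BATCHES.items():
        samples = []
        for j in range(n_per_batch):
            x = j % 2                      # alternate samples between the two conditions
            eps = rng.gauss(0.0, SIGMA_G)  # error term epsilon_ijg
            y = ALPHA_G + x * BETA_G + gamma_ig + delta_ig * eps
            samples.append((x, y))
        data[name] = samples
    return data


data = simulate()
summary = {}
for name, samples in data.items():
    # Remove the biological part alpha_g + X*beta_g; what is left is gamma_ig + delta_ig*eps.
    resid = [y - ALPHA_G - x * BETA_G for x, y in samples]
    summary[name] = (statistics.mean(resid), statistics.stdev(resid))
    print(f"{name}: mean {summary[name][0]:.2f}, sd {summary[name][1]:.2f}")
```

With these settings the second batch should come out with a residual mean near 2 and a standard deviation near 3: a location shift plus a change of scale, which is the kind of distortion the batch adjustment is meant to remove.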

My questions are the following. 1) Why does one of these parameters follow a normal distribution and the other an inverse gamma distribution?
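One way to see why the two priors differ: $\gamma_{ig}$ is a shift and can be any real number, so a normal prior, which covers the whole real line, is natural; $\delta^{2}_{ig}$ is a variance and must be positive, so its prior has to live on $(0, \infty)$. Both choices are also conjugate for normally distributed data: a normal prior on a mean gives a normal posterior, and an inverse gamma prior on a variance gives an inverse gamma posterior, so the posterior estimates have closed forms. The sketch below shows these two textbook updates for one gene in one batch. The prior values are hypothetical; as I read the notation, the hyperparameters $\gamma_{i}, \tau^{2}_{i}, \lambda_{i}, \theta_{i}$ depend only on the batch, so they are shared across genes and estimated from the data (the "empirical" part). This is not the paper's full procedure, only the two conjugate building blocks.

```python
def gamma_posterior(gamma_hat, n, delta2, prior_mean, prior_var):
    """Normal prior N(prior_mean, prior_var) on an additive shift gamma_ig, combined with
    n normal observations of variance delta2 whose sample mean is gamma_hat.

    Normal-normal conjugacy: the posterior is again normal. Returns (mean, variance).
    """
    post_var = 1.0 / (1.0 / prior_var + n / delta2)
    post_mean = post_var * (prior_mean / prior_var + n * gamma_hat / delta2)
    return post_mean, post_var


def delta2_posterior(values, gamma, prior_shape, prior_scale):
    """Inverse gamma prior IG(prior_shape, prior_scale) on a variance delta2_ig, combined with
    normal observations `values` whose mean gamma is treated as known.

    Conjugacy: the posterior is IG(shape + n/2, scale + sum of squares / 2).
    Returns (shape, scale, posterior mean); the mean exists only if shape > 1.
    """
    n = len(values)
    ss = sum((z - gamma) ** 2 for z in values)
    post_shape = prior_shape + n / 2.0
    post_scale = prior_scale + ss / 2.0
    post_mean = post_scale / (post_shape - 1.0)
    return post_shape, post_scale, post_mean


# Hypothetical numbers: prior N(0, 1) for the shift, 4 observations with variance 2 and mean 1.0.
mean, var = gamma_posterior(gamma_hat=1.0, n=4, delta2=2.0, prior_mean=0.0, prior_var=1.0)
print(f"gamma posterior: mean {mean:.3f}, variance {var:.3f}")

# Hypothetical numbers: prior IG(3, 2) for the variance, residuals 1, -1, 2, 0 around gamma = 0.
shape, scale, d2_mean = delta2_posterior([1.0, -1.0, 2.0, 0.0], gamma=0.0,
                                         prior_shape=3.0, prior_scale=2.0)
print(f"delta^2 posterior: IG({shape:.1f}, {scale:.1f}), mean {d2_mean:.3f}")
```

Here the posterior mean of the shift is $2/3$, between the batch's own estimate (1.0) and the prior mean (0). This shrinkage towards a batch-wide value is, roughly, how information is pooled across genes; the variance estimate is pulled towards the prior in the same way.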

2) Generally speaking, when do we use the inverse gamma distribution? Why not the gamma distribution? What is the main difference between the gamma and inverse gamma distributions?
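On question 2: the inverse gamma is simply the distribution of $1/X$ when $X$ follows a gamma distribution. If $X \sim \text{Gamma}(a, \text{rate } b)$, then $Y = 1/X \sim \text{Inverse Gamma}(a, \text{scale } b)$, so a gamma prior on a precision $1/\delta^{2}$ and an inverse gamma prior on a variance $\delta^{2}$ describe the same model; which one you write mostly depends on whether you parametrize by variance or by precision. The sketch below, with hypothetical values $a = 3$ and $b = 2$, checks the change-of-variables relation between the two densities and compares sample means with the theoretical ones. Note that Python's `random.gammavariate(alpha, beta)` takes a scale, not a rate.

```python
import math
import random


def gamma_pdf(x, shape, rate):
    """Gamma density in the shape-rate parametrization; support x > 0."""
    return rate**shape / math.gamma(shape) * x ** (shape - 1) * math.exp(-rate * x)


def inv_gamma_pdf(y, shape, scale):
    """Inverse gamma density in the shape-scale parametrization; support y > 0."""
    return scale**shape / math.gamma(shape) * y ** (-shape - 1) * math.exp(-scale / y)


a, b = 3.0, 2.0  # hypothetical shape and rate (gamma) / scale (inverse gamma)

# Change of variables: if X ~ Gamma(a, rate=b), then Y = 1/X has density f_X(1/y) / y**2,
# which should match the inverse gamma density with the same a and scale b.
for y in (0.25, 0.5, 1.0, 2.0, 5.0):
    lhs = gamma_pdf(1.0 / y, a, b) / y**2
    rhs = inv_gamma_pdf(y, a, b)
    print(f"y={y}: {lhs:.6f} vs {rhs:.6f}")

# Monte Carlo check of the means: E[X] = a/b and E[1/X] = b/(a - 1) (needs a > 1).
rng = random.Random(0)
draws = [rng.gammavariate(a, 1.0 / b) for _ in range(200_000)]  # second argument is a scale
mean_x = sum(draws) / len(draws)
mean_inv = sum(1.0 / x for x in draws) / len(draws)
print(f"mean of X:   {mean_x:.3f} (theory {a / b:.3f})")
print(f"mean of 1/X: {mean_inv:.3f} (theory {b / (a - 1):.3f})")
```

The mean of $1/X$ (about 1.0 here) is not $1/E[X] = 2/3$, which is one practical reason to work with the inverse gamma directly when the quantity of interest is a variance rather than a precision.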

I am not a mathematician; this is the first time I have seen gamma, beta and inverse distributions, and I'm a little bit lost.

I hope you will be able to help me.

Babas
  • In bioinformatics it is common to make wild assumptions about the distributions of gene expression. The inverse gamma just happens to be useful in some types of Bayesian analysis. What I'm trying to say is that most of the time distributions are chosen for convenience, even though genes do not follow any single family of distributions: they are extremely heterogeneous, and some look normal, some exponential, some bimodal, trimodal, etc. Even worse, these methods usually assume independence of genes, which is so far from true that it's downright scary to assume it, but they do. – Gregory Grant May 30 '18 at 15:10
  • I just want to be sure: do people sometimes use brute force, trying multiple distributions, so that the inverse gamma was simply the best fit for the model and there is no other reason? Thanks for your answer, by the way. – Babas May 30 '18 at 15:25
  • At first everybody used the Binomial or Poisson distribution, but that didn't capture biological variability, so everybody started using the Negative Binomial. Now you can read a hundred papers where everybody used something different. Nobody really knows what to do; most of these papers are written by statisticians who are out of touch with real biologists. Chances are the paper you're reading is worthless. What's the title? – Gregory Grant May 30 '18 at 15:36
  • The paper is "Adjusting batch effects in microarray expression data using empirical Bayes methods" by W. Evan Johnson and Cheng Li. – Babas May 31 '18 at 06:36
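The move from the Poisson to the negative binomial mentioned in the comments is about overdispersion: a Poisson count has a variance equal to its mean, while counts from biological replicates usually vary more. A negative binomial can be built as a gamma-Poisson mixture (each replicate gets its own gamma-distributed rate), which gives a variance of $\mu + \mu^{2}/r$. The sketch below compares the two with hypothetical values $\mu = 10$ and $r = 2$; it uses a simple Knuth-style Poisson sampler because the standard-library `random` module has no Poisson sampler.

```python
import math
import random
import statistics

rng = random.Random(42)


def poisson(lam):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


MU = 10.0  # hypothetical mean count
R = 2.0    # hypothetical negative binomial size (dispersion) parameter
N = 40_000

# Poisson: every replicate shares exactly the same rate MU.
pois = [poisson(MU) for _ in range(N)]

# Gamma-Poisson mixture: each replicate draws its own rate from Gamma(shape R, scale MU / R),
# mimicking biological variability between replicates; the counts are negative binomial.
negbin = [poisson(rng.gammavariate(R, MU / R)) for _ in range(N)]

pois_var = statistics.variance(pois)
nb_var = statistics.variance(negbin)
print(f"Poisson:           mean {statistics.mean(pois):.2f}, variance {pois_var:.2f} (theory {MU:.1f})")
print(f"Negative binomial: mean {statistics.mean(negbin):.2f}, variance {nb_var:.2f} "
      f"(theory {MU + MU**2 / R:.1f})")
```

Both samples have roughly the same mean, but the mixture's variance is several times larger, which is the extra biological variability the Poisson cannot represent. The gamma distribution from question 2 appears here as the distribution of the per-replicate rate.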
