Borrow Information Between Genes In RNA-Seq Analysis Methods
10.1 years ago
sarahmanderni ▴ 100

Hi,

Most RNA-seq analysis methods mention "borrowing strength between genes" when estimating the variance, as in this manuscript: http://genomebiology.com/2014/15/2/R29

It might be a naive question, but what exactly does this mean? Do they borrow information between the genes of a sample, or borrow information for a specific gene across different samples? And why should this be done?

Thanks

10.1 years ago

I'll start with the "why should this be done" part. We want to share (or "borrow") information across genes because we'll be performing a LOT of tests, typically with very few replicates relative to the number of tests. So almost anything we estimate per gene (dispersion, log2 fold change, etc.) won't be very accurate, since it relies on a small number of measurements. The general way around this is an empirical Bayes approach: temper the raw estimates with some sort of expectation derived from all of the data that we observed.
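To make the "small number of measurements" point concrete, here is a minimal simulation sketch. It assumes normally distributed values with a true variance of 1 for every gene, which is a simplification (real counts are not normal), and the function name `variance_estimates` is made up for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def variance_estimates(n_replicates, n_genes=2000, true_sd=1.0):
    """Sample variance for each of n_genes simulated genes, all with true variance 1."""
    return [
        statistics.variance([rng.gauss(0.0, true_sd) for _ in range(n_replicates)])
        for _ in range(n_genes)
    ]


few = variance_estimates(n_replicates=3)
many = variance_estimates(n_replicates=30)

# Spread of the per-gene estimates around the shared true value of 1.0.
spread_few = statistics.stdev(few)
spread_many = statistics.stdev(many)
print(f"3 replicates:  spread of variance estimates = {spread_few:.2f}")
print(f"30 replicates: spread of variance estimates = {spread_many:.2f}")
```

With 3 replicates the per-gene estimates scatter far more widely around the true value than with 30, even though every simulated gene has the same variance. That scatter is what borrowing information across genes is meant to tame.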

To illustrate the basic process I'll just use dispersion as an example. For that the steps are generally:

  1. Estimate dispersion for each gene by itself.
  2. Fit a trend to this estimate (if you read through the vignettes for any of the count-based differential expression programs, there's always a step involving a diagnostic plot of this fit).
  3. Use this as a prior.

The general result is that estimates far from the expected value (the value on the trend line) are tempered, or shrunk, toward it. This is good, since extreme estimates are likely just due to our low N. Lastly, information can be shared either within a group (e.g., a treatment) or across all samples, depending on the settings and on what's being estimated.
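The three steps above can be sketched roughly as follows. This is a toy illustration, not what DESeq2 or edgeR actually do: the method-of-moments estimator, the log-log linear trend, the fixed `prior_weight`, and the made-up counts are all assumptions chosen to keep the example short. Real packages derive the amount of shrinkage from the data rather than fixing it.

```python
import math
import statistics


def raw_dispersion(counts, floor=1e-8):
    """Step 1: method-of-moments dispersion for one gene, from its own counts only.

    Assumes the negative binomial relation var = mean + dispersion * mean**2.
    """
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)
    return max((var - mean) / mean ** 2, floor)


def fit_log_trend(means, dispersions):
    """Step 2: least-squares line of log(dispersion) on log(mean) across all genes."""
    xs = [math.log(m) for m in means]
    ys = [math.log(d) for d in dispersions]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda mean: math.exp(intercept + slope * math.log(mean))


def shrink(raw, expected, prior_weight):
    """Step 3: pull the raw estimate toward the trend value on the log scale.

    prior_weight in [0, 1]: 0 keeps the raw estimate, 1 returns the trend value.
    """
    log_shrunk = (1 - prior_weight) * math.log(raw) + prior_weight * math.log(expected)
    return math.exp(log_shrunk)


# Toy data: three replicates per gene (made-up numbers).
counts = {
    "geneA": [80, 100, 120],
    "geneB": [400, 500, 600],
    "geneC": [30, 100, 170],  # unusually noisy for its mean
    "geneD": [950, 1000, 1050],
}
means = {g: statistics.fmean(c) for g, c in counts.items()}
raw = {g: raw_dispersion(c) for g, c in counts.items()}
trend = fit_log_trend(list(means.values()), list(raw.values()))
shrunk = {g: shrink(raw[g], trend(means[g]), prior_weight=0.5) for g in counts}

for g in counts:
    print(f"{g}: raw={raw[g]:.4f} trend={trend(means[g]):.4f} shrunk={shrunk[g]:.4f}")
```

Each shrunken value lands between the gene's own estimate and the trend value, so a gene whose raw dispersion sits far from the trend gets moved the most in absolute terms.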

Edit: I should add that instead of just fitting a trend or something like that, you can also use a distribution (whose parameters you might estimate from the values you observed) as the prior. In point of fact, this is what's often done.
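As a rough sketch of the "distribution as the prior" idea, one common conjugate form puts a scaled inverse chi-square prior on each gene's variance, which makes the shrunken estimate a degrees-of-freedom-weighted average. The prior values below (`prior_var`, `prior_df`) are hypothetical numbers picked for illustration. In practice they would be estimated from the observed variances of all genes.

```python
def moderated_variance(sample_var, df, prior_var, prior_df):
    """Weighted average of a gene's own variance and the prior variance.

    The weights are the residual degrees of freedom (df) and the prior
    degrees of freedom (prior_df), so a larger prior_df means more shrinkage.
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


# Made-up per-gene sample variances from 3 replicates each (df = 2).
sample_vars = [0.2, 1.0, 5.0]
moderated = [
    moderated_variance(s2, df=2, prior_var=1.0, prior_df=4) for s2 in sample_vars
]
print(moderated)
```

The low and high estimates both move toward the prior value of 1.0, while the estimate that already equals the prior stays where it is.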

Thanks for the answer. For step one, "Estimate dispersion for each gene by itself", I'm not sure I understand. Is the dispersion for a specific gene estimated within a group or across all samples? And in the edit, do you mean parametric methods like DESeq and edgeR?

For the first part, it'll depend on the package and the settings you use. DESeq2 and edgeR both (I think, at least) use parametric empirical Bayes methods, though that is likely different from what you have in mind (namely, that they assume a negative binomial distribution) unless you've read through the methods sections of their papers.
