Question

Borrow Information Between Genes In RNA-Seq Analysis Methods


10.1 years ago

sarahmanderni ▴ 100

Hi,

Most RNA-seq analysis methods mention "borrowing strength between genes" when estimating the variance, as in this manuscript: http://genomebiology.com/2014/15/2/R29

It might be a naive question, but what exactly does this mean? Do they borrow information between the genes of a sample, or borrow information for a specific gene across different samples? And why should this be done?

Thanks

rna-seq • 2.0k views


score 5 · Answer 1 · 2014-03-19

I'll start with the "why should this be done" part. The reason we want to "share/borrow" information across genes is that we'll be performing a LOT of tests (one per gene) and will likely have very few replicates (at least given the number of tests we'll be performing). So almost anything we estimate per gene (dispersion, log2 fold change, etc.) will end up not being that accurate, since we're relying on a small number of measurements. The general way around this is to use an empirical Bayes approach: temper each raw estimate by some sort of expectation derived from the data we observed across genes.

To illustrate the basic process I'll just use dispersion as an example. For that the steps are generally:

1. Estimate dispersion for each gene by itself.
2. Fit a trend to these estimates (if you read through the vignettes for any of the count-based differential expression programs, there's always a step involving a diagnostic plot of this fit).
3. Use the fitted trend as a prior when re-estimating each gene's dispersion.

The general result is that estimates further from the expected value (the value on the trend line) will be tempered (or shrunk) toward it. This is good, since those extreme estimates are likely just due to our low N. Lastly, information can be shared either within a group (e.g., a treatment) or across all samples, depending on the settings and on what's being estimated.
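To make the three steps concrete, here's a toy sketch in Python with NumPy. It is not what any particular package does: the method-of-moments estimate, the `a0 + a1/mean` trend fitted by least squares, and especially the fixed `prior_weight` are simplifying assumptions of mine (real tools derive the amount of shrinkage from the data), and the counts are simulated.

```python
import numpy as np

def shrink_dispersions(counts, prior_weight=0.5, floor=1e-8):
    """Toy version of the three steps for a genes x samples count matrix.

    Assumes one condition, equal library sizes, and a nonzero mean per gene.
    prior_weight is a hypothetical fixed weight chosen for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)

    # Step 1: per-gene dispersion by method of moments, using the
    # negative binomial relation var = mean + disp * mean**2.
    raw = np.maximum((var - mean) / mean**2, floor)

    # Step 2: fit a trend disp ~ a0 + a1 / mean by ordinary least squares.
    design = np.column_stack([np.ones_like(mean), 1.0 / mean])
    coef, *_ = np.linalg.lstsq(design, raw, rcond=None)
    trend = np.maximum(design @ coef, floor)

    # Step 3: treat the trend as the prior expectation and pull each
    # gene's estimate toward it on the log scale.
    shrunk = np.exp((1 - prior_weight) * np.log(raw)
                    + prior_weight * np.log(trend))
    return raw, trend, shrunk

# Simulated data: 500 genes, 3 replicates (all values are made up).
rng = np.random.default_rng(0)
mu = rng.uniform(20, 2000, size=500)
true_disp = 0.05 + 2.0 / mu
nb_n = 1.0 / true_disp              # NumPy's "n" parameter
nb_p = nb_n / (nb_n + mu)           # NumPy's "p" parameter
counts = rng.negative_binomial(nb_n, nb_p, size=(3, 500)).T

raw, trend, shrunk = shrink_dispersions(counts)
```

With only 3 replicates the `raw` values scatter widely around the trend; every `shrunk` value sits between the gene's own estimate and the trend value at its mean, which is the "tempering" described above.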

Edit: I should add that instead of just fitting a trend, you can also use a distribution (whose parameters you might estimate from the values you observed) as the prior. In practice, this is what's often done.
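As a rough illustration of the distribution-as-prior idea, here's a normal-normal empirical Bayes sketch in Python. The assumptions are mine: the per-gene estimates (think log dispersions) are taken to be normally distributed around their true values with a known `sampling_var`, and the prior is a normal distribution whose mean and variance are estimated from all genes together. The function name `eb_shrink` and the simulated numbers are hypothetical.

```python
import numpy as np

def eb_shrink(observed, sampling_var):
    """Normal-normal empirical Bayes shrinkage of per-gene estimates.

    observed: one noisy estimate per gene (e.g. a log dispersion).
    sampling_var: assumed known, positive noise variance of each estimate.
    """
    observed = np.asarray(observed, dtype=float)
    # Estimate the prior's parameters from all genes at once.
    prior_mean = observed.mean()
    # Observed spread = prior spread + noise, so subtract the noise.
    prior_var = max(observed.var(ddof=1) - sampling_var, 0.0)
    # Weight on the gene's own estimate; 0 means "trust only the prior".
    weight = prior_var / (prior_var + sampling_var)
    posterior_mean = prior_mean + weight * (observed - prior_mean)
    return posterior_mean, weight

# Simulated example: 2000 genes with noisy estimates (values made up).
rng = np.random.default_rng(1)
truth = rng.normal(-2.0, 0.5, size=2000)
noisy = truth + rng.normal(0.0, 1.0, size=2000)

shrunk_vals, weight = eb_shrink(noisy, sampling_var=1.0)
mse_raw = np.mean((noisy - truth) ** 2)
mse_shrunk = np.mean((shrunk_vals - truth) ** 2)
```

The noisier the per-gene estimates are relative to the spread between genes, the smaller `weight` gets and the harder each estimate is pulled toward the shared prior mean. In this simulation `mse_shrunk` comes out below `mse_raw`, which is the payoff of borrowing information.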