Question

Read count correlation between samples

6.5 years ago

firestar ★ 1.6k

For DGE using RNA-Seq, what is an acceptable correlation (read counts) between samples? Is this across all samples or within treatments? What is the correlation below which samples must be discarded?

differential-gene-expression RNA-seq • 3.0k views

updated 15 days ago by Ram 43k • written 6.5 years ago by firestar ★ 1.6k

score 4 · Answer 1 · 2017-10-27

6.5 years ago

Kevin Blighe 87k

Hi, your question is vague and it would help to understand the context in which you wish to perform a correlation analysis.

If you're referring to correlation as part of sample QC, then between-sample differences of that kind are largely dealt with during the normalisation process. In this regard, other parameters worth considering include dispersion and the coefficient of variation.

If you're referring to just testing whether or not one transcript is correlated to another between, for example, 2 treatment groups, then run cor.test() (in R), which will derive a P value from the correlation test.
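As a rough illustration of this second case, the sketch below computes a Pearson correlation between two transcripts across samples and estimates a two-sided P value by permutation. It is written in Python with only the standard library, so it is a stand-in for R's cor.test() rather than a reproduction of it (the P value here comes from shuffling, which is my substitution). The expression values and function names are made up for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=1):
    """Two-sided permutation P value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any real pairing between x and y.
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_perm + 1)

# Hypothetical normalised expression of two transcripts across 8 samples.
transcript_a = [5.1, 6.3, 5.8, 7.9, 8.4, 9.0, 9.6, 10.2]
transcript_b = [2.0, 2.9, 2.4, 3.8, 4.5, 4.4, 5.1, 5.6]

r = pearson(transcript_a, transcript_b)
p = permutation_p_value(transcript_a, transcript_b)
print(f"r = {r:.3f}, permutation P = {p:.4f}")
```

With very few samples the smallest achievable permutation P value is limited by the number of distinct orderings, so treat the result as approximate.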

Further information: if you've processed all of your samples in exactly the same way, then I would expect good correlation (upward of 0.95, with a highly statistically significant P value) between samples when using all transcripts in the transcriptome, irrespective of case-control or treatment status, and irrespective also of whether the counts are raw or normalised. With raw counts, you will see slightly lower correlation.
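To make the sample-level check concrete, here is a minimal Python sketch (standard library only) that computes pairwise Pearson correlations between samples across all genes and summarises each sample by its median correlation to the others. The counts and sample names are invented, the log2(count + 1) transform and the use of the median are my assumptions, and the 0.95 cut-off simply reuses the figure mentioned above as a flag for a closer look, not as a rule for discarding samples.

```python
import math
from itertools import combinations
from statistics import median

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical raw counts: one list per sample, one entry per gene.
counts = {
    "ctrl_1": [10, 250, 1200, 35, 0, 560, 89, 4300],
    "ctrl_2": [12, 230, 1100, 40, 1, 600, 95, 4100],
    "trt_1":  [9, 270, 1300, 30, 0, 520, 80, 4500],
    "odd":    [400, 15, 90, 900, 60, 20, 700, 50],
}

# Log-transform so a few very highly expressed genes do not dominate r.
log_counts = {s: [math.log2(c + 1) for c in v] for s, v in counts.items()}

# Correlation for every unordered pair of samples.
pairwise = {
    (a, b): pearson(log_counts[a], log_counts[b])
    for a, b in combinations(log_counts, 2)
}

# Median correlation of each sample to all other samples.
median_r = {
    s: median(r for pair, r in pairwise.items() if s in pair)
    for s in log_counts
}

THRESHOLD = 0.95  # assumption: borrowed from the figure quoted above
flagged = [s for s, r in median_r.items() if r < THRESHOLD]

for s, r in median_r.items():
    print(f"{s}: median r = {r:.3f}")
print("flagged for a closer look:", flagged)
```

The median is used instead of the mean so that one discordant sample does not drag down the summary for every other sample.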

4.4 years ago by Kevin Blighe 87k

Very comprehensive response for such a vague question! +1

6.5 years ago by andrew.j.skelton73 6.5k

Hey Andrew - good to see you again!

6.5 years ago by Kevin Blighe 87k

It also depends on whether your samples are technical replicates (usually higher correlation) or biological replicates.

6.5 years ago by grant.hovhannisyan ★ 2.6k