Comparing RPKMs for One Test Sample vs. Multiple Controls
10.5 years ago
Travis ★ 2.8k

Hi all,

I have RPKM values for a single sample (lung adenocarcinoma) and wish to compare it to RPKM values for a group of controls (50 TCGA normal lung samples).

Bearing in mind the one-to-many nature of the analysis, and that RPKMs are the starting point, can someone recommend the best method or software for calculating differential expression with appropriate measures of significance? At the most basic level, I have calculated fold changes and Z-scores (mean- and median-based), but I suspect this is overly simplistic.
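The fold-change and Z-score approach described above might look roughly like this in Python. It is a minimal sketch under a few assumptions. RPKMs are log2-transformed with a pseudocount of 1, which is an arbitrary choice. The median-based score uses the median absolute deviation scaled by 1.4826, so that it is comparable to an SD under normality. `one_vs_many_scores` is a hypothetical helper name. Genes with no spread among the controls give infinite or undefined scores, and none of this produces a calibrated p-value.

```python
import numpy as np

def one_vs_many_scores(test_rpkm, control_rpkm, pseudocount=1.0):
    """Per-gene log2 fold change and Z-scores of one sample against a control group.

    test_rpkm:    shape (n_genes,), RPKMs of the single test sample
    control_rpkm: shape (n_genes, n_controls), RPKMs of the controls
    """
    # Assumption: work on log2(RPKM + pseudocount) to tame the skew of RPKMs
    test = np.log2(np.asarray(test_rpkm, dtype=float) + pseudocount)
    ctrl = np.log2(np.asarray(control_rpkm, dtype=float) + pseudocount)

    # Mean-based: fold change vs. the control mean, Z-score using the sample SD
    mean = ctrl.mean(axis=1)
    sd = ctrl.std(axis=1, ddof=1)
    log2fc = test - mean
    z_mean = log2fc / sd

    # Median-based: robust Z-score using the scaled median absolute deviation
    med = np.median(ctrl, axis=1)
    mad = 1.4826 * np.median(np.abs(ctrl - med[:, None]), axis=1)
    z_median = (test - med) / mad

    return log2fc, z_mean, z_median

# One gene, three controls: log2(x + 1) turns 1, 3, 7 into 1, 2, 3 and 15 into 4
fc, z_mean, z_median = one_vs_many_scores([15.0], [[1.0, 3.0, 7.0]])
```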

All help appreciated.

rna-seq differential-expression next-gen
10.5 years ago
Hayssam ▴ 280

Hi, I don't see any reason not to start with one of the available differential expression tests in R. I'd recommend edgeR or DESeq. Both have good tutorials to get you started, and both should handle the class imbalance adequately. However, these two methods expect raw read counts, not RPKMs. For the TCGA samples, raw counts are available, but I think you have to take level 2 data. Is there any reason for you to stick with RPKMs? If so, be aware that you risk losing statistical power by using them.
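The loss of information can be seen from how RPKM is defined. RPKM divides a gene's read count by gene length in kilobases and by library size in millions of mapped reads. In the sketch below, the numbers are illustrative and not TCGA data. A short gene with few reads and a long gene with many reads end up with identical RPKMs. Under a simple Poisson model, though, a count of 10 carries a relative error of about 32%, while a count of 1,000 carries only about 3%. Count-based models such as those in edgeR and DESeq use that difference, whereas RPKMs hide it.

```python
import math

def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

library_size = 20_000_000  # illustrative: 20M mapped reads

short_gene = rpkm(10, 500, library_size)      # 10 reads on a 0.5 kb gene
long_gene = rpkm(1000, 50_000, library_size)  # 1000 reads on a 50 kb gene
print(short_gene, long_gene)  # 1.0 1.0

# Same RPKM, very different certainty: Poisson relative error is 1/sqrt(count)
print(1 / math.sqrt(10), 1 / math.sqrt(1000))  # roughly 0.32 vs 0.032
```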


I had assumed it would not be safe to take raw counts from different sources/centers and attempt differential expression analysis. Do both DESeq and edgeR attempt to correct for issues like differences in sequencing depth?


Different library sizes (due both to different sequencing depths and to different proportions of mappable reads) are exactly the raison d'être of these approaches. There are several papers explaining why RPKM does not deal with this appropriately. See e.g. Differential Gene Expression Analysis - Rpkm Vs Readcount and Rnaseq Differential Expression. For RPKM inconsistencies, this blog post is a good starting point.
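DESeq's median-of-ratios size factors are one concrete example of how these methods correct for library size. Each sample's size factor is the median, over genes with non-zero counts in every sample, of that sample's count divided by the gene's geometric mean across samples. The sketch below is a simplified NumPy re-implementation for illustration only and is not the package's own code. In R you would simply call `estimateSizeFactors`. The toy counts are made up.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Size factors for a (n_genes, n_samples) matrix of raw read counts."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample have a finite geometric mean
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1)
    # Per sample: median log-ratio of its counts to the per-gene geometric mean
    log_ratios = log_counts - log_geo_means[:, None]
    return np.exp(np.median(log_ratios, axis=0))

# Toy example: sample 2 was sequenced twice as deeply as sample 1
counts = np.array([
    [10, 20],
    [50, 100],
    [200, 400],
    [0, 5],  # excluded from the estimate: zero count in sample 1
])
sf = median_of_ratios_size_factors(counts)
normalized = counts / sf  # counts divided by each sample's size factor
```

Here the factors come out as about 0.71 and 1.41. Their ratio of 2 recovers the difference in depth, so after normalization the first three genes have equal values in both samples.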

Furthermore, if you suspect batch effects (e.g. a lab effect for samples coming from different centers), linear modeling in edgeR can help you correct for or account for them. A large-scale RNA-sequencing effort recently published a study that dealt with batch effects adequately. If that interests you, you could start browsing from the GEUVADIS RNA-Seq website.
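In a linear model, "accounting for" a center effect means adding one indicator column per extra sequencing center to the design matrix alongside the tumor/normal term. The sketch below does that and then fits one gene's log-scale expression by ordinary least squares. This is a deliberate simplification. edgeR instead fits negative binomial GLMs to raw counts using the same kind of design matrix, which in R is built with `model.matrix(~ batch + condition)`. The center labels, the `design_matrix` helper and the values are all hypothetical. With a single tumor sample, the tumor effect is only estimable if that sample's center also contributed normal samples.

```python
import numpy as np

def design_matrix(is_tumor, center):
    """Columns: intercept, tumor indicator, one indicator per non-reference center."""
    is_tumor = np.asarray(is_tumor, dtype=float)
    centers = sorted(set(center))  # the first center after sorting is the baseline
    columns = [np.ones(len(center)), is_tumor]
    for c in centers[1:]:
        columns.append(np.array([1.0 if x == c else 0.0 for x in center]))
    return np.column_stack(columns)

# Hypothetical layout: one tumor sample plus normals from two centers
is_tumor = [1, 0, 0, 0, 0]
center = ["A", "A", "A", "B", "B"]
X = design_matrix(is_tumor, center)

# Noise-free toy log-expression: baseline 5, tumor effect +2, center B effect +1
log_expr = X @ np.array([5.0, 2.0, 1.0])
coef, *_ = np.linalg.lstsq(X, log_expr, rcond=None)
# coef[1] is the tumor effect after adjusting for center
```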
