Question

Comparing stranded reads to unstranded reads

0

5.3 years ago

piyushjo ▴ 700

Hi,

I have two datasets from different sources. Unfortunately, one group did unstranded RNA-seq while the other did stranded. When I run a PCA on the DESeq2-normalized read counts, the two datasets cluster far from each other. Now I am unsure whether this is an artefact of the unstranded reads from the first group or a real difference. Could anyone enlighten me as to whether it is appropriate to compare these two datasets for differential gene expression, or whether I will get wrong information for transcripts on the reverse strand?

Thanks.

RNA-Seq DE gene strand information • 2.4k views

updated 5.3 years ago by johnsonnathant ▴ 120 • written 5.3 years ago by piyushjo ▴ 700

score 1 · Answer 1 · 2019-01-15

1

5.3 years ago

swbarnes2 14k

If the two sets of samples were prepped at different places at different times, strandedness is likely just a part of the larger batch effect.

5.3 years ago by swbarnes2 14k

0

But doesn't DESeq2 take the difference in library depth into account? What else could be contributing to the variation?

5.3 years ago by piyushjo ▴ 700

0

Batch effect is far more than library depth. The same samples prepped in different hands will have slightly different gene expression values. That's just life in experimental science.
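
To make that concrete, here is a toy sketch of why depth normalization alone does not remove a batch effect. It is a simplified re-implementation of the median-of-ratios idea behind DESeq2's size factors, not DESeq2's own code, and the counts and the 4x "prep" shift on one gene are invented for illustration.

```python
import math
from statistics import median

def median_of_ratios_size_factors(counts):
    """Simplified median-of-ratios size factors.

    counts: list of genes, each a list of per-sample raw counts.
    Genes with a zero in any sample are skipped (a simplification).
    """
    expressed = [row for row in counts if all(c > 0 for c in row)]
    n_samples = len(counts[0])
    # Per-gene log geometric mean across samples acts as a pseudo-reference.
    log_refs = [sum(math.log(c) for c in row) / n_samples for row in expressed]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - ref
                      for row, ref in zip(expressed, log_refs)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical toy data: sample B is sample A sequenced twice as deep,
# except gene index 3, which a made-up prep difference inflates 4x on top.
sample_a = [100, 200, 50, 400, 80]
sample_b = [2 * c for c in sample_a]
sample_b[3] *= 4
counts = [[a, b] for a, b in zip(sample_a, sample_b)]

size_factors = median_of_ratios_size_factors(counts)
normalized = [[c / s for c, s in zip(row, size_factors)] for row in counts]

for gene, row in enumerate(normalized):
    print(gene, [round(x, 1) for x in row])
```

After normalization the depth difference is gone for the unaffected genes, but the gene hit by the prep-specific shift still differs between the two samples. A single per-sample scaling factor cannot correct gene-specific differences.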

5.3 years ago by swbarnes2 14k

0

But my main question is: does the fact that one library is stranded and the other is unstranded make them incomparable? I understand that differences between people and machines are also involved.

5.3 years ago by piyushjo ▴ 700

0

It depends on how you want to do the analysis. If you are looking for DE genes between the two datasets, it will be difficult to tell whether a gene differs because of the library prep protocol or because of the biology. If it is possible to pool the two datasets and then do the analysis, you are more likely to come up with a decent DE gene list. That is possible when both datasets address the same biological question, e.g. both sequenced lung cancer and normal lung. Each condition then appears in both preps, so mixing the samples reduces the noise from the sample prep. Hope that helps.
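
A rough way to see the difference between the two scenarios is to check the rank of the design matrix for a model with an intercept, a prep term and a condition term. The sketch below uses made-up sample labels ("normal"/"tumor", "unstranded"/"stranded") and a small exact-arithmetic rank function written for this example. It is only an illustration of the confounding argument, not a DE pipeline.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via Gaussian elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def design_matrix(samples):
    """Columns: intercept, prep indicator, condition indicator."""
    return [[1, int(prep == "stranded"), int(condition == "tumor")]
            for condition, prep in samples]

# Scenario 1 (hypothetical): each condition comes from only one prep.
confounded = [("normal", "unstranded")] * 3 + [("tumor", "stranded")] * 3

# Scenario 2 (hypothetical): both preps contain both conditions.
mixed = [("normal", "unstranded"), ("tumor", "unstranded"),
         ("normal", "stranded"), ("tumor", "stranded")]

rank_confounded = matrix_rank(design_matrix(confounded))
rank_mixed = matrix_rank(design_matrix(mixed))
print(rank_confounded, rank_mixed)
```

With three columns, a rank of 2 in the first scenario means the prep and condition effects cannot be separated. A rank of 3 in the second means a prep term can be estimated alongside the condition term.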

5.2 years ago by johnsonnathant ▴ 120

0

Ok. Thanks. I am comparing cerebellum to medulloblastoma (a cancer of the cerebellum). The only thing that bothers me is whether anti-sense transcripts that overlap an mRNA would be improperly quantified.
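
To illustrate what I mean, here is a toy sketch with invented coordinates: an mRNA on the + strand, an overlapping anti-sense gene on the - strand, and five reads. It assumes each read's strand already reflects the strand of the transcript it came from, and it labels a read "ambiguous" when it matches more than one gene. How a real counting tool treats such reads depends on the tool and its settings.

```python
def overlaps(read, gene):
    """True if the read and gene intervals share at least one base."""
    return read["start"] <= gene["end"] and gene["start"] <= read["end"]

def count_reads(reads, genes, stranded):
    """Assign each read to a gene; reads matching several genes are ambiguous."""
    counts = {name: 0 for name in genes}
    counts["ambiguous"] = 0
    for read in reads:
        hits = [name for name, gene in genes.items()
                if overlaps(read, gene)
                and (not stranded or read["strand"] == gene["strand"])]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            counts["ambiguous"] += 1
    return counts

# Hypothetical annotation: the two genes overlap between 200 and 300.
genes = {
    "mRNA": {"start": 100, "end": 300, "strand": "+"},
    "antisense": {"start": 200, "end": 400, "strand": "-"},
}

# Hypothetical reads: three from the mRNA, two from the anti-sense gene.
reads = [
    {"start": 110, "end": 160, "strand": "+"},
    {"start": 210, "end": 260, "strand": "+"},
    {"start": 220, "end": 270, "strand": "+"},
    {"start": 230, "end": 280, "strand": "-"},
    {"start": 350, "end": 390, "strand": "-"},
]

stranded_counts = count_reads(reads, genes, stranded=True)
unstranded_counts = count_reads(reads, genes, stranded=False)
print(stranded_counts)
print(unstranded_counts)
```

With strand information every read finds its gene. Without it, the three reads in the overlap cannot be assigned to either gene, so both genes lose counts in the unstranded library only.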

5.2 years ago by piyushjo ▴ 700

0

Ideally, you wouldn't mix the datasets, and everything would be done exactly the same way. However, there is also potential for insight if the analysis is done right, as it could help show whether important information is gained from anti-sense transcripts.

5.2 years ago by johnsonnathant ▴ 120

score 1 · Answer 2 · 2019-01-15

It is common for library prep to be a confounding 'batch' factor in RNA-Seq analysis: because the preps differ, the selection of RNA differs too. Here is a good article (https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1876-7) that highlights some of the preparation differences. It is not surprising to me that this shows up in the expression data.