Question

HT-Seq Read Count and Strand-Specificity


11.0 years ago

thecuriousbiologist ▴ 550

Hi,

I am new to RNA sequencing and a bit confused by the HT-Seq read count options, so I want to check whether I am thinking in the right direction. I have a set of paired-end, strand-specific RNA-Seq reads and I am now trying to count the reads in a set of features (genes).

The HT-Seq documentation says that the option "stranded" is set to "yes" by default, which means that HT-Seq assumes the reads are strand-specific. It also says:

"If your RNA-Seq data has not been made with a strand-specific protocol, this causes half of the reads to be lost. Hence, make sure to set the option --stranded=no unless you have strand-specific data! "

This makes sense: if I use the "stranded=yes" option for non-strand-specific data, reads mapping to the strand opposite the feature will NOT be counted.
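To make the "half of the reads are lost" point concrete, here is a minimal sketch of the per-read strand rule for single-end reads as the documentation describes it ("yes": same strand as the feature, "reverse": opposite strand, "no": strand ignored). The helper names `strand_matches` and `count_reads` are hypothetical and this is not HTSeq's actual code; the read list is an idealized, perfectly balanced unstranded library.

```python
def strand_matches(read_strand: str, feature_strand: str, stranded: str) -> bool:
    """Hypothetical helper mirroring the --stranded rule for single-end reads.

    'yes'     -> read must be on the same strand as the feature
    'reverse' -> read must be on the opposite strand
    'no'      -> strand is ignored
    """
    if stranded == "no":
        return True
    if stranded == "yes":
        return read_strand == feature_strand
    if stranded == "reverse":
        return read_strand != feature_strand
    raise ValueError(f"unknown stranded mode: {stranded!r}")


def count_reads(read_strands, feature_strand, stranded):
    """Count reads overlapping one feature that pass the strand check."""
    return sum(strand_matches(s, feature_strand, stranded) for s in read_strands)


# Unstranded library: reads from a '+' gene land on either strand about equally.
# Assumed, idealized data: 10 reads, exactly half on each strand.
unstranded_reads = ["+", "-"] * 5

for mode in ("no", "yes", "reverse"):
    print(mode, count_reads(unstranded_reads, "+", mode))
# no 10
# yes 5
# reverse 5
```

With unstranded input, "yes" and "reverse" each keep only the reads that happen to land on one strand, which is why the documentation warns to use --stranded=no for such data.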

However, this makes me wonder: if I use "stranded=no" even for strand-specific data, would my counts be unaffected? Is that correct? With "stranded=no", it does not matter whether a read maps to the same strand as the feature or the opposite one; it is counted as long as it maps to a feature, regardless of strand.

So a follow-up question comes to mind: why does HT-Seq even have the "stranded=yes" and "stranded=reverse" options?

I am sorry if this is a very naive and incorrect question, but I really need to get the strand-specific concept clear in my mind.

Any help would be much appreciated.


score 6 · Answer 1 · 2013-05-05

If the transcripts/genes did not overlap, you would get essentially the same results whether you specify stranded=yes or stranded=no. But sometimes exons overlap (at least in mammals), and they do so on opposite strands, which allows htseq-count to tell the two genes/transcripts apart if the input is stranded. So you should see a higher rate of ambiguous reads when counting unstranded.
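The overlap argument can be illustrated with a minimal "union"-style assignment for one contiguous read against two genes that overlap on opposite strands. The function `assign_read` and the gene coordinates are hypothetical; only the special counter names `__no_feature` and `__ambiguous` are borrowed from htseq-count's output. Spliced reads and the other htseq-count modes are deliberately ignored.

```python
def assign_read(read_start, read_end, read_strand, genes, stranded):
    """Hypothetical union-style assignment for one contiguous, unspliced read.

    genes: list of (name, start, end, strand) with half-open intervals.
    Returns a gene name, '__no_feature' or '__ambiguous'.
    """
    hits = set()
    for name, start, end, strand in genes:
        if not (read_start < end and start < read_end):
            continue  # no overlap with this gene
        if (
            stranded == "no"
            or (stranded == "yes" and read_strand == strand)
            or (stranded == "reverse" and read_strand != strand)
        ):
            hits.add(name)
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return hits.pop()


# Two hypothetical genes whose exons overlap (250-300) on opposite strands.
genes = [("geneA", 100, 300, "+"), ("geneB", 250, 500, "-")]

# A sense read from geneA that falls inside the overlapping region.
print(assign_read(260, 290, "+", genes, "no"))   # __ambiguous
print(assign_read(260, 290, "+", genes, "yes"))  # geneA
```

Ignoring the strand makes the read hit both genes and it is discarded as ambiguous; using the strand resolves it to a single gene.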

Depending on the protocol, either the sense or the antisense strand gets sequenced, which makes the reverse option necessary. Here is a not completely illuminating figure (a little more colour would have been nice, to see which strand gets sequenced):

[Figure: A not completely illuminating figure. Image credit: Zhao Zhang]
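For paired-end data the rule applies per mate: as I understand the htseq-count documentation, with "yes" read 1 must be on the feature's strand and read 2 on the opposite strand, and "reverse" swaps this. The sketch below encodes that rule; `pair_matches` and `FLIP` are hypothetical names, not part of HTSeq. The example pair assumes a library where read 1 comes from the antisense strand, as is commonly the case for dUTP-style protocols, which are therefore usually counted with "reverse".

```python
FLIP = {"+": "-", "-": "+"}


def pair_matches(read1_strand, read2_strand, feature_strand, stranded):
    """Hypothetical strand check for a properly oriented read pair.

    'yes'     -> read 1 on the feature's strand, read 2 on the opposite one
    'reverse' -> read 1 on the opposite strand, read 2 on the feature's strand
    'no'      -> strand is ignored
    """
    if stranded == "no":
        return True
    if stranded == "yes":
        expected1 = feature_strand
    elif stranded == "reverse":
        expected1 = FLIP[feature_strand]
    else:
        raise ValueError(f"unknown stranded mode: {stranded!r}")
    return read1_strand == expected1 and read2_strand == FLIP[expected1]


# Assumed pair from a '+' gene in a library where read 1 is antisense:
# read 1 maps to '-', read 2 maps to '+'.
for mode in ("yes", "reverse", "no"):
    print(mode, pair_matches("-", "+", "+", mode))
# yes False
# reverse True
# no True
```

Counting such a library with "yes" would throw away the genuine sense reads and keep only antisense ones, which is exactly the case "reverse" exists for.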

And no, it's not naive. It is confusing and complicated with all these strands, protocols, etc.