Question

HTSeq-count paired-end --stranded=yes gives less counts than --stranded=no

0

Entering edit mode

7.9 years ago

pablominguez ▴ 10

Hi all,

I'm confused on how should I call htseq-count for my dataset concerning option --stranded [yes or no].

I'm analysing paired-end RNASeq data, at the alignment step with tophat I produced one single bam file from the two reads (R1, R2). So I have one bam file with both strands, this file is sorted by name with samtools and when calling htseq-count I firstly assumed I had to say stranded=yes, I got my counts and there were many zeros, so I run again with stranded=no and got many more read counts, less genes with zeros and genes generally with more counts. These are the numbers for both performances:

stranded=yes no feature = 49 258 937 ambiguous = 8 393

stranded=no no feature = 4 654 795 ambiguous = 1781 755

Since the number of "no feature" is considerably lower in stranded=no, should I use this tag? Is this a proof that this is right call?

So I don't know if my logic is right, I had strand specific data but since I merged them (R1 and R2) in the alignment step, I have everything together in the bam and I have to tell this to htseq-count by setting the -stranded=yes even if this is not completely right, any sense??

As another clue I run infer_experiment.py with the merged-sorted bam file from the RSeQC package and the output was:

This is PairEnd Data Fraction of reads failed to determine: 0.0012 Fraction of reads explained by "1++,1--,2+-,2-+": 0.0827 Fraction of reads explained by "1+-,1-+,2++,2--": 0.9161

This tells me I have strand-specific again ... so maybe I should come back to htseq-count --stranded=yes??

Any help is very welcome!! Thanks a lot!

RNA-Seq • 5.9k views

ADD COMMENT • link updated 7.9 years ago by Devon Ryan 104k • written 7.9 years ago by pablominguez ▴ 10

0

Entering edit mode

If your data is strand specific you should tell it yes for strandness, well, htseq-count default already is for strand=yes. So you dont need to specify. The issue with your too many no_feature reads maybe is related with your alignment or annotation file. Maybe is a good idea try to quantify using RSEM or Kallisto.

ADD REPLY • link 7.9 years ago by tiago211287 ★ 1.4k

0

Entering edit mode

Thanks a lot for the suggestion, I have run Kallisto and the numbers are similar to --stranded=no.

ADD REPLY • link 7.9 years ago by pablominguez ▴ 10

1

Entering edit mode

Try out Salmon, it'll infer your library prep from the raw reads in a handy text file.

ADD REPLY • link 7.9 years ago by andrew.j.skelton73 6.5k

score 3 · Answer 1 · 2016-05-23

3

Entering edit mode

7.9 years ago

Devon Ryan 104k

You want -s reverse in htseq-count (or -s 2 in featureCounts). That's also what the output from infer_experiment.py is indicating.

ADD COMMENT • link 7.9 years ago by Devon Ryan 104k

score 0 · Answer 2 · 2016-05-23

I have met the same question. both the sequence report and the RSeQC package indicate that I had strand specific data. but when i set the htseq-count --stranded=yes or featurecount -s 1 (means strand speciafic), the number of reads mapping to the ref is lower than setting the parameter with no stranded special.