Hi,
We are mapping stranded paired-end RNA-seq reads for the first time. The libraries were sequenced with the Illumina TruSeq protocol. We are using TopHat2 with --library-type fr-firststrand, and we notice that the direction of the reads on the genome seems to depend on the order in which the read-pair files are given for mapping. This also affects the gene counts later on. It looks like putting the R2 reads before the R1 reads gives the correct results, but I can't find this in the TopHat manual.
Are there any standard ways of mapping paired-end stranded RNA-seq reads?
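For reference, here is a minimal sketch of the invocation being discussed, written as a small Python helper that only builds the argument list. The index name, FASTQ file names and output directory are hypothetical placeholders. It assumes the usual TopHat convention that the first reads argument holds the R1 (left) mates and the second holds the R2 (right) mates.

```python
import shlex

def tophat2_command(index_base, r1_fastq, r2_fastq, out_dir="tophat_out"):
    """Build a TopHat2 argument list for a stranded paired-end library.

    All file and index names are supplied by the caller; nothing is executed here.
    """
    return [
        "tophat2",
        "--library-type", "fr-firststrand",  # library type named in the question
        "-o", out_dir,                       # output directory
        index_base,                          # Bowtie2 index base name
        r1_fastq,                            # first reads argument: R1 mates
        r2_fastq,                            # second reads argument: R2 mates
    ]

# Hypothetical file names, for illustration only.
cmd = tophat2_command("genome_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz")
print(shlex.join(cmd))
```

Swapping the last two arguments is the "R2 before R1" ordering described above. The mapper then treats the R2 mates as first-in-pair, which is what changes the apparent strand downstream.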
Thanks for the reply, but I am still very puzzled about this. When mapping with the R1 reads first and counting with HTSeq with the stranded option set to yes, I get zero counts for features on the + strand. With the stranded option set to reverse, I get the counts. When mapping with the R2 reads first, we get exactly the opposite results. With Cufflinks we get almost identical FPKM values regardless of the order of the input files.
I have been trying to find more information about how stranded reads are handled by TopHat/cufflinks, but without luck.
--stranded=reverse is the typical setting for stranded datasets with htseq-count, since you presumably have a dUTP-based library. That the option for this is called "reverse" rather than "yes" has more to do with which library types were common years ago than anything else.
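To make the rule concrete, here is a small sketch of how the two stranded modes interpret a paired-end read, as I understand the htseq-count documentation: with "yes", mate 1 must lie on the same strand as the feature and mate 2 on the opposite strand, and with "reverse" the rule is flipped. The function name and arguments are made up for illustration. This is not HTSeq's own code.

```python
def feature_strand(read_strand, mate, stranded):
    """Return the feature strand that a paired-end read is counted toward.

    read_strand: '+' or '-', the strand the read aligned to
    mate:        1 or 2, first or second read of the pair
    stranded:    'yes' or 'reverse', as in htseq-count's stranded option
    """
    flip = {"+": "-", "-": "+"}
    if read_strand not in flip:
        raise ValueError("read_strand must be '+' or '-'")
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    if stranded not in ("yes", "reverse"):
        raise ValueError("stranded must be 'yes' or 'reverse'")
    # 'yes': mate 1 keeps its strand and mate 2 is flipped; 'reverse' swaps that.
    keeps_strand = (mate == 1) == (stranded == "yes")
    return read_strand if keeps_strand else flip[read_strand]

# dUTP library, gene on the + strand: R1 aligns to '-', R2 aligns to '+'.
print(feature_strand("-", 1, "reverse"), feature_strand("+", 2, "reverse"))
print(feature_strand("-", 1, "yes"), feature_strand("+", 2, "yes"))
```

If the input files are swapped at the mapping step, the mate labels swap as well, so the same fragment is then assigned to the + strand under "yes" instead of "reverse". That matches the behaviour described above.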
I see, thanks!