Tophat2 Accepted hits Visualization and HT-seq Count not agreeing
0
0
Entering edit mode
7.2 years ago

Hi all,

I seem to be have problems with Tophat2 or HT-seq counts. I am visualizing the tophat BAM file using Tablet. The trouble is, there seems to be reads aligning to this specific gene (spi-c, zebrafish, reverse strand) however HT-seq counts (using stranded = reverse) is counting 0 reads for this gene. If i use stranded = yes, HT-seq counts is giving me a count of 35. I am confused because this is a dUTP library so stranded=reverse should be used. Does this mean that HT-seq counts cannot count genes found on the compliment strand? Here is a picture of the accepted hits for this gene http://imgur.com/a/i6jIe

Thanks in advance,

RNA-Seq HT-seq Count Tophat2 • 2.0k views
ADD COMMENT
0
Entering edit mode

Could you add the full command you used for htseq count?

ADD REPLY
0
Entering edit mode

I am using the galaxy version of HT-seq Count. Here is the Job Command Line

samtools sort -n "/galaxylab/production/new/galaxy/database/files/000/dataset_392.dat" "name_sorted_alignment" && htseq-count --mode=union --stranded=reverse --minaqual=10 --type="exon" --idattr="gene_id" --order=name --format=bam name_sorted_alignment.bam "/galaxylab/production/new/galaxy/database/files/000/dataset_110.dat" | awk '{if ($1 ~ "no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique") print $0 | "cat 1>&2"; else print $0}' > /galaxylab/production/new/galaxy/database/files/008/dataset_8356.dat 2>/galaxylab/production/new/galaxy/database/files/008/dataset_8357.dat

ADD REPLY
0
Entering edit mode

Aren't there also many intronic reads in your screenshot? Since you are counting reads overlapping with exons....

ADD REPLY
0
Entering edit mode

Yeah, I notice that also. But I also see reads aligning to exons, therefore stranded = reverse should be giving some counts correct? Unless all these reads are being labelled as ambiguous due to overlapping another gene on the other strand?

ADD REPLY
0
Entering edit mode

Looks like you are using mode=union for the HTSeq-count. Different modes are described on the htseq-count help page here. As you suspect when a read overlaps two genes it is considered ambiguous in union mode.

ADD REPLY
0
Entering edit mode

Looking through, I know that Union is the mode I need. However how can I visualize if these reads are are overlapping 2 genes?

ADD REPLY
0
Entering edit mode

Did you run tophat with --fr-firststrand option since you have a dUTP library?

ADD REPLY
0
Entering edit mode

I already tried that, but its doesn't make a difference. I think I read somewhere that the --fr firststrand in tophat only edits XS flags?

ADD REPLY
0
Entering edit mode

If you have a stranded library then reads that are mapping to the correct strand should be ones that are counted. If you want to count every read that is mapped then you could say your library is non-stranded.

ADD REPLY
0
Entering edit mode

In tablet, how can I tell which strand a read pair is mapping to? The odd thing is that for other genes on the reverse strand, HT-seq count seems to be counting them.

ADD REPLY

Login before adding your answer.

Traffic: 2460 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6