Why does the warning "reads with missing mate encountered" occur in HTSeq?
6.4 years ago

I was following the DESeq2 manual to process my simple paired-end RNA-Seq data, which involves wild-type and stress-treated plants.

I counted reads with HTSeq (version 0.9.1) using the command:

htseq-count -a 10 -s 'no' WT-CON.sam /home/exp/DESEQ2/genes.gtf > WT-DESeq.txt

I noticed a warning, "53525476 reads with missing mate encountered", in the output:

100000 GFF lines processed.    
.
604523 GFF lines processed.
Warning: Read K00171:29:H2NYHBBXX:8:1128:23045:10019 claims to have an aligned mate which could not be found in an adjacent line.
100000 SAM alignment record pairs processed.   
.
.
56400000 SAM alignment record pairs processed.
56500000 SAM alignment record pairs processed.
Warning: 53525476 reads with missing mate encountered.
56509150 SAM alignment pairs processed.

A previous post and a comment by Ian highlight the sort-by-name (-n) option. Currently, I sort by position and convert to SAM:

samtools sort -o WT-CON.bam /home/exp/DESEQ2/WT/accepted_hits.bam 
samtools view WT-CON.bam > WT-CON.sam

I am not sure how to overcome the warning. Do I need to sort BAM with -n and run HTSeq again, or is another parameter missing?

RNA-Seq HTSeq alignment SAM • 9.0k views

Do I need to sort BAM with -n and run HTSeq

Yes, you have to do that so that the aligned mates are adjacent to each other. Alternatively, featureCounts can sort the reads automatically for you before counting. It will be slow either way, though; sorting BAM files is always a pain.

6.4 years ago
Martombo ★ 3.1k

In a BAM file sorted by name, the read mates are on two consecutive lines, since they have the same name. HTSeq can actually work on position-sorted BAM files as well, with the option -r pos, see here. That will use more memory, however, as each read is kept until its mate is found. Also, you don't need to convert to SAM, as HTSeq reads BAM files directly. You may want to check featureCounts as well, which is much faster than HTSeq and produces the same results.
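
To make the adjacency point concrete, here is a toy sketch. It is not real SAM data and not how HTSeq is implemented: the read names and positions are made up, and plain sort stands in for samtools sort. It counts how many records do not have their mate on a neighbouring line under each ordering.

```shell
#!/bin/sh
# Toy illustration only: three read pairs as "name<TAB>position" records.
# Real SAM/BAM files should be sorted with samtools, not with sort(1).
tmp=$(mktemp)
printf 'readA\t100\nreadB\t150\nreadA\t300\nreadC\t320\nreadB\t400\nreadC\t500\n' > "$tmp"
tab=$(printf '\t')

# Count records whose mate (same name) is on neither the previous nor the next line.
count_orphans() {
    awk -F '\t' '
        { name[NR] = $1 }
        END {
            n = 0
            for (i = 1; i <= NR; i++)
                if (name[i] != name[i-1] && name[i] != name[i+1]) n++
            print n
        }'
}

# Position order interleaves the pairs; name order puts mates side by side.
pos_sorted_orphans=$(sort -t "$tab" -k2,2n "$tmp" | count_orphans)
name_sorted_orphans=$(sort -t "$tab" -k1,1 "$tmp" | count_orphans)

echo "position order: $pos_sorted_orphans records without an adjacent mate"
echo "name order:     $name_sorted_orphans records without an adjacent mate"
rm -f "$tmp"
```

In position order every mate is separated from its partner, which is the situation the "missing mate" warning describes when htseq-count expects name order.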


Thanks for the explanation. May I know how -n and -o should be used together when sorting? I first used -no after seeing a thread on SEQanswers, but it didn't seem to sort correctly, as I got this error:

 Unsorted positions on sequence #6: 5325030 followed by 5325016
 samtools index: failed to create index for "MUT.bam"

I ran a command like:

samtools sort -no WT.bam /home/exp/DESEQ2/WT/accepted_hits.bam

Was the sorted bam file created correctly? Depending on your version of samtools, that might be an outdated syntax. You should use samtools sort -n -T /tmp/aln.sorted -o aln.sorted.bam aln.bam, see here.
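
Putting the pieces of this thread together, a rough sketch of the workflow might look like the script below. The input and GTF paths come from the question; the output name WT-CON.namesorted.bam and the run helper are placeholders of mine, and the -f bam option assumes an HTSeq install that can read BAM (it needs pysam). By default the script only prints the commands; set RUN=1 to execute them. As a side note, "Unsorted positions" is the error samtools index typically gives for a BAM that is not coordinate-sorted, so a name-sorted file is not expected to be indexable, and htseq-count does not need an index for it.

```shell
#!/bin/sh
# Sketch only: paths are from the question, the output name is a placeholder.
in_bam=/home/exp/DESEQ2/WT/accepted_hits.bam
gtf=/home/exp/DESEQ2/genes.gtf
name_bam=WT-CON.namesorted.bam   # hypothetical output name

# Hypothetical helper: print a command, and execute it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# 1. Sort by read name (-n) so that mates end up on adjacent records.
#    -T sets the prefix for temporary files, -o the output file.
run samtools sort -n -T /tmp/aln.sorted -o "$name_bam" "$in_bam"

# 2. Count directly on the name-sorted BAM; no SAM conversion, no index.
#    -r name tells htseq-count that the input is sorted by read name.
if [ "${RUN:-0}" = 1 ]; then
    htseq-count -f bam -r name -a 10 -s no "$name_bam" "$gtf" > WT-DESeq.txt
else
    echo "+ htseq-count -f bam -r name -a 10 -s no $name_bam $gtf > WT-DESeq.txt"
fi
```

If you would rather keep the position-sorted BAM, replacing -r name with -r pos in step 2 and skipping step 1 should also work, at the cost of more memory as noted above.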


Yes, I found out about the -T option after checking the manual. I had actually been referring to an old version. Thanks again for your help!


ah, sorry, I didn't realize 5 days had passed already ;)
