One read, one alignment
5.0 years ago
cxr5298 ▴ 20

I am working with an ONT MinION metagenome data set and aligning it against a FASTA file in which each entry is a different species. I am using a series of aligners (BWA, Bowtie2, SNAP, and Minimap2) to see which one gives the best results. However, with every aligner I get more alignments to a given species than there are reads in the input file.

For example, a FASTQ containing 100,000 Mouse reads and 50,000 Wood Mouse Herpes Virus reads, aligned against my database, returns 300,000 alignments for Mouse and 120,000 alignments for Wood Mouse Herpes Virus.

I understand that this is partly because some of these aligners report secondary and chimeric alignments, but is there a known aligner, or aligner configuration, that reports only a single best alignment for each read in my input file?

Tags: RNA-Seq, genome, sequencing, next-gen, metagenome

Have you looked at the alignments produced by bwa and bowtie2 to check whether they seem reasonable? Look at the CIGAR strings and the lengths of the alignments.

minimap2 is the only bona fide long-read aligner in your list (not sure about SNAP), so its results are the ones to compare the others against.
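One way to check the CIGAR strings, as a minimal sketch: the plain Python below (no third-party packages; summarize_cigar is an illustrative name, not a library function) parses a CIGAR string using the operation codes from the SAM specification and reports how much of the read one alignment actually covers. When only part of a long read is aligned and the rest is clipped, the clipped portion is often reported as a separate supplementary alignment, which is one way extra alignments accumulate.

```python
import re

# Operation codes from the SAM specification
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume bases of the read (SEQ)
REF_OPS = set("MDN=X")    # operations that consume bases of the reference


def summarize_cigar(cigar: str) -> dict:
    """Summarize how much of a read a single alignment actually covers."""
    if cigar == "*":
        raise ValueError("record has no CIGAR (unmapped or unavailable)")
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings the regex could not fully account for
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")

    def total(codes) -> int:
        return sum(n for n, op in ops if op in codes)

    aligned = total("M=X")  # read bases placed on the reference
    soft = total("S")       # soft-clipped: still present in SEQ
    hard = total("H")       # hard-clipped: removed from SEQ
    read_length = total(QUERY_OPS) + hard
    return {
        "read_length": read_length,
        "aligned_bases": aligned,
        "clipped_bases": soft + hard,
        "ref_span": total(REF_OPS),
        "fraction_aligned": aligned / read_length if read_length else 0.0,
    }


print(summarize_cigar("20S100M5I50M2D10M15H"))
```

For "20S100M5I50M2D10M15H" this reports a 200-base read with 160 aligned bases, 35 clipped bases and a 162-base reference span. A low fraction_aligned on many records would be worth a closer look.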


As far as I can tell the alignments look valid; there are just more of them than there should be. This happens across the board, whether I'm using minimap2, Bowtie2, SNAP, or BWA. I'm not too familiar with CIGAR strings, but from what I can tell the alignments all look good. If I wanted to double-check the CIGAR strings, how would I do that?


How would you treat reads that truly align more than once in your reference?


If a read aligns twice to the same species, I'd count it as a single 'hit' for that species. If a read aligns to multiple species, I'd need to investigate further to determine its origin. I'm more concerned with identifying the origin of each read than with the specifics of its alignment to the species in question, since the metagenome I'm working with will consist of only a handful of potential organisms.
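Since each FASTA entry in the database is one species, the reference name (RNAME) of an alignment already identifies the species. The plain-Python sketch below (count_reads_per_species is a hypothetical helper, not part of any aligner) collapses alignment-level hits into read-level calls following the rule described above: repeated hits to the same species count once, and reads hitting several species are set aside for inspection.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_reads_per_species(
    hits: Iterable[Tuple[str, str]],
) -> Tuple[Counter, Dict[str, Set[str]]]:
    """Collapse (read_name, species) alignment hits into per-read calls.

    Returns (counts, ambiguous): counts maps species -> number of reads that
    hit only that species; ambiguous maps read_name -> set of species for
    reads that hit more than one species.
    """
    species_by_read: Dict[str, Set[str]] = defaultdict(set)
    for read, species in hits:
        # Multiple alignments of one read to the same species collapse here
        species_by_read[read].add(species)

    counts: Counter = Counter()
    ambiguous: Dict[str, Set[str]] = {}
    for read, species_set in species_by_read.items():
        if len(species_set) == 1:
            counts[next(iter(species_set))] += 1
        else:
            ambiguous[read] = species_set
    return counts, ambiguous


hits = [
    ("r1", "Mouse"), ("r1", "Mouse"),  # same species twice: one hit
    ("r2", "Mouse"),
    ("r3", "WMHV"),
    ("r4", "Mouse"), ("r4", "WMHV"),   # two species: needs a closer look
]
counts, ambiguous = count_reads_per_species(hits)
print(dict(counts), ambiguous)
```

The per-species totals can then never exceed the number of reads, because each read contributes at most once.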

5.0 years ago

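You can filter your BAM with samtools view -F 256 to drop secondary alignments. Note that the extra pieces of a chimeric read are flagged as supplementary (0x800), not secondary, so they survive -F 256. To get a true one read, one line relationship, use samtools view -F 0x904, which removes unmapped (0x4), secondary (0x100) and supplementary (0x800) records and leaves only the primary alignment of each mapped read.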
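The FLAG logic behind this can be sketched in plain Python for SAM text (for example, the output of samtools view). The bit values are from the SAM specification; primary_records is an illustrative helper name. The default mask drops unmapped, secondary and supplementary records, mirroring samtools view -F 0x904, while passing mask=FLAG_SECONDARY mimics -F 256 and shows that a supplementary record can still give a read a second line.

```python
from typing import Iterable, Iterator, Tuple

# SAM FLAG bits (SAM specification)
FLAG_UNMAPPED = 0x4         # 4
FLAG_SECONDARY = 0x100      # 256: what `-F 256` removes
FLAG_SUPPLEMENTARY = 0x800  # 2048: extra pieces of chimeric/split reads

DROP_MASK = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY  # 0x904 == 2308


def primary_records(
    sam_lines: Iterable[str], mask: int = DROP_MASK
) -> Iterator[Tuple[str, str]]:
    """Yield (QNAME, RNAME) for SAM records with none of the bits in `mask` set."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if (flag & mask) == 0:
            yield qname, rname


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tMouse\t100\t60\t50M\t*\t0\t0\t*\t*",     # primary
    "r1\t256\tWMHV\t900\t0\t50M\t*\t0\t0\t*\t*",     # secondary
    "r1\t2048\tMouse\t5000\t60\t30M\t*\t0\t0\t*\t*", # supplementary
    "r2\t16\tWMHV\t42\t60\t50M\t*\t0\t0\t*\t*",      # primary, reverse strand
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",              # unmapped
]
print(list(primary_records(sam)))                        # like -F 0x904
print(list(primary_records(sam, mask=FLAG_SECONDARY)))   # like -F 256
```

With the default mask only r1 and r2 remain, one line each; with mask=FLAG_SECONDARY, r1 appears twice (primary plus supplementary) and the unmapped r3 is kept.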
