Filter SAM/BAM files
5.8 years ago
second_exon ▴ 210

Hi:

I mapped single-end GBS reads onto two similar genomes, so I have two SAM files, one per genome. I would like to filter the SAM/BAM files based on these conditions:
1. reads unique to one genome, i.e. mapped to only one of the genomes
2. reads that are not unique, i.e. mapped to both genomes

Are there any existing implementations for this? I would like to know before I go ahead and implement it myself in pysam.


Were the reads mapped to each genome separately? I don't think there is a tool that does exactly what you want, but BBSplit from BBTools comes close; you would have to redo the mapping, though.

If you end up coding your own solution, I think it would be wise to sort the SAM/BAM files by read name, so that parsing both files simultaneously is simpler and uses less memory.
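A minimal sketch of that first step, assuming samtools is available and treating the file names as placeholders:

# name-sort both alignments so records for the same read line up
# when the two files are streamed in parallel
samtools sort -n -o in1.namesorted.bam in1.bam
samtools sort -n -o in2.namesorted.bam in2.bam

Once both files are name-sorted, a single pass over each is enough to classify every read, without holding all read names in memory.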


Thanks for the suggestion! Yes, they were mapped separately. I will look into BBTools.

5.8 years ago

Your reads are single-end? Then:

samtools view -F 4 in1.bam | cut -f 1 | sort | uniq > r1.txt 
samtools view -F 4 in2.bam | cut -f 1 | sort | uniq > r2.txt

reads mapped only on one genome

comm -3 r1.txt r2.txt

reads are not unique/mapped on both genomes.

comm -12 r1.txt r2.txt
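
A hedged follow-up: comm -3 prints the two sets of unique names in separate columns. To get one list per genome and pull the matching alignments back out of each BAM (the -N/--qname-file option requires a reasonably recent samtools; output file names are placeholders), something like this should work:

# names present only in r1.txt (genome 1 only) / only in r2.txt (genome 2 only)
comm -23 r1.txt r2.txt > only_genome1.txt
comm -13 r1.txt r2.txt > only_genome2.txt
# extract the corresponding alignment records from each BAM
samtools view -b -N only_genome1.txt in1.bam > genome1_unique.bam
samtools view -b -N only_genome2.txt in2.bam > genome2_unique.bam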

Thanks! It worked :)

5.8 years ago

I have two SAM files, one for each genome.

This is going to be over-stringent. Some reads will map perfectly to one genome and map with discrepancies to the other. You probably want to count those for the first genome, not throw them away because the aligner forced them to align to the second genome.

Better to combine the two genomes, align to the combined reference, and then fish out the reads that aligned uniquely.
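
A minimal sketch of that combined-reference approach, assuming bwa as the aligner; all file names are placeholders, and the MAPQ cutoff is only a heuristic for "aligned uniquely":

# prefix contig names so each alignment's genome of origin stays visible
sed 's/^>/>genomeA_/' genomeA.fa > combined.fa
sed 's/^>/>genomeB_/' genomeB.fa >> combined.fa
bwa index combined.fa
bwa mem combined.fa reads.fq | samtools sort -o combined.bam
# keep mapped reads with a single confident placement; with bwa mem,
# multi-mapping reads get low MAPQ, so -q 30 is an (arbitrary) proxy for uniqueness
samtools view -b -F 4 -q 30 combined.bam > unique.bam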
