Filtering BAM file to include PE reads that did not map, even though their mate did
1
1
Entering edit mode
5.0 years ago
nute11a ▴ 30

Hello!

I am trying to filter my BAM file (paired-end read alignment) to 1) Remove pairs that did not map and 2) Contain reads where both reads mapped in their proper pair, and one read mapped but one did not (include the unmapped read if its pair mapped properly) 3) Minimum mapping quality of 10

I have been referring to these pages for for help:

The "Mate-Unmapped PeterpanWendy" reads are what I want: http://gatkforums.broadinstitute.org/gatk/discussion/7067/ahoy-mates-be-ye-unmapped

From these, it seems like I want -f 11 to keep: read paired, read mapped in proper pair, mate unmapped http://www.samformat.info/sam-format-flag http://broadinstitute.github.io/picard/explain-flags.html

Therefore, my samtools filtering command would look like this: samtools view -b -F 4 -f 11 -q 10 FILENAME

I am unsure of whether my interpretation of these filters is correct, and I am also not certain how to check my filtering. Possibly one way is to search the BAM (or SAM file) to see if one read mapped, its mate has a MAPQ of 0 with a * in the CIGAR field like GATK suggests the unmapped read has. Because I'm imposing a mapping quality threshold and removing unmapped reads, I want to be sure that I kept the unmapped mate.

Thank you so much! I appreciate your comments.

BAM samtools filter mapping read • 2.4k views
ADD COMMENT
0
Entering edit mode

Alternatively reformat.sh from BBMap suite has following filter options that will help.

$ reformat.sh in=your.bam out=filtered.bam [options below]

mappedonly=f            Toss unmapped reads.
unmappedonly=f          Toss mapped reads.
pairedonly=f            Toss reads that are not mapped as proper pairs.
unpairedonly=f          Toss reads that are mapped as proper pairs.
primaryonly=f           Toss secondary alignments.  Set this to true for sam to fastq conversion.
minmapq=-1              If non-negative, toss reads with mapq under this.
maxmapq=-1              If non-negative, toss reads with mapq over this.
ADD REPLY
1
Entering edit mode
5.0 years ago

http://gatkforums.broadinstitute.org/gatk/discussion/7067/ahoy-mates-be-ye-unmapped

This tutorial is so typical of the GATK philosophy of bombarding readers with lots of useless and confusing terminology, yet not providing practical instructions on performing the task.

It is quicker to to use the samtools flags to have the flags explained

samtools flags 11
0xb 11  PAIRED,PROPER_PAIR,MUNMAP

the command:

samtools view -b -F 4 -f 11 -q 10 FILENAME

looks right, though I will say no one is quite sure what PROPER_PAIR means, when is it set and when is it not set. It depends on the aligner.

Do also note that this will not keep alignments for the point of view of the unmapped pairs. Perhaps these are not needed. These would be unmapped alignments where the mate mapped.

In addition, in my opinion, you should filter for primary alignments (in world of the SAM format these will be the alignments that are not SUPPLEMENTARY and not SECONDARY).

samtools flags SECONDARY,SUPPLEMENTARY
0x900   2304    SECONDARY,SUPPLEMENTARY

To check your SAM file you may filter your file for alignment where both reads map. See if you find any. Then check to see if the counts add up.

The interaction between SAM flags and the operations on them are trickier than one assumes.

ADD COMMENT
0
Entering edit mode

Thank you for your very helpful response.

"Do also note that this will not keep alignments for the point of view of the unmapped pairs. Perhaps these are not needed. These would be unmapped alignments where the mate mapped."

I'm unclear about this -- I do intend to keep the unmapped mate, if its read mapped.

I will think about incorporating -F 256 would get rid of everything that is not a primary alignment.

Thanks again!

ADD REPLY
0
Entering edit mode

I believe you will then need to have a separate run where you select all the alignments where the read is unmapped and the mate is mapped (not unmapped)

-f 4 -F 8

this will then select into another bam file and then you merge the two bam files.

real case scenarios like yours demonstrate how shortsighted and prematurely optimized the SAM flag format truly is.

ADD REPLY

Login before adding your answer.

Traffic: 2680 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6