Samtools: Question about filtering BAM file using flag
5.2 years ago
SDin • 0

Hi there, I am trying to filter a BAM file by its 'flag' column. I am a little confused about the meaning of, for example,

samtools view -f 4 -F 264 .....

I checked the documentation and noticed that

-f means 'what I want' and '4' means 'read unmapped'
-F means 'wipe off' and '264' means 'mate unmapped + not primary alignment'

My question is:

If a sequence's flag is 12, will it be extracted by '-f 4'? If a sequence's flag is 8, will it be wiped off by '-F 264'?

I am confused about the mechanism of this command, which unfortunately is not clear from the documentation.

Many thanks.

next-gen alignment sequencing • 3.8k views

See SAM Format site for explanation of the flags.

5.2 years ago

The FLAG value is a bit field: each individual flag corresponds to one binary digit. A better way to think about the flags is therefore their binary encoding.

The binary encodings of the values you mentioned are:

  4: 0 0 0 0 0 0 1 0 0
  8: 0 0 0 0 0 1 0 0 0
 12: 0 0 0 0 0 1 1 0 0 
264: 1 0 0 0 0 1 0 0 0

Here we can easily see that 12 is composed of 8 (mate unmapped) and 4 (read unmapped), while 264 is composed of 256 (not primary alignment) and 8 (mate unmapped).

As -f means retain only reads with all specified flags set, a read with the flag 12 will be retained by -f 4 because a read with flag 12 has its 4 flag set. As -F means retain only reads with none of the specified flags set, a read with flag 8 will be removed by -F 264 because 8 is one of the flags specified by 264.
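
The rule above comes down to two bitwise AND tests. Here is a minimal Python sketch of that logic; `passes_filter` is a hypothetical helper written for illustration, not part of samtools:

```python
def passes_filter(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Mimic the logic of `samtools view -f REQUIRE -F EXCLUDE` for one FLAG value."""
    # -f: every bit in `require` must be set in the read's flag
    has_all_required = (flag & require) == require
    # -F: no bit in `exclude` may be set in the read's flag
    has_no_excluded = (flag & exclude) == 0
    return has_all_required and has_no_excluded


print(passes_filter(12, require=4))   # True: flag 12 has the 4 bit set
print(passes_filter(8, exclude=264))  # False: flag 8 shares the 8 bit with 264
```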


Of course, -f 4 -F 264 will exclude a read with flag 12, because flag 12 means that the 8 flag is set, and the 8 flag is one of the flags that -F 264 instructs samtools to exclude.
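
To see the combined effect of -f 4 -F 264, the same two tests can be applied to a handful of flag values. This is only a sketch; the flag values in `example_flags` are chosen for illustration and are not taken from a real BAM file:

```python
REQUIRE = 4    # -f 4: read unmapped
EXCLUDE = 264  # -F 264: mate unmapped (8) + not primary alignment (256)

# Illustrative flag values only
example_flags = [4, 8, 12, 69, 260]

kept = [
    flag
    for flag in example_flags
    if (flag & REQUIRE) == REQUIRE and (flag & EXCLUDE) == 0
]

print(kept)  # [4, 69]
```

Flag 12 is dropped because its 8 bit is set, and flag 260 is dropped because its 256 bit is set, even though both have the required 4 bit.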


I got it. Many thanks for your kind help!

5.2 years ago

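The flag values are always precise and unique because each one is the numerical representation of the _sum_ of the (numerical) answers to the 12 questions such as "Is the respective read mapped? Paired? ..." Each "yes" contributes a distinct power of two, so every combination of answers gives a different sum.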

See i.sudbery's answer for the technical details and perhaps play around with this page to get a better feeling for what's going on.
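
As a rough illustration of the "sum of answers" idea, a flag value can be split back into its individual bits. The bit values below follow the SAM specification; the short descriptions and the `decode_flag` helper are my own wording, for illustration only:

```python
# FLAG bits as defined in the SAM specification (descriptions paraphrased)
SAM_FLAG_BITS = {
    1: "read paired",
    2: "read mapped in proper pair",
    4: "read unmapped",
    8: "mate unmapped",
    16: "read reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "not primary alignment",
    512: "read fails quality checks",
    1024: "read is duplicate",
    2048: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """Return the descriptions of all bits that are set in `flag`."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


print(decode_flag(12))   # ['read unmapped', 'mate unmapped']
print(decode_flag(264))  # ['mate unmapped', 'not primary alignment']
```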
