Is There A Tool (Samtools, Picard ..) To Filter Certain Flags In Bam File ?
3
4
Entering edit mode
11.8 years ago
madkitty ▴ 690

Is there a Picard Tool to filter one or more reads through the following :

  • FLAG (second column)
  • Mapping Quality (MAPQ 5th Column)
  • TLEN (9th Column)

??

samtools bam • 9.1k views
ADD COMMENT
7
Entering edit mode
11.8 years ago
Arun 2.4k

Hypothetical scenario: Suppose you'd want to get all reads with flag 83=paired reads|first in pair|read reverse mapped| proper pair reads with mapping quality > 30 from a bam file and return output to a bam file:

samtools view -h input.bam | awk 'substr($0,1,1) == "@" || $2 == 83 || $5 >= 30 {print}' | samtools view -bS - > output.bam

How it works?:

samtools view -h input.bam - decodes bam and prints sam file with header to screen
awk 'substr($0,1,1) == "@" - keep all header lines
|| $2 == 83 || $5 >= 30 {print}        - the actual condition you would want to check. for unmapped reads you could use GAWK with bitwise operation and($2, 4) == 4 || and($2, 16) == 16, or something like that.
samtools view -bS          - pipe input S=sam to output b=bam and the "-" mediates samtools to take input from the screen
> output.bam               - simple unix redirection to output file

Basically, you can use the awk line to filter anything you want.

ADD COMMENT
7
Entering edit mode
11.8 years ago
Fedor Gusev ▴ 210

You can use BamTools with its filter utility. You need to write a .json file like this one:

{
"isMapped" : "true",
"mapQuality" : ">50",
"isPaired" : "true"
}

And run command

bamtools filter -in in.bam -out out.bam -script filter.json

See a tutorial here.

I don't see a way to extract TLEN with it, though.

ADD COMMENT
1
Entering edit mode

(+1) haven't seen that. thanks.

ADD REPLY
0
Entering edit mode

Will try it out. Thanks a lot :)

ADD REPLY
0
Entering edit mode
10.2 years ago

How can I filter on a string in a "subtag" ?

I have an annotated BAM file (from CRAC) with an 'XE' tag. For example, the tags are:

XE:Z:0:0:Error:Unknown:1:15.0049

or

XE:Z:0:0:chimera:32:12|-1,35194979:12|-1,34872998

I would like to get only the chimeras but I can't figure out how to use 'bamtools filter' correctly (in order to be quicker than a 'grep').

Anyone knows how to do ?

Cheers

ADD COMMENT

Login before adding your answer.

Traffic: 1835 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6