Sorting samtools output by flag or grepping its tags
1
0
Entering edit mode
6.7 years ago
BPors ▴ 60

Hi!

I am getting confused with couple samtools outputs. I have run bwa-mem with -M flag to retrieve chimeric reads with 256 flag, and converted the sam file to bam file. Problem is, in the original bam file, if I want to count the lines with 256 flag, and if I want to retrieve "SA" tags, the numbers do not add up. Should not they be equal if they are both to show chimeric reads?

Example:

samtools view -h myfile.bam | grep 'SA:Z' > myfileSA_filtered.bam 

samtools view -f 256 -c myfile.bam 
1987783

and

samtools view -c myfileSA_filtered.bam
300281

I expect these numbers to be the same but they are quite far away from each other. I would appreciate your help!

samtools flag grep chimeric • 5.5k views
ADD COMMENT
1
Entering edit mode

I think the 256 flag only means that it is not a primary alignment. This can then be a chimeric one that has been tagged as 256 due to the -M option, but also a multimapping read. If reads multimap, then one is randomly chosen to be primary, all others get flagged as secondary. That means SA:Z is only a subset of 256, and therefore, the number for the grep is smaller than for counting 256.

ADD REPLY
0
Entering edit mode

Yes, followed the same thought, but then I cut the QNAME part, sorted and counted the unique ones in 256flagged bam file. Then I counted the unique QNAMEs in myfileSA_filtered.bam, and the numbers were still different

ADD REPLY
0
Entering edit mode

first: how do know there is no 'SA:Z' substring in the quality string ?

ADD REPLY
0
Entering edit mode

Because I did not only use SA:Z, it also includes the accession number of the genome, therefore making it specific to its chimerics

ADD REPLY
0
Entering edit mode

second: 256 if for 'secondary alignment' = an alternative alignment for the current read. I think you want 2048 'supplementary alignement'

ADD REPLY
0
Entering edit mode

well, I wanted to retrieve the chimeric reads, so I thought to grab the ones which have another place to be. I have used bwa-mem -M -a to get all alignments, but since I have used -M, as a flag, I used 256 instead of 2048. Would it be a wrong choice?

ADD REPLY
1
Entering edit mode

256 would be used for a read mapping a duplicated gene: a read could have a two possible position (primary + not-primary=256 alignement)

a first part of the read is mapped on a region while the other part is clipped. There is a second alignment (2048) for this second part, mapping another region of the genome.

ADD REPLY
1
Entering edit mode
6.7 years ago
jkbonfield ★ 1.2k

SA:Z: tags appear on the primary reads as well as supplementary ones, therefore the count is higher. Filtering by flags is the best approach. (Hint "samtools flags" give you a handy chart of the numbers.)

ADD COMMENT

Login before adding your answer.

Traffic: 3285 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6