mapped/unmapped mate with samtools
1
0
Entering edit mode
9.5 years ago
Flavia • 0

Hi,

I'm kind of newbie about BWA and I have a trivial question.

I performed an alignment with paired reads and now I'm trying to extract some of them with samtools.

  1. mapped reads whose mate is unmapped (samtools view -f4 -F8 in.bam)
  2. unmapped reads whose mate is mapped (samtools view -f8 -F4 in.bam)

The problem is that I did expect that the same number of reads in both files (they are complementary, aren't they?). But I obtained more reads among the unmapped.

I can't find any explanation. thanks in advance

samtools mappedunmapped pairedreads bwa • 11k views
ADD COMMENT
0
Entering edit mode

FYI, -f 4 -F 8 will give unmapped reads with mapped mates and -f 8 -F 4 will give mapped reads with unmapped mates.

What's the output of samtools view -c -f 0x800 in.bam

ADD REPLY
0
Entering edit mode

Oh sorry..in the post I inverted the code for mapped/unmapped!!

However ..with -f 0x800 I obtain 0.. but I don't know the meaning

ADD REPLY
0
Entering edit mode

BWA can produce chimeric/non-linear alignments, which will have the 0x800 bit in the flag set. I had hoped that that was the cause of the mismatch, but I guess not.

BTW, what sort of difference in number of reads are we talking about here. Just 1 or two or a significant number?

ADD REPLY
0
Entering edit mode

the unmapped mates are 1453574; the mapped mates 1418724. so I have a difference of 34850..quite significant, I think.

ADD REPLY
0
Entering edit mode

did you filter the BAM after aligning with bwa ?

ADD REPLY
0
Entering edit mode

I don't think. I did the alignment and then I converted sam in bam

samtools view -bS in.sam > out.bam

That's it.

ADD REPLY
0
Entering edit mode

So.. looking at the unmapped file I found (for example) this couple of reads:

HWI-ST1210:100:D19M1ACXX:6:1103:9250:71414 103 CL282Contig2 975 29 87M2I7M CL282Contig5 647 0 AAAAGGCTAAAGGACAGACCGCACTGTCAACCTCCATACCCGTGCACTCAACGCTATCAACGTACAACAACTACAAGAAACAACGATACGTTCAAC CCCFFFFFHGHHHIJJJJJJJJJJJJIJIJJJIJJJJJJJJJGHJJJJJIJJJHHHFFFFFDDEDDDDDDDDDDDDDDDCDD@BDDBBDB>ABDDD XT:A:U NM:i:5 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:3 XO:i:1 XG:i:2 MD:Z:89G1G1A0

HWI-ST1210:100:D19M1ACXX:6:1103:9250:71414 151 CL282Contig5 647 29 55M45S CL282Contig2 975 0 ATGACAATCCACCAGCACTGCCACAAAACACATTAAGAAATCCATCCTGAAATCAAAACTGTCTTCGCCATCGGAATCATCAATCATTTTCACCATCTTT @53C?A:5<(9DCDCCCBCBA@DFFEEEEEDHHHIIJIHGHJIHHJIIGJIGHJJIEHJIIIIHGJJJIGGGJJJIJJJJJJJIJJHGHHFBFFFF@CBF XT:A:M NM:i:8 SM:i:29 AM:i:29 XM:i:8 XO:i:0 XG:i:0 MD:Z:6G2A2A10T11C0A8A1G7

I completely miss the point..why are they among the unmapped mates?


I will try to edit this, because I can't submit new post (I'm a new user)

that's the version

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.5a-r405

and that's the code

bwa aln -t 4 -l 12 -n 4 -k2 -o 3 -e 3 -M 2 -O 6 -E 3 library_contigs_288CL.fa R1_001.paired.fq > DNAseq288_R1.sai
bwa aln -t 4 -l 12 -n 4 -k2 -o 3 -e 3 -M 2 -O 6 -E 3 library_contigs_288CL.fa R2_001.paired.fq > DNAseq288_R2.sai
bwa sampe -a 5000 -N 5000 -n 500 library_contigs_288CL.fa DNAseq288_R1.sai DNAseq288_R2.sai R1_001.paired.fq R2_001.paired.fq > outDNAseq288.sam
samtools view -bS outDNAseq288.sam > outDNAseq288.bam

thanks a lot

ADD REPLY
0
Entering edit mode

Those flags make absolutely no sense. What's the exact command you gave to bwa? Also, what version did you use.

ADD REPLY
0
Entering edit mode

You'd probably be better off using a more recent version and seeing if the incorrectly unset 0x8 bit in the flag is set properly there.

ADD REPLY
0
Entering edit mode

Does anybody know how I can get unmapped reads with unmapped mates?

Will samtools view -f8 -F8 in.bam work?

ADD REPLY
0
Entering edit mode

You should ask a new question to get input from more people. Your comment will only reach the people in this thread.

That being said, the answer is ;-) :

You need to select for both UNMAP and MUNMAP so -f 4 -f 8 ought to do it.

ADD REPLY
0
Entering edit mode
9.5 years ago

The flags are weird but may not be incorrect. There is a big "hole" in the spec there that says:

Bit 0x4 is the only reliable place to tell whether the read is unmapped. If 0x4 is set, no
assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10, 0x100 and
0x800, and the bit 0x20 of the previous read in the template.

So what that says is that if the read is unmapped the rest of the fields may still be set but we need to ignore that since the information may be incorrect. Now if one were to match the flags directly they would get a different result. I often seen tools completely ignore the above and assume that if the read is unmapped the rest of the fields will be changed accordingly. That is not the case.

To the OP: I think the correct course of action is to first remove all reads where both the read and the mate are unmapped. Then on the remaining reads the flags will work correctly. We have every right to complain we shouldn't need to do that - I mean c'mon having an unmapped read that is listed as mapping to say position xyz with a certain mapping quality - yes that actually can happen.

ADD COMMENT
0
Entering edit mode

Well, 0x8 is incorrect, which is the source of the problem here.

ADD REPLY
0
Entering edit mode

that could be - I have some duties right now so I can't look into it more deeply - from experience I found that when counts don't match up it is almost always because the filtering ignores the 0x4 state and matches the flags exactly as we tell it to. But those flags can actually be set 'incorrectly'.

ADD REPLY

Login before adding your answer.

Traffic: 2280 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6