Soft-Clipped Vs Unmapped?
12.1 years ago
Bioscientist ★ 1.7k

I'm quite confused about these two terms. I'm reading about Pindel, the split-read algorithm, whose author seems to make use of the information in "unmapped" reads. There are also other split-read-based algorithms that use "soft-clipped" sequence, i.e. the unaligned parts of otherwise mapped reads.

In my eyes, the two look quite similar. Say we have a 100 bp read, 50 bp of which cannot be mapped while the other 50 bp can. How would BWA categorize this read? Will BWA call it "unmapped" because 50 bp cannot be mapped, or "mapped" but with 50 bp of "soft-clipped" sequence?

Or does BWA have a scoring system for mapping that sets a threshold for distinguishing the two?

thx

edit: maybe this is related to "centeredness"? Say the breakpoint is located at 99:1; then the 99 bp will be mapped with 1 bp "soft-clipped". But for 50:50, BWA may regard the read as "unmapped".

bwa • 12k views
11.5 years ago
harremsis ▴ 30

I'm not an expert on read mapping and am still trying to get to grips with it myself. But in my experience there are cases in which BWA reports extensively soft-clipped reads as mapped. Here's an example from a paired-end Illumina sequencing project:

CTCAG_6_1205_14418_171577_2     163     gi|261748867|gb|CM000804.1|     25090342        17      61S20M  =       25090377        116     TGCAGCCCCGCTTTGGTGAAAAAACAAGATAGGAACTGTTGTTGTTCAACTGTACTGTCACCTGCAGCACACACAACCTCC       bbbeeeeegggggiiighhiiiiiiiiiiihiifhiiiiiihiihhhihihihiiiggggggeeeeedddcdccccccccc       RG:Z:FCC0ACBACXX_L6_4   XT:A:M  NM:i:0  SM:i:17 AM:i:17 XM:i:0  XO:i:0  XG:i:0  MD:Z:20

As you can see from the CIGAR string 61S20M, 61 bp have been soft-clipped from the beginning of the read and only 20 bp are aligned. The flag 163 (=128+32+2+1) indicates that the read was mapped (the 0x4 "read unmapped" bit is 0), paired, mapped in a proper pair, second in pair, and that its mate mapped to the reverse strand (check out this great site for decoding SAM bit flags).
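For anyone who wants to check a flag without a website, here is a minimal sketch of the decoding in Python. The bit values come from the SAM specification; the short names in the dictionary and the function name `decode_flag` are just my own labels, not part of any library.

```python
# Bit values as defined in the SAM specification.
# The label strings are informal names chosen for this sketch.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(163))
# ['paired', 'proper_pair', 'mate_reverse_strand', 'second_in_pair']
```

Note that "unmapped" is absent from the list for 163, which is why the read counts as mapped despite the heavy clipping.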

So it seems that even with >50% soft-clipping BWA reports reads as mapped. So far I could not figure out how to tell BWA not to do that...which I would actually prefer.
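One possible workaround is to filter after alignment rather than inside BWA. The sketch below is not a BWA option; it only parses the CIGAR string (6th SAM field) and computes what fraction of the read is soft-clipped. The function names `soft_clip_fraction` and `keep_read`, and the 0.5 cutoff, are assumptions I picked for illustration.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that consume bases of the read (per the SAM specification).
READ_CONSUMING = set("MIS=X")


def soft_clip_fraction(cigar):
    """Fraction of the read's bases that are soft-clipped (0.0 if no CIGAR)."""
    ops = CIGAR_OP.findall(cigar)
    read_len = sum(int(n) for n, op in ops if op in READ_CONSUMING)
    clipped = sum(int(n) for n, op in ops if op == "S")
    return clipped / read_len if read_len else 0.0


def keep_read(cigar, max_clip=0.5):
    """Hypothetical filter: keep reads with at most max_clip soft-clipping."""
    return soft_clip_fraction(cigar) <= max_clip


print(soft_clip_fraction("61S20M"))  # 61 of 81 bases are clipped
print(keep_read("61S20M"))           # False with the assumed 0.5 cutoff
```

Applied line by line to the SAM output, something like this would drop the read shown above while keeping fully aligned reads such as 100M.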

Reply:

The mapping quality (5th field) is only 17, which equates to a probability of about 0.01995 (10^(-17/10), i.e. roughly a 2% chance) that the mapping is incorrect. That is quite high when you are mapping millions of reads.
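The conversion used here is the Phred-style definition from the SAM specification, MAPQ = -10 * log10(P), where P is the probability that the mapping position is wrong. A small sketch (the function name `mapq_to_error_prob` is my own), returning P as a probability rather than a percentage:

```python
def mapq_to_error_prob(mapq):
    """Convert a Phred-scaled mapping quality to an error probability."""
    return 10 ** (-mapq / 10)


p = mapq_to_error_prob(17)
print(round(p, 8))  # 0.01995262, i.e. about 2%
```

So out of every million reads at MAPQ 17, on the order of 20,000 would be expected to be misplaced, assuming the aligner's quality estimate is well calibrated.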

12.1 years ago
Geparada ★ 1.5k

As I understand the terminology, it will be "mapped" but with 50 bp of "soft-clipped" sequence. Unmapped reads have no part of their sequence aligned to the reference.

Reply:

I'm just curious how BWA works. Can a read still be considered "mapped" even when half of its length cannot be mapped?
