bcftools consensus: include the OVERHANG
8.7 years ago
Marvin ▴ 220

I have e.g. a read that overhangs my reference.

Its CIGAR string is, e.g., 10S40M.

My goal is to call a consensus that is longer than the original reference, meaning that the 10 soft-clipped bases are taken into account when building the consensus. There might be more than 1 read that overhangs the reference, so I cannot just add the soft-clipped bases to the reference manually.

My approach follows 2 ideas:

1) Modify the CIGAR string, so that 10S40M becomes 50M. I have also tried 10I40M.

2) Add x N's (or gaps "-") at the start of the reference. x is the length of the longest found soft-clip-overhang of all reads.

But running the mpileup / bcftools call / tabix / bcftools consensus pipeline gives me a consensus that still has N's (or gaps) at the beginning instead of the 10 bases that were soft-clipped.
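To make the coordinate bookkeeping behind the two ideas concrete, here is a minimal Python sketch. It only covers the arithmetic: how long the pad x has to be, and what POS and CIGAR become once a leading soft clip is turned into a match. The key detail is that SAM's POS points at the first aligned base, so turning 10S into 10M also means moving POS 10 bases to the left. The function names and the toy reads are made up for illustration, hard clips are ignored, and nothing here implies that bcftools consensus will then fill in the padded N's.

```python
import re

def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples, e.g. '10S40M' -> [(10, 'S'), (40, 'M')]."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def overhang(pos, cigar):
    """Number of leading soft-clipped bases that fall left of reference position 1 (pos is 1-based)."""
    ops = parse_cigar(cigar)
    clip = ops[0][0] if ops and ops[0][1] == "S" else 0
    return max(0, clip - (pos - 1))

def pad_length(reads):
    """Idea 2: x is the longest overhang over all (pos, cigar) pairs."""
    return max((overhang(pos, cigar) for pos, cigar in reads), default=0)

def softclip_to_match(pos, cigar):
    """Idea 1: turn a leading soft clip into a match and shift POS left accordingly."""
    ops = parse_cigar(cigar)
    if not ops or ops[0][1] != "S":
        return pos, cigar
    clip = ops[0][0]
    ops[0] = (clip, "M")
    # Merge with a directly following M so 10S40M becomes 50M rather than 10M40M.
    if len(ops) > 1 and ops[1][1] == "M":
        ops[0:2] = [(clip + ops[1][0], "M")]
    return pos - clip, "".join(f"{n}{op}" for n, op in ops)

# Hypothetical reads as (POS, CIGAR) on the original, unpadded reference.
reads = [(1, "10S40M"), (3, "4S46M"), (5, "50M")]
reference = "ACGTACGTAC"  # toy reference

x = pad_length(reads)
padded_reference = "N" * x + reference

# Every POS moves right by x on the padded reference; clipped reads then move left by their clip.
converted = [softclip_to_match(pos + x, cigar) for pos, cigar in reads]
print(x, padded_reference, converted)
```

Changing only the CIGAR from 10S40M to 50M while leaving POS alone would shift the whole read 10 bases to the right on the reference, so both fields have to change together.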

How do I achieve this?

bcftools consensus samtools mpileup • 2.1k views
8.7 years ago

You're using the wrong tools for this purpose. What you're trying to do is called "microassembly", and you won't be able to easily trick samtools into ignoring the fact that you don't have a reference sequence for this portion. To just assemble this you might use Mapsembler (this is slow but apparently works OK) or SPAdes. One of my colleagues has had success with both of these doing something similar.

8.0 years ago
JstRoRR ▴ 60

Is there any way, using samtools/bcftools or programmatically, to retrieve the soft-clipped bases along with the consensus instead of using an assembler?

Any ideas?
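As a rough idea of what "programmatically" could look like, here is a minimal Python sketch that takes a majority vote over the soft-clipped bases hanging off the left end of the reference and returns them as a prefix. It is not a samtools or bcftools feature. It assumes POS, CIGAR and SEQ have already been pulled from SAM columns 4, 6 and 10, and it ignores base qualities, hard clips, indels inside the overhang, and overhangs at the right end. The function name and the reads are hypothetical.

```python
import re
from collections import Counter, defaultdict

def left_overhang_consensus(reads):
    """Majority-vote consensus of soft-clipped bases located left of reference position 1.

    reads: iterable of (pos, cigar, seq) with 1-based pos, as in SAM columns 4, 6 and 10.
    """
    columns = defaultdict(Counter)
    for pos, cigar, seq in reads:
        m = re.match(r"(\d+)S", cigar)
        if not m:
            continue  # no leading soft clip on this read
        clip = int(m.group(1))
        for i in range(clip):
            coord = pos - clip + i  # reference coordinate of clipped base i; <= 0 is off the left end
            if coord <= 0:
                columns[coord][seq[i]] += 1
    # Overhanging clips always run up to coordinate 0, so the sorted columns are contiguous.
    return "".join(columns[c].most_common(1)[0][0] for c in sorted(columns))

# Hypothetical reads that all start at reference position 1.
reads = [
    (1, "4S6M", "ACGTAAAAAA"),
    (1, "2S6M", "GTAAAAAA"),
    (1, "3S6M", "CGAAAAAAA"),  # disagrees with the others in the last clipped base
]

prefix = left_overhang_consensus(reads)
consensus = "AAAAAA"  # stand-in for the sequence written by bcftools consensus
extended = prefix + consensus
print(prefix, extended)
```

The prefix would then be prepended to whatever the regular consensus step produced for the reference-covered part. For anything beyond a handful of clean overhanging reads, an assembler is probably still the safer route.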
