What Approach Would You Recommend For Large Indel Detection With Solid Data
11.1 years ago
r.follador ▴ 90

I've been spending quite some time on the following problem: I sequenced a bacterial genome using paired-end reads (SOLiD) and I have a fairly good reference sequence. My goal is to detect changes in the sequenced sample compared to the reference sequence.

The detection of SNPs and small indels (a couple of bp long) is quite straightforward using the standard tools (SAMtools, GATK). However, I'm stuck on the task of detecting larger indels (tens to hundreds of bp). I tried several software packages and settled on Pindel (upon a recommendation on this forum).

Because I didn't know whether to trust Pindel's output, I started to simulate some data (introducing indels of several sizes into the reference), mapping the original data to the modified reference and checking whether Pindel was able to detect the changes. Pindel is very sensitive and could detect most of those indels; however, its sensitivity is also the main problem. I find it nearly impossible to differentiate between true indels and false positives. There is no good statistic for the significance of an observation other than the raw number of supporting reads.
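The simulation described above can be sketched roughly as follows. This is only a minimal illustration, not the poster's actual pipeline: the function names `simulate_indels` and `score_calls`, the evenly spaced event positions, and the 10 bp matching tolerance are all assumptions made for the example. It plants deletions and insertions of known size into a reference string, keeps the truth set, and counts how many calls from any caller match a planted event.

```python
import random


def simulate_indels(reference, n_events, min_len=10, max_len=300, seed=0):
    """Plant indels into `reference`; return (mutated_sequence, truth).

    Each truth entry is (ref_pos, kind, length), with ref_pos 0-based on the
    original reference and kind either "DEL" or "INS".
    """
    rng = random.Random(seed)
    # Evenly spaced positions keep events from overlapping (an assumption
    # made for simplicity, not a requirement of the approach).
    spacing = len(reference) // (n_events + 1)
    if spacing <= max_len:
        raise ValueError("reference too short for this many events")
    truth, pieces, cursor = [], [], 0
    for i in range(1, n_events + 1):
        pos = i * spacing
        length = rng.randint(min_len, max_len)
        kind = rng.choice(["DEL", "INS"])
        pieces.append(reference[cursor:pos])
        if kind == "DEL":
            cursor = pos + length  # skip the deleted reference bases
        else:
            pieces.append("".join(rng.choice("ACGT") for _ in range(length)))
            cursor = pos
        truth.append((pos, kind, length))
    pieces.append(reference[cursor:])
    return "".join(pieces), truth


def score_calls(truth, calls, tolerance=10):
    """Return (true_positives, false_positives, false_negatives).

    A call matches a planted event if the kind agrees and the positions
    differ by at most `tolerance` bp; each planted event is matched once.
    """
    matched = set()
    tp = 0
    for c_pos, c_kind, _c_len in calls:
        for idx, (t_pos, t_kind, _t_len) in enumerate(truth):
            if idx not in matched and c_kind == t_kind and abs(c_pos - t_pos) <= tolerance:
                matched.add(idx)
                tp += 1
                break
    return tp, len(calls) - tp, len(truth) - len(matched)
```

Running the caller at several supporting-read thresholds and scoring each call set this way gives an empirical sensitivity and false-positive count per threshold, which is one way to choose a cutoff when the caller itself reports no significance value.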

My questions:

  • What would someone with more experience in this kind of work recommend? Any other software tools? Another approach? Or will I have to accept that paired-end short reads are not optimal for answering this kind of question?
  • For the next time: What sequencing approach would you recommend? 454 reads with de novo assembly and subsequent comparison of the contigs to the reference? PacBio? Does Illumina offer a better approach?

Thanks for any help!

indel structural variation • 4.9k views
11.1 years ago
William ★ 5.3k

Read this review paper from 2011 on structural variant calling:

http://www.ncbi.nlm.nih.gov/pubmed/21358748

Basically, there are 3 signals you can use for structural variant calling:

1) discordant-pair signal

2) read-depth signal

3) split-read signal (with split-contig mapping from de novo assembly as a special case)

The discordant-pair and read-depth signals can be obtained from paired sequencing data produced on any platform. To use the split-read signal you need long reads and split mapping of those reads; this is not really useful on SOLiD data or other short-read data.
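To make the discordant-pair signal concrete, here is a rough sketch of the idea. It is not how any particular caller implements it: the function names, the `(pair_id, leftmost_pos, insert_size)` tuple layout, and the median ± k·MAD cutoff are assumptions chosen for the illustration. Pairs that map much farther apart on the reference than the library's typical insert size suggest a deletion in the sample; pairs that map much closer together suggest an insertion.

```python
import statistics


def insert_size_bounds(insert_sizes, k=4.0):
    """Return (low, high) cutoffs as median +/- k * scaled MAD."""
    sizes = list(insert_sizes)
    med = statistics.median(sizes)
    mad = statistics.median([abs(x - med) for x in sizes])
    # 1.4826 rescales the MAD to a standard deviation for normal data.
    spread = 1.4826 * mad
    return med - k * spread, med + k * spread


def classify_pairs(pairs, k=4.0):
    """Flag pairs whose mapped insert size falls outside the expected range.

    `pairs` is a list of (pair_id, leftmost_pos, insert_size) tuples, a
    hypothetical layout used only for this example.
    """
    low, high = insert_size_bounds([p[2] for p in pairs], k)
    flagged = []
    for pair_id, pos, size in pairs:
        if size > high:
            flagged.append((pair_id, pos, "deletion_candidate"))
        elif size < low:
            flagged.append((pair_id, pos, "insertion_candidate"))
    return flagged
```

A single discordant pair means little; clusters of flagged pairs at nearby positions carry the evidence. Note also that an insertion longer than the library insert size cannot be spanned by a pair at all, so this signal alone misses such events.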

A good discordant-pair signal SV caller is BreakDancer.

A good split-read signal SV caller is Pindel.

A good read-depth signal SV caller is CNVnator.
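For intuition about the read-depth signal, a very simplified sketch follows. It is not the algorithm of any of the tools named above: the function name, the fixed 100 bp windows, and the 25% cutoff are assumptions for the example. It takes a per-base depth list and reports windows whose mean depth falls far below the genome-wide median window depth.

```python
import statistics


def low_depth_windows(depth, window=100, fraction=0.25):
    """Return (start, end, mean_depth) for windows with unusually low depth.

    `depth` is a list of per-base read depths along the reference. A window
    is reported when its mean is below `fraction` times the median window mean.
    """
    means = []
    for start in range(0, len(depth) - window + 1, window):
        chunk = depth[start:start + window]
        means.append((start, start + window, sum(chunk) / window))
    if not means:
        return []  # sequence shorter than one window
    baseline = statistics.median([m for _, _, m in means])
    return [w for w in means if w[2] < fraction * baseline]
```

In a haploid bacterial genome a deleted region should show depth near zero on the reference, so this signal suits larger deletions; insertions of novel sequence leave no trace in the depth profile.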

Two upcoming multi-signal SV callers are SVMiner and LUMPY.
