Principled structural variation detection with assembled genomes or PacBio reads
8.4 years ago
hbw ▴ 90

I want to compare my assembled genome with a reference to find structural variation. Most SV methods, such as SVDetect and BreakDancer, map resequenced paired-end reads to the reference. However, my main source of long-range information is PacBio reads. I have also done a hybrid assembly. I have two questions:

1. Is there a principled way and an established pipeline to use PacBio reads for structural variation detection? I know raw PacBio reads will be problematic because of their high error rates. One approach is to map the raw reads to a region of interest, then do a local assembly and polishing. Is there an established pipeline for this?

2. A different approach would be to use assembled genomes. I can compare the genomes with MUMmer. Is there standard software that people use to take the MUMmer output and produce a list of structural variations? Is there a way to gain confidence about what is a misassembly vs. a true structural variation? I know Sibelia uses whole-genome alignment, but it is advertised for microorganisms. Would it work for more complex genomes such as plants?
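For the first approach (mapping raw reads to a region of interest before local assembly), the read-selection step might look roughly like the sketch below. It assumes the reads were already aligned with a long-read mapper that writes PAF (for example minimap2), and it only collects read names; the local assembly and polishing would still need dedicated tools. The function name `reads_overlapping_region` and the `min_mapq` threshold of 20 are illustrative assumptions, not part of any established pipeline.

```python
from typing import Iterable, List


def reads_overlapping_region(
    paf_lines: Iterable[str],
    chrom: str,
    start: int,
    end: int,
    min_mapq: int = 20,  # assumed cutoff, tune for your data
) -> List[str]:
    """Return names of reads with an alignment overlapping [start, end) on chrom.

    PAF columns used (0-based): 0 = query name, 5 = target name,
    7 = target start, 8 = target end, 11 = mapping quality.
    PAF coordinates are 0-based with an exclusive end.
    """
    selected: List[str] = []
    seen = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        name, target = fields[0], fields[5]
        t_start, t_end, mapq = int(fields[7]), int(fields[8]), int(fields[11])
        if target != chrom or mapq < min_mapq:
            continue
        # Half-open interval overlap test
        if t_start < end and t_end > start and name not in seen:
            seen.add(name)
            selected.append(name)
    return selected


example = [
    "read1\t10000\t0\t9800\t+\tchr1\t500000\t1000\t10900\t8500\t9900\t60",
    "read2\t8000\t0\t7900\t-\tchr1\t500000\t20000\t27900\t7000\t7900\t60",
    "read3\t9000\t0\t8900\t+\tchr1\t500000\t4000\t12900\t7800\t8900\t5",
    "read4\t9000\t0\t8900\t+\tchr2\t500000\t4000\t12900\t7800\t8900\t60",
]
print(reads_overlapping_region(example, "chr1", 5000, 6000))
```

The selected read names could then be used to pull the raw reads out of the FASTQ file and pass them to a local assembler.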

genome Assembly structural-variation pacbio SMRT • 3.3k views

Tools that use PacBio reads to identify SVs don't require error correction, which means they should be able to handle the high error rates in the raw reads. (If you know of any tool that uses corrected reads, let me know.)

8.1 years ago
ivminkin • 0

Hi,

Sibelia is useful for datasets smaller than 500 MB. A version for larger genomes is in progress and will probably be released within a year. As an alternative, you can take a look at Cactus: https://github.com/benedictpaten/cactus

4.2 years ago
Manish ▴ 10

I guess it is too late to answer the original question, but I will still answer for anyone else who wants to find variations from assemblies.

We developed a method, SyRI, to identify structural differences between whole-genome assemblies using alignments as input. It identifies syntenic (conserved) regions as well as structurally rearranged regions (inversions, transpositions, translocations, segmental (distal) duplications, and tandem duplications). It also reports local variations (SNPs, indels, CNVs) within syntenic and structurally rearranged regions, providing a hierarchy of variations.

More information is in the paper, and the method is available on GitHub.

It can work with large assemblies, but it does not differentiate between misassembly and structural variation.
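To give a feel for what "using alignments as inputs" means, here is a toy sketch that flags candidate events from a list of pairwise alignment blocks on one chromosome pair. This is not SyRI's algorithm, only a simplified illustration. It assumes 1-based inclusive coordinates and the convention that reverse-strand alignments are reported with query start greater than query end (as in MUMmer's show-coords output). The names `Block`, `candidate_events` and the `min_size` cutoff are hypothetical. Like any alignment-based approach, it cannot tell a misassembly from a real rearrangement.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """One alignment block between reference and query (1-based, inclusive)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int  # smaller than qry_start for reverse-strand alignments


def candidate_events(blocks: List[Block], min_size: int = 50) -> List[Tuple[str, int, int]]:
    """Return (label, ref_start, ref_end) tuples for candidate events."""
    events: List[Tuple[str, int, int]] = []
    ordered = sorted(blocks, key=lambda b: b.ref_start)

    # Blocks aligned in reverse orientation are inversion candidates.
    for b in ordered:
        if b.qry_start > b.qry_end:
            events.append(("inversion_candidate", b.ref_start, b.ref_end))

    # Compare unaligned gaps between consecutive forward blocks.
    forward = [b for b in ordered if b.qry_start <= b.qry_end]
    for prev, nxt in zip(forward, forward[1:]):
        ref_gap = nxt.ref_start - prev.ref_end - 1
        qry_gap = nxt.qry_start - prev.qry_end - 1
        if ref_gap < 0 or qry_gap < 0:
            continue  # overlapping or non-collinear blocks: out of scope here
        diff = ref_gap - qry_gap
        if diff >= min_size:
            # Reference has extra sequence: deletion in the query assembly.
            events.append(("deletion_candidate", prev.ref_end + 1, nxt.ref_start - 1))
        elif -diff >= min_size:
            # Query has extra sequence: insertion in the query assembly.
            events.append(("insertion_candidate", prev.ref_end, nxt.ref_start))
    return events


blocks = [
    Block(1, 1000, 1, 1000),
    Block(1501, 3000, 1001, 2500),
    Block(3001, 4000, 3500, 2501),  # reverse orientation
    Block(4001, 5000, 3501, 4500),
]
for event in candidate_events(blocks):
    print(event)
```

Real tools additionally handle translocations, duplications, overlapping alignments and multiple chromosomes, which this sketch deliberately ignores.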
