Question

1000Genome SV integrated map's power for SV filter

1

8.0 years ago

michealsmith ▴ 790

I'm trying to use the 1000 Genomes Project integrated SV map, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/integrated_sv_map/ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.vcf.gz, to filter out common SVs in order to look for rare/novel SVs in a disease sample.

Questions:

I guess the integrated map is NOT quite integrated, correct? This list contains only the highly confident, validated SVs, while ambiguous ones were kicked out. But SV calls from real-world data usually contain many such "ambiguous" ones (lots of false positives caused by misalignment in repetitive sequence, etc., or by systematic errors/bias in the caller itself). So if we use this "integrated" map as the "gold standard" for filtering, we'll end up retaining many "ambiguous" false positives.

Those analyzing tumor samples naturally have controls. But I'm working on a complex disease, so one solution I can think of is to run many CONTROL samples (for example, CEU controls from 1000 Genomes) at the same time and remove whatever is seen in the CONTROLs, which hopefully removes many of the "ambiguous" calls.

What other solutions could I try?
I randomly picked several SV calls that show up as common deletions in my CONTROLs but, interestingly, are absent from the integrated SV map. To my surprise, they are all non-conserved LINEs, as in the picture below:

[image not preserved]

The deletion that is absent from the 1000 Genomes Project integrated map is the gap in the middle. I'm wondering why it is missing.

Thanks

1000 genome project structural variation filter • 3.0k views

updated 5.4 years ago by LGMgeo ▴ 100 • written 8.0 years ago by michealsmith ▴ 790

0

Just a comment on "run[ning] many CONTROL samples": my company works on cancer but we lack normal tissue. Therefore, we use exactly your approach by removing variants which frequently occur in genome resequencing projects.

8.0 years ago by Manuel Landesfeind ★ 1.4k
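The control-based filtering described above can be sketched roughly as follows. This is only an illustration, not anyone's actual pipeline: the `SV` tuple, the 50% reciprocal-overlap rule for deciding that two calls are "the same" event, and the 5% control-frequency cutoff are all assumptions chosen for the example.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    # Hypothetical minimal SV record; real callers emit much richer output.
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if chrom or type differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def control_frequency(call: SV, controls: List[List[SV]], min_ro: float = 0.5) -> float:
    """Fraction of control samples carrying a call that matches `call`."""
    if not controls:
        return 0.0
    hits = sum(
        any(reciprocal_overlap(call, c) >= min_ro for c in sample)
        for sample in controls
    )
    return hits / len(controls)


def filter_rare(case_calls: List[SV], controls: List[List[SV]],
                max_freq: float = 0.05, min_ro: float = 0.5) -> List[SV]:
    """Keep case calls seen in at most `max_freq` of the control samples."""
    return [c for c in case_calls
            if control_frequency(c, controls, min_ro) <= max_freq]


# Three control samples (one call list each) and two case calls.
controls = [[SV("chr1", 100, 200, "DEL")], [SV("chr1", 110, 210, "DEL")], []]
case = [SV("chr1", 105, 205, "DEL"), SV("chr2", 500, 900, "DEL")]
rare = filter_rare(case, controls)
```

Because the controls go through the same caller as the cases, recurrent caller artifacts tend to be removed along with true common variants, which is the point of the approach. For real data an interval index would be needed instead of the all-against-all comparison used here.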

0

The 1000 Genomes Project (1kg) provides, in its working directories, some of the calls that didn't make the final cut. I'd recommend downloading those.

8.0 years ago by Zev.Kronenberg 12k

score 6 · Answer 1 · 2016-05-10

The 1000 Genomes phase 3 integrated SV call set was generated from 2,504 low-coverage (7-9X) samples. Deletions have much higher confidence than duplications (the smallest DUP is 3 kb).

If you have high coverage data you will have greater sensitivity for SV detection.

In addition, just because an SV does not overlap with 1kg does not mean it is a very rare nonpathogenic variant.

"The deletion absent from 1000Genome Project integrated map is the gap in the middle, I'm wondering why?"

If the variant is in the reference genome browser track, then it's essentially fixed in the population. 1000 Genomes reports common variants that may or may not be represented in reference builds (one of the goals of the project is to generate better reference builds that incorporate genetic diversity).

If you want to prioritize SVs, I suggest using ANNOVAR for annotation.

If you want to apply a more systematic approach to prioritization, SVtyper is a good program.

Alternatively, you can try out my CNV script, gtCNV, which will annotate your variants with their overlap with 1000 Genomes, LINEs, STRs, MEIs, genes, and segmental duplications (low copy repeats).
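To give a feel for what overlap annotation against 1000 Genomes involves, here is a minimal standard-library sketch. It is not how gtCNV, ANNOVAR, or AnnotSV work internally; it assumes the reference VCF records carry `SVTYPE`, `END`, and `AF` in the INFO column, and the function names and the 50% reciprocal-overlap threshold are made up for the example.

```python
from typing import List, Optional, Tuple

# (chrom, start, end, svtype, allele_frequency)
KnownSV = Tuple[str, int, int, Optional[str], Optional[float]]


def parse_sv_record(line: str) -> KnownSV:
    """Pull interval, type and AF out of one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]
    tags = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    end = int(tags.get("END", pos))  # assumption: END is given for SVs
    af = max(float(x) for x in tags["AF"].split(",")) if "AF" in tags else None
    return chrom, pos, end, tags.get("SVTYPE"), af


def annotate_known_af(chrom: str, start: int, end: int, svtype: str,
                      known: List[KnownSV], min_ro: float = 0.5) -> Optional[float]:
    """Highest AF among known SVs that reciprocally overlap the call, else None."""
    matches = []
    for k_chrom, k_start, k_end, k_type, k_af in known:
        if k_chrom != chrom or k_type != svtype:
            continue
        overlap = min(end, k_end) - max(start, k_start)
        if overlap <= 0:
            continue
        ro = min(overlap / (end - start), overlap / (k_end - k_start))
        if ro >= min_ro:
            matches.append(k_af if k_af is not None else 0.0)
    return max(matches) if matches else None


# Made-up record in the assumed layout, not taken from the real file.
example = "1\t1000\tDEL_example\tA\t<DEL>\t100\tPASS\tSVTYPE=DEL;END=2000;AF=0.12"
known = [parse_sv_record(example)]
af = annotate_known_af("1", 1100, 2100, "DEL", known)
```

A call annotated with `None` is simply absent from the reference set under this matching rule, which, as noted above, does not by itself make it rare.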

score 0 · Answer 2 · 2018-07-09

0

5.8 years ago

LGMgeo ▴ 100

I suggest using AnnotSV for annotation (with OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also with your own in-house information).

You can look at this post describing the AnnotSV tool: Annotation for SV and CNV

0

The link to AnnotSV seems to be broken. If you are an author on the paper, do you mind fixing the link please? www.lbgi.fr/AnnotSV/

5.4 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.5k

0

For 2 days, we experienced some technical problems with our network, so the AnnotSV website could not be accessed.

I apologize for any inconvenience that may be caused by this temporary interruption.

5.4 years ago by LGMgeo ▴ 100