In 1000Genomes Data, Some CNVs Are In A Position Higher Than The Chromosome Length
12.7 years ago
Yh Path ▴ 30

Hi, I am doing CNV detection based on read depth and want to compare my results with the gold-standard set published by the 1000 Genomes Project (1000GP) group. I downloaded their results from: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/paper_data_sets/companion_papers/mapping_structural_variation/

I am very puzzled and hope somebody can help me. According to the reference genome,

awk 'NF>2' humang1kv37.fasta

...

19 dna:chromosome chromosome:GRCh37:19:1:59128983:1

...

chr19 runs from 1 to 59128983. However, the CNVs on chr19 in their results extend beyond 59128983:

awk '{if ($1==19) print $1"\t"$2}' union.2010_06.deletions.sites.vcf

...

19 63742587

19 63788277

where $2 is the start position of the CNV.

Can somebody please tell me what I did wrong?

Many thanks! Yh

sequence cnv position reference genome • 3.0k views
12.7 years ago

That appears to be a reference genome build mismatch (hg18 vs. hg19). In the previous build, hg18, chr19 had a length of 63811651, which includes the positions you're finding. The newer build has a shorter chr19. I'm guessing that they're using hg18 (NCBI build 36), while you're comparing against hg19 (GRCh37).
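A quick way to confirm this kind of mismatch is to check every VCF position against the chromosome lengths of the build you think you are using. This is only a minimal sketch: `check_vcf_bounds` and the two-column lengths table (chromosome, length, tab-separated) are hypothetical names, and the table is assumed to have been built by hand or from your reference FASTA headers.

```shell
# Sketch: list VCF records whose POS lies beyond the end of their chromosome.
# $1: tab-separated "chromosome<TAB>length" table (hypothetical helper file)
# $2: VCF file to check
check_vcf_bounds() {
    awk -F'\t' '
        NR == FNR { len[$1] = $2; next }   # first file: remember chromosome lengths
        /^#/      { next }                  # second file: skip VCF header lines
        ($1 in len) && ($2 + 0 > len[$1] + 0) {
            # chromosome, offending position, chromosome length
            print $1 "\t" $2 "\t" len[$1]
        }
    ' "$1" "$2"
}

# Example call (file names are assumptions):
#   check_vcf_bounds grch37.lengths.tsv union.2010_06.deletions.sites.vcf
```

If this prints anything, the VCF and the lengths table almost certainly come from different assemblies.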

You can use the UCSC liftOver tool to convert the coordinates, but the results are sometimes a little sketchy.
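For illustration, a rough sketch of the liftOver route. liftOver works on BED intervals with UCSC-style `chr` names, so the VCF start positions have to be converted first. The function names are hypothetical, the chain file `hg18ToHg19.over.chain.gz` is assumed to have been downloaded from UCSC into the working directory, and only the start position is lifted here (a real CNV comparison would need both breakpoints).

```shell
# Sketch: turn VCF start positions into 1-bp, 0-based half-open BED intervals
# and add the "chr" prefix that UCSC chain files expect.
# $1: VCF file
vcf_pos_to_bed() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                       # skip VCF header lines
        { print "chr" $1, $2 - 1, $2 }      # BED start = POS - 1, BED end = POS
    ' "$1"
}

# Sketch: lift hg18 BED intervals to hg19 (requires the UCSC liftOver binary).
# $1: input BED, $2: lifted output BED, $3: file for intervals that failed to map
lift_hg18_to_hg19() {
    liftOver "$1" hg18ToHg19.over.chain.gz "$2" "$3"
}

# Example (file names are assumptions):
#   vcf_pos_to_bed union.2010_06.deletions.sites.vcf > sites.hg18.bed
#   lift_hg18_to_hg19 sites.hg18.bed sites.hg19.bed sites.unmapped.bed
```

Anything that ends up in the unmapped file has no clean equivalent in the newer build and should be inspected rather than silently dropped.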

Please do be careful when lifting over any variation: the underlying sequence and structure changes between two assemblies can mean that straight coordinate mapping is inaccurate.

That seems to be the problem. Thank you very much for mentioning the liftover tool! The UCSC website is a treasure. ... YH

12.7 years ago
Laura ★ 1.8k

The assembly a VCF references should always be in the header of the VCF file.

In this case:

reference=1000GenomesPilot-NCBI36

In the future, VCF files will also start to contain sequence tags and MD5s in the same way as BAM files do, so you can be certain you are comparing like with like.
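As a small sketch of how to pull that header value out before comparing coordinates: VCF meta-information lines start with `##`, so the reference line can be read without scanning the whole file. `vcf_reference` is a hypothetical helper name.

```shell
# Sketch: print the value of the "##reference=" meta line of a VCF, if present.
# $1: VCF file
vcf_reference() {
    awk '
        /^##reference=/ { sub(/^##reference=/, ""); print; exit }
        !/^#/           { exit }            # header is over, stop reading
    ' "$1"
}

# Example (file name taken from the question above):
#   vcf_reference union.2010_06.deletions.sites.vcf
```

An empty result means the file does not declare its assembly, in which case a bounds check against known chromosome lengths is the fallback.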
