Where can I get an exome VCF file from the 1000 Genomes Project?
8.1 years ago

Hi, I'm trying to use 1000 Genomes data as control data for my analysis. I'm interested in exome data; where can I get exome VCF data from the 1000 Genomes Project? I could only find whole-genome data.

Does the 1000 Genomes Project provide exome VCF files? Or can I simply restrict the whole-genome VCF to the target regions to get exome data?

8.1 years ago

Please look here: Frequency of Exome data from 1000 Genomes Project


Thanks. What I need is individual-level exome data. Using that file, I could extract the exome regions from the whole-genome data.
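For the extraction step itself, here is a minimal Python sketch. It assumes an uncompressed VCF, a BED file of exome target regions (for example, the capture kit's target file), and matching chromosome names ("1" vs "chr1") in both files; the function names are hypothetical. It is not the project's own procedure, and in practice tools such as bcftools or tabix can restrict an indexed VCF to regions directly. The main pitfall it shows is the coordinate conversion: BED is 0-based and half-open, while VCF POS is 1-based.

```python
# Minimal sketch (hypothetical helper names): keep only VCF records whose POS
# falls inside BED target intervals. Assumes uncompressed text input and
# matching chromosome names in the VCF and BED files.
import bisect
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Intervals = Dict[str, List[Tuple[int, int]]]


def load_targets(bed_lines: Iterable[str]) -> Intervals:
    """Parse BED lines (0-based, half-open [start, end)) into sorted, merged intervals."""
    raw: Intervals = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    merged: Intervals = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out: List[Tuple[int, int]] = []
        for start, end in ivs:
            if out and start <= out[-1][1]:
                # overlapping or touching interval: extend the previous one
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_targets(targets: Intervals, chrom: str, pos: int) -> bool:
    """Return True if the 1-based VCF POS lies inside a target interval."""
    ivs = targets.get(chrom, [])
    pos0 = pos - 1  # VCF POS is 1-based; BED coordinates are 0-based
    # index of the last interval whose start is <= pos0
    i = bisect.bisect_right(ivs, (pos0, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos0 < ivs[i][1]


def filter_vcf(vcf_lines: Iterable[str], targets: Intervals) -> Iterator[str]:
    """Yield header lines unchanged and only the records inside the targets.

    Only POS is checked, so e.g. a deletion starting just before a target is dropped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        if not line.strip():
            continue
        chrom, pos, _ = line.split("\t", 2)
        if in_targets(targets, chrom, int(pos)):
            yield line
```

You would call `load_targets` on the open BED file and then write out `filter_vcf(vcf_file, targets)`. Note that this only selects records; the genotype calls themselves still come from the whole-genome calling pipeline.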

But I have one question. As I understand it, whole-exome and whole-genome data go through different variant-calling processes. Can simply extracting the target exon regions from genome data be considered exome data?


Hi Kelly,

I came across this question because I wanted exome VCF files. In the end I couldn't find any, so I downloaded the FASTQ data and called variants myself. I also subsetted the target exon regions from the WGS data. As you surmised, this is quite different from calling variants from exome data. I wrote this work up on my blog https://davetang.org/muse/2017/02/14/a-single-exome/ if you are still interested.

Cheers,

Dave


I'd guess not. I prefer to consider exome data to be data obtained by Whole Exome Sequencing (WES), where library preparation includes enrichment for exon targets. Target exon regions extracted from WGS data may not be adequately covered, so if you plan to use them for variant calling, you can expect some false-positive and false-negative results.

If you are only interested in variants in exons and not in non-coding regions, it's better to use WES data.

1) WES provides high coverage of the target exon regions.
2) There will always be regions that WGS does not cover deeply enough for reliable variant calling. On the other hand, WGS is valuable for identifying variants in regions that exome enrichment technologies do not cover: regions where enrichment fails, non-coding regions, and regions not included in current exome designs.
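To check the coverage point on your own data, a minimal Python sketch like the one below reports what share of target bases reach a chosen read depth. It assumes per-base depth in the three-column layout that `samtools depth` prints (chromosome, 1-based position, depth) and non-overlapping targets given as (chromosome, start, end) in BED coordinates; the function names and the 20x default are hypothetical choices for illustration, not a recommended cutoff.

```python
# Minimal sketch (hypothetical helper names): fraction of target bases that
# reach a minimum read depth. Assumes `samtools depth`-style lines
# (chrom, 1-based pos, depth) and non-overlapping BED-style targets
# (0-based, half-open). The 20x default is an illustrative assumption.
from typing import Dict, Iterable, List, Tuple

Depths = Dict[Tuple[str, int], int]


def depth_table(depth_lines: Iterable[str]) -> Depths:
    """Map (chrom, 1-based position) -> read depth (first sample column only)."""
    table: Depths = {}
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split()[:3]
        table[(chrom, int(pos))] = int(depth)
    return table


def fraction_covered(
    targets: List[Tuple[str, int, int]], depths: Depths, min_depth: int = 20
) -> float:
    """Share of target bases with depth >= min_depth.

    Positions missing from the table count as depth 0, because
    `samtools depth` omits zero-coverage positions by default.
    """
    total = covered = 0
    for chrom, start, end in targets:
        # BED [start, end) corresponds to 1-based positions start+1 .. end
        for pos in range(start + 1, end + 1):
            total += 1
            if depths.get((chrom, pos), 0) >= min_depth:
                covered += 1
    return covered / total if total else 0.0
```

Running it over the same targets with depth from an exome BAM and from a WGS BAM gives a direct comparison of how well each covers the exons. Storing every position in a dictionary is fine for a sketch but will use a lot of memory on a whole exome.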

You can find some info about WES and WGS experiments, as well as different enrichment platforms in this paper:

Clark MJ, et al. Performance comparison of exome DNA sequencing technologies. Nature Biotechnology. 2011;29(10):908-914.
