Do People Import VCF Files Into Databases? (2016 version)
7.6 years ago

This is a duplicate of:

  1. Do People Import Vcf Files Into Databases? from 5.4 years ago and
  2. Which Type Of Database Systems Are More Appropriate For Storing Information Extracted From Vcf Files from 3.5 years ago.

Nevertheless, I found the answers on the second post enlightening, and 3.5 years later it seems worthwhile to have an update from the community (and @lh3) on how you deal with these issues.

Notably, @Aaronquinlan's gemini, from the second-most-upvoted answer to question #2, is still under active development. Are people using it? What else are people doing?

vcf database ngs genomics • 1.6k views

If I need to query the contents of a VCF, especially a large one like ExAC, I normally use HDF5.

7.6 years ago

This is called "variant warehousing" and there are several open source and commercial efforts in various stages:

Golden Helix VSWarehouse
Paradigm4 SciDB
WuXi NextCODE
CMH Variant Warehouse
ViaGenetics Genesis
Curoverse Lightning
Intel GenomicsDB
Cloudera OMICS
7.6 years ago
rbagnall ★ 1.8k

PlinkSeq can read VCF files into a database format, then generate summary stats and extract gene-, individual-, or cohort-level data. There is a helpful online tutorial.
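
To make the general "load a VCF into a database, then query it" idea concrete, here is a rough sketch using only Python's standard-library sqlite3 module. This is not how PlinkSeq does it; the table layout, the function names (load_vcf, variants_per_gene, alt_alleles_per_sample), the GENE INFO key, and the tiny inline VCF are all made up for illustration. It also assumes biallelic sites and that GT is the first FORMAT field.

```python
import sqlite3

# Tiny made-up VCF for illustration only (not real data).
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2
1\t100\trs1\tA\tG\t50\tPASS\tGENE=ABC\tGT\t0/1\t1/1
1\t200\t.\tC\tT\t30\tPASS\tGENE=ABC\tGT\t0/0\t0/1
2\t300\t.\tG\tA\t99\tPASS\tGENE=XYZ\tGT\t0/1\t0/0
"""


def load_vcf(conn, text):
    """Load VCF text into two hypothetical tables: variant and genotype."""
    conn.execute(
        "CREATE TABLE variant (id INTEGER PRIMARY KEY, chrom TEXT, "
        "pos INTEGER, ref TEXT, alt TEXT, gene TEXT)"
    )
    conn.execute(
        "CREATE TABLE genotype (variant_id INTEGER, sample TEXT, "
        "alt_count INTEGER)"
    )
    samples = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt, _qual, _filt, info = fields[:8]
        # Parse key=value INFO entries; flag-style entries are ignored here.
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        cur = conn.execute(
            "INSERT INTO variant (chrom, pos, ref, alt, gene) "
            "VALUES (?, ?, ?, ?, ?)",
            (chrom, int(pos), ref, alt, info_map.get("GENE")),
        )
        variant_id = cur.lastrowid
        for sample, call in zip(samples, fields[9:]):
            # Assumes GT is the first FORMAT field; treat phased like unphased.
            alleles = call.split(":")[0].replace("|", "/").split("/")
            alt_count = sum(1 for a in alleles if a not in ("0", "."))
            conn.execute(
                "INSERT INTO genotype VALUES (?, ?, ?)",
                (variant_id, sample, alt_count),
            )
    conn.commit()


def variants_per_gene(conn):
    """Gene-level summary: number of variant records per gene."""
    return conn.execute(
        "SELECT gene, COUNT(*) FROM variant GROUP BY gene ORDER BY gene"
    ).fetchall()


def alt_alleles_per_sample(conn):
    """Individual-level summary: total non-reference alleles per sample."""
    return conn.execute(
        "SELECT sample, SUM(alt_count) FROM genotype "
        "GROUP BY sample ORDER BY sample"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_vcf(conn, VCF_TEXT)
    print(variants_per_gene(conn))
    print(alt_alleles_per_sample(conn))
```

For anything cohort-sized you would probably want indexes on (chrom, pos) and on genotype.variant_id, and a proper VCF parser instead of splitting on tabs, but the shape of the approach stays the same.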
