Forum: Create a file format to store variants
5.8 years ago
sacha ★ 2.4k

Hi,

The BED and VCF formats are not well designed for storing HGVS notation and variant metadata.

I would like a dedicated file format for storing a list of variants in HGVS notation, together with different kinds of data, to make them easier to share, import, export ...

For instance, here is how I would store one variant in a dedicated JSON format:

{
  "variants": [
    {
      "chr": "chr3",
      "pos": "23424",
      "cdna": "c.324A>G",
      "protein": "p.(H234V)",
      "class": 4,
      "transcript": "NM_000249.3",
      "comments": "This is a test",
      "dbSNP": "rs324234",
      "samples": [
        {
          "id": "sampleID",
          "family_id": "famID",
          "comment": "this is a man",
          "phenotypes": "HPO:23424, HPO:234234",
          "date": "234234234"
        }
      ]
    }
  ]
}
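To make the proposal concrete, here is a minimal Python sketch of how a tool might load such a file and check each record. The set of required keys and the function name `load_variants` are assumptions made for illustration; the post does not define which fields are mandatory.

```python
import json

# Assumed minimum set of keys; the proposal itself does not say which are required.
REQUIRED_KEYS = {"chr", "pos", "cdna", "transcript"}


def load_variants(text):
    """Parse the JSON document and return its list of variant records."""
    doc = json.loads(text)
    variants = doc["variants"]
    for index, variant in enumerate(variants):
        missing = REQUIRED_KEYS - variant.keys()
        if missing:
            raise ValueError(f"variant {index} is missing keys: {sorted(missing)}")
    return variants


example = """
{"variants": [{"chr": "chr3", "pos": "23424", "cdna": "c.324A>G",
               "transcript": "NM_000249.3",
               "samples": [{"id": "sampleID", "family_id": "famID"}]}]}
"""
variants = load_variants(example)
print(variants[0]["transcript"], variants[0]["cdna"])
```

Note that the top-level key has to be quoted ("variants") for a standard JSON parser to accept the file.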

Then I can imagine a tool to manage this format, with different features:

  • Import / export (BED, VCF, SQL, NoSQL, tabular, CSV ...)
  • Create a static web page from a variant list (see this)
  • Get statistics from the command line
  • And many more ...
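As a rough illustration of the export and statistics features listed above, the following Python sketch flattens variant records to CSV and counts variants per class. The function names, the chosen columns, and the second example record are hypothetical; nested data such as "samples" is simply dropped in this flat export.

```python
import csv
import io
from collections import Counter


def to_csv(variants, columns=("chr", "pos", "transcript", "cdna", "protein", "class")):
    """Flatten variant records into CSV text, one row per variant."""
    buf = io.StringIO()
    # extrasaction="ignore" skips keys that are not in `columns` (e.g. "samples");
    # keys missing from a record are written as empty cells.
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for variant in variants:
        writer.writerow(variant)
    return buf.getvalue()


def class_counts(variants):
    """Count how many variants fall in each 'class' value."""
    return Counter(variant.get("class") for variant in variants)


variants = [
    {"chr": "chr3", "pos": "23424", "cdna": "c.324A>G", "protein": "p.(H234V)",
     "class": 4, "transcript": "NM_000249.3", "samples": [{"id": "sampleID"}]},
    # Made-up second record, only here to show a missing "protein" field.
    {"chr": "chr3", "pos": "23500", "cdna": "c.400C>T",
     "class": 4, "transcript": "NM_000249.3"},
]
print(to_csv(variants))
print(class_counts(variants))
```

A flat export like this loses the per-sample hierarchy, which is one of the trade-offs to settle before choosing between a JSON-style and a tabular representation.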

The format could be JSON-based or maybe HDF5-based. The BIOM file format inspired me.

Do you already know of something similar? Is it a good idea? If so, help is welcome. You can also suggest other required fields.

hgvs json variant • 2.4k views
5.8 years ago

[Image: xkcd comic]

The VCF spec is capable of capturing extra metadata around variants, including gene information; see tools such as VEP or ANNOVAR. Granted, it's not the most glamorous of implementations, but it works. To introduce a new format, you'd need a tool that converts from VCF to it, and you'd have to show significant improvements over the base VCF format for people to even consider moving away. VCF is almost like the SAM spec: it's not ideal, but it's so ingrained in common practise that moving away from it, or even tweaking the spec, is a huge job with many downstream compatibility hurdles.

:D I know this xkcd too! That's why I am asking. I didn't find any standard that stores variants in a hierarchical structure. VCF and BED files don't store HGVS notation, nor patient IDs or family IDs. So it is actually going from 0 standards to 1 standard!

The VCF INFO field can have arbitrary keys, so it can store any additional data if you want it to.

For example, SnpEff adds a lot of information, all in the VCF format: http://snpeff.sourceforge.net/SnpEff_manual.html
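As a small illustration of the point about arbitrary INFO keys, here is a Python sketch that splits the INFO column of one VCF data line into a dictionary. The keys `HGVSC`, `CLASS` and `REVIEWED` are hypothetical custom keys invented for this example (they are not SnpEff's or any other tool's output), and the data line itself is made up from the values in the question.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; flag keys (no '=') map to True."""
    if info == ".":  # "." means the INFO column is empty
        return {}
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Tab-separated columns: CHROM POS ID REF ALT QUAL FILTER INFO
line = ("chr3\t23424\trs324234\tA\tG\t.\tPASS\t"
        "HGVSC=NM_000249.3:c.324A>G;CLASS=4;REVIEWED")
columns = line.split("\t")
info = parse_info(columns[7])
print(info["HGVSC"], info["CLASS"], info["REVIEWED"])
```

In a real file, each custom key would also be declared in the header with a `##INFO=<ID=...,Number=...,Type=...,Description="...">` line so that other tools can interpret it.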

Well, you can certainly have a multi-sample annotated VCF file; however, you're correct that sample-wise metadata is typically captured by pedigree files.

My favourite XKCD comic.

5.8 years ago
steve ★ 3.5k

Another format I recently learned about is from the HL7 FHIR standard:

https://www.hl7.org/fhir/genomics.html

https://www.hl7.org/fhir/sequence-example-fda.json.html

5.8 years ago
d-cameron ★ 2.9k

This question is actually very well timed. The GA4GH file formats working group has an upcoming teleconference on the future of VCF and a potential new file format. I strongly recommend getting involved if there are limitations in the existing file formats that make them unsuitable for your use case.

Who are those convening the meeting? It should ideally be people who have frequently used the format, with representation from across the globe.

Where and how can one participate?

5.4 years ago
bdolin ▴ 90

Here is a link to the latest FHIR spec: http://build.fhir.org/ig/HL7/genomics-reporting/

And I also have a fairly simple mapping from VCF to this FHIR format if anyone is interested.
