Forum: Create a File format to store Variant
2

Hi,

Neither BED nor VCF is well suited to storing HGVS notation and variant metadata.
I would like a dedicated file format for storing a list of variants in HGVS notation together with different kinds of associated data, to make them easier to share, import, export, and so on.

For instance, here is how I would store one variant in a JSON-based format:

{
  "variants": [
    {
      "chr": "chr3",
      "pos": "23424",
      "cdna": "c.324A>G",
      "protein": "p.(H234V)",
      "class": 4,
      "transcript": "NM_000249.3",
      "comments": "This is a test",
      "dbSNP": "rs324234",
      "samples": [
        {
          "id": "sampleID",
          "family_id": "famID",
          "comment": "this is a man",
          "phenotypes": "HPO:23424, HPO:234234",
          "date": "234234234"
        }
      ]
    }
  ]
}
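
A minimal sketch of how a file in this layout could be loaded and checked in Python, assuming the top-level `variants` key is quoted so that the document is valid JSON. The `REQUIRED_FIELDS` set and the `load_variants` helper are hypothetical; nothing above fixes which fields are mandatory.

```python
import json

# Hypothetical choice of mandatory fields; the proposal does not define them.
REQUIRED_FIELDS = {"chr", "pos", "cdna", "transcript"}

# Trimmed copy of the example record, written as strict JSON.
EXAMPLE = """
{
  "variants": [
    {
      "chr": "chr3",
      "pos": "23424",
      "cdna": "c.324A>G",
      "protein": "p.(H234V)",
      "class": 4,
      "transcript": "NM_000249.3",
      "samples": [{"id": "sampleID", "family_id": "famID"}]
    }
  ]
}
"""


def load_variants(text):
    """Parse the JSON text and return the list of variant records.

    Raises ValueError if a record lacks one of REQUIRED_FIELDS.
    """
    doc = json.loads(text)
    variants = doc["variants"]
    for index, variant in enumerate(variants):
        missing = REQUIRED_FIELDS - variant.keys()
        if missing:
            raise ValueError(
                f"variant {index} is missing fields: {sorted(missing)}"
            )
    return variants


variants = load_variants(EXAMPLE)
print(variants[0]["transcript"], variants[0]["cdna"])
```

Checking required fields at load time is only one possible design; a JSON Schema could serve the same purpose if the format were formalised.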

Then I can imagine a tool to manage this format, with various features:

  • Import / export (BED, VCF, SQL, NoSQL, tabular, CSV, ...)
  • Create a static web page from a variant list ( see this )
  • Get statistics from the command line
  • And many more ...
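
Two of the features listed above, tabular export and simple statistics, could be sketched as follows. This is only an illustration: the function names, the default column selection, and the second demo record are hypothetical, and only the Python standard library is used.

```python
import csv
import io
from collections import Counter

# First record is taken from the example in this post; the second is made up.
DEMO = [
    {"chr": "chr3", "pos": "23424", "cdna": "c.324A>G",
     "protein": "p.(H234V)", "class": 4, "transcript": "NM_000249.3"},
    {"chr": "chr3", "pos": "23500", "cdna": "c.400C>T",
     "class": 4, "transcript": "NM_000249.3"},
]

DEFAULT_COLUMNS = ("chr", "pos", "transcript", "cdna", "protein", "class")


def variants_to_csv(variants, columns=DEFAULT_COLUMNS):
    """Flatten variant records into CSV text; missing fields become empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for variant in variants:
        writer.writerow({col: variant.get(col, "") for col in columns})
    return buffer.getvalue()


def class_counts(variants):
    """Count how many variants fall into each classification value."""
    return Counter(variant.get("class") for variant in variants)


print(variants_to_csv(DEMO), end="")
print(dict(class_counts(DEMO)))
```

Nested data such as `samples` does not fit a flat table directly, so a real exporter would need a rule for it (one row per sample, for instance).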

The format could be JSON-based, or perhaps HDF5-based. The BIOM file format inspired me.

Do you already know of something similar? Is it a good idea? If so, help is welcome. You can also suggest other required fields.

20 months ago sacha ♦ 1.7k • updated 15 months ago bdolin • 90
9

[Image: xkcd comic]

The VCF spec is capable of capturing extra metadata around variants, including gene information; see tools such as VEP or ANNOVAR. Granted, it's not the most glamorous of implementations, but it works. To replace it, you'd need a tool that converts from VCF to your new format, and you'd have to show significant improvements over the base VCF format for people to even consider moving away. VCF is almost like the SAM spec: it's not ideal, but it's so ingrained in common practise that moving away from it, or even tweaking the spec, is a huge job with many downstream compatibility hurdles.

20 months ago andrew.j.skelton73 5.7k
0

:D I know this xkcd too! That's why I am asking. I didn't find any standard that stores variants in a hierarchical structure. VCF and BED files don't store HGVS notation, and the same goes for patient IDs or family IDs. So it is actually going from 0 standards to 1 standard!

20 months ago
sacha
♦ 1.7k
1

The VCF INFO field can have arbitrary keys, so it can store any additional data if you want it to.

For example, SnpEff adds a lot of information, all in the VCF format: http://snpeff.sourceforge.net/SnpEff_manual.html
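
As a rough sketch of that idea, the snippet below declares three custom INFO keys in a VCF header and reads them back from one record. The key names `HGVS_C`, `HGVS_P` and `CLASS` are invented for illustration (they are not SnpEff's or any other tool's), and the REF/ALT alleles are placeholders because the original example does not give them.

```python
# Header lines declaring hypothetical INFO keys for HGVS notation and class.
HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=HGVS_C,Number=1,Type=String,Description="HGVS cDNA change">',
    '##INFO=<ID=HGVS_P,Number=1,Type=String,Description="HGVS protein change">',
    '##INFO=<ID=CLASS,Number=1,Type=Integer,Description="Variant class">',
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
]

# One data line; REF and ALT (A, G) are placeholder alleles.
RECORD = "\t".join([
    "chr3", "23424", "rs324234", "A", "G", ".", ".",
    "HGVS_C=c.324A>G;HGVS_P=p.(H234V);CLASS=4",
])


def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys without '=' map to True."""
    result = {}
    if info == ".":
        return result
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value if sep else True
    return result


fields = RECORD.split("\t")
info = parse_info(fields[7])
print(info["HGVS_C"], info["HGVS_P"], info["CLASS"])
```

Values are returned as strings here; a fuller parser would use the `Type` declared in the header to convert them.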

20 months ago
igor
7.7k
0

Well, you can certainly have a multi-sample annotated VCF file; however, you're correct that sample-wise metadata is typically captured by pedigree files.
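
For reference, a pedigree (PED) file in the common PLINK-style layout starts each line with six whitespace-separated columns: family ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female) and phenotype. A minimal sketch of reading one, with a made-up sample line that reuses the IDs from the question; the `PedRecord` type and `parse_ped` helper are hypothetical names:

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str
    phenotype: str


# Standard PED sex codes; anything else is treated as unknown.
SEX_CODES = {"1": "male", "2": "female"}

# Made-up example line: a male founder (parents coded 0) in family famID.
PED_TEXT = "famID sampleID 0 0 1 2\n"


def parse_ped(text):
    """Return a dict mapping individual ID to PedRecord; skips blanks and '#' lines."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fam, ind, father, mother, sex, pheno = line.split()[:6]
        records[ind] = PedRecord(
            fam, ind, father, mother, SEX_CODES.get(sex, "unknown"), pheno
        )
    return records


pedigree = parse_ped(PED_TEXT)
print(pedigree["sampleID"].family_id, pedigree["sampleID"].sex)
```

Sample IDs in the VCF header can then be looked up in this dict to attach family and sex information to each genotype column.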

20 months ago
andrew.j.skelton73
5.7k
0

My favourite XKCD comic.

20 months ago
Joe
12k
2
20 months ago Pierre Lindenbaum 120k
2

Another format I recently learned about is from the HL7 FHIR standard:

https://www.hl7.org/fhir/genomics.html

https://www.hl7.org/fhir/sequence-example-fda.json.html

20 months ago steve ♦ 2.0k
2

This question is actually very well timed. The GA4GH file formats working group has an upcoming teleconference on the future of VCF and a potential new file format. I strongly recommend getting involved if limitations in the existing file formats make them unsuitable for your use case.

20 months ago d-cameron ♦ 2.0k
0

Who are those convening the meeting? It should ideally be people who have frequently used the format, with representation from across the globe.

20 months ago
Kevin Blighe
43k
0

Where and how can I participate?

20 months ago
sacha
♦ 1.7k
0

Here is a link to the latest FHIR spec: http://build.fhir.org/ig/HL7/genomics-reporting/

And I also have a fairly simple mapping from VCF to this FHIR format if anyone is interested.

15 months ago bdolin • 90
