VCF to MAF (Mutation Annotation Format) Conversion?
12.3 years ago
Kasthuri ▴ 300

Is there any standard tool out there that can convert a VCF file to Mutation Annotation Format (MAF)?

Thanks -Kasthuri

vcf maf • 19k views

See www.biostars.org/p/74822/ and seqanswers.com/forums/showthread.php?t=16740


I'm afraid your pointers are not useful here:

  • The biostars post was opened 2 years after this one and is marked as a possible duplicate of this question.
  • The SeqAnswers post is from the same username as this post, opened around the same time.

I have snpEff-annotated VCF files that I am converting to MAF format. When I run vcf2maf, I get this error:

ERROR: Unrecognized effect "DOWNSTREAM". Please update your hashes!

Can you please point out the reason for this error?


Please open a new question, and use tags and keywords like vcf, maf, vcf2maf... so the relevant folks can find it.

10.4 years ago

I recently posted a VCF->MAF conversion script at github.com/ckandoth/vcf2maf. It is well documented, so you can see what information is lost in translation.

Briefly: each VCF variant must be annotated to only one of all the possible gene transcripts/isoforms that it might affect. This selection of a single affected transcript/isoform per variant is often subjective. For now, the script tries to follow best practices: it chooses the "worst" effect on the "best" transcript. If there are multiple such candidates, it annotates the variant's effect on the canonical "best" transcript.
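
A minimal Python sketch of that selection rule, not the actual vcf2maf.pl code. The priority values, the dictionary field names, and the `pick_annotation` function are hypothetical, and I am assuming here that the transcript biotype is ranked first, effect severity second, and canonical status is the tie-breaker:

```python
# Hypothetical priority tables (lower number = higher priority).
# The values are illustrative only, not the hashes used by vcf2maf.
EFFECT_PRIORITY = {
    "STOP_GAINED": 1,
    "NON_SYNONYMOUS_CODING": 2,
    "SYNONYMOUS_CODING": 3,
    "DOWNSTREAM": 4,
}
BIOTYPE_PRIORITY = {
    "protein_coding": 1,
    "lincRNA": 2,
    "pseudogene": 3,
}


def pick_annotation(annotations):
    """Pick one annotation per variant: best biotype, worst effect, canonical."""

    def sort_key(ann):
        # Fail loudly on values missing from the tables instead of guessing.
        for table, field in ((EFFECT_PRIORITY, "effect"),
                             (BIOTYPE_PRIORITY, "biotype")):
            if ann[field] not in table:
                raise ValueError(f'Unrecognized {field} "{ann[field]}"')
        return (
            BIOTYPE_PRIORITY[ann["biotype"]],
            EFFECT_PRIORITY[ann["effect"]],
            0 if ann.get("canonical") else 1,  # canonical wins ties
        )

    return min(annotations, key=sort_key)
```

Each annotation is assumed to be a dict with "effect", "biotype", and optionally "canonical" keys, one per transcript the variant overlaps. An effect or biotype missing from the tables raises an error rather than being silently ranked.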


That's a great tool, thanks! I added a command line parameter for the name of snpeff vcf, feel free to use it if interested. (https://github.com/dakl/vcf2maf)


@Cyriac Actually, I also removed the snpeff step completely, requiring that the user runs it separately upstream of vcf2maf. I think that makes more sense, so that vcf2maf.pl is a pure converter of a pre-annotated file. Whaddayathink?


Yea that makes sense - to give the user the option to run snpEff themselves. Actually, the first version of this script was a "converter of a pre-annotated VCF" :) Then I wanted to package it all-in-one.

Update: I released vcf2maf v1.1 that allows you to use a VCF that is already annotated with snpEff or Ensembl's VEP.


FYI, I recently started getting this error: ERROR: Unrecognized biotype "non_coding". Please update your hashes! at vcf2maf.pl line 287, <GEN0> line 171. I added it with priority 3, which already held the other non-coding RNAs. Just so you know.

Info on the biotype from here: http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html


Thanks. Which transcript database are you using? I don't see non_coding as a valid transcript biotype in the Ensembl 74 GTF, but I do see it listed in the GENCODE specs. I have now updated the script to handle all the GENCODE biotypes.


I've been using 73, so it has likely changed between versions. Great that the script handles them all. What's the rationale for prioritizing the biotypes?


When a variant locus maps to multiple genes/transcripts, the priority reflects which biotype is best defined and/or most likely to be disease-associated.


This is a great tool, but the current version still requires snpEff even though I have already annotated my VCF with snpEff. Could you please provide, ASAP, a version that doesn't require snpEff? Thanks!


Please see the fork of the code mentioned above by @Danielk. Alternatively, my script skips the snpEff annotation step for an input VCF named file.vcf if it finds an annotated VCF named file.anno.vcf in the same folder.

Update: I released vcf2maf v1.1 that allows you to use a VCF that is already annotated with snpEff or Ensembl's VEP.
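
The file.vcf / file.anno.vcf naming convention described above can be sketched in Python as follows. This is only an illustration of the lookup, not the script's actual code, and the `find_preannotated` function name is hypothetical:

```python
from pathlib import Path


def find_preannotated(vcf_path):
    """Return the sibling '<name>.anno.vcf' path if it exists, else None."""
    vcf = Path(vcf_path)
    # For 'file.vcf', vcf.stem is 'file', so the candidate is 'file.anno.vcf'
    # in the same folder as the input.
    candidate = vcf.with_name(vcf.stem + ".anno.vcf")
    return candidate if candidate.is_file() else None
```

A caller would run the annotation step only when this returns None.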


To convert to MAF, you'll always have to annotate the variants with snpEff, whether it's done inside the script as in Cyriac's version, or upstream as in my version. There's no way around that.

12.3 years ago

MAF contains annotation about the variant effects on transcripts/proteins, while VCF typically does not. You might find that using tools like annovar, snpeff, and the Ensembl Variant Effect Predictor gets you pretty close. I'm not aware of a script that applies one or more of these tools to a VCF file to produce MAF directly.


I should comment here that MAF is not really considered a "standard" format, so you may want to make sure that the output of one of the software packages mentioned above would not suffice for your final purpose.


FWIW, MAF is a "standard" format within the TCGA project. Here's documentation: https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+(MAF)+Specification


Thanks! I tried annovar and snpeff and although they are close, they don't really help. Looks like I need to write my own script!

-K.


But you'll probably still need to run annovar or snpeff or something like that (unless you are into reinventing wheels). I would envision the output of annovar or snpeff being what gets fed to your script.


Thanks, Sean. The problem started when I wanted to use MuSiC (gmt.genome.wustl.edu/genome-music/0.2/index.html). It requires the mutations in MAF format, and I have a bunch of VCFs. You are right that I first need to extract information from the VCFs through annovar/snpeff.


The MAF format specifically asks for "Tumor_Seq_Allele2" in column 13, and I am wondering how I can find that information in the VCF file. Thanks.


If there are two variant alleles, you will find them in the ALT column of the VCF file as a comma-separated value. I think that in most cases there will not be a second variant allele present.
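
As a rough Python sketch of how the ALT column could be split and mapped onto the two MAF allele columns: this assumes a diploid, fully called GT field (such as 0/1 or 1/2), and the `tumor_alleles` function is hypothetical rather than part of any tool mentioned here:

```python
def tumor_alleles(ref, alt, genotype="0/1"):
    """Map a VCF genotype onto (Tumor_Seq_Allele1, Tumor_Seq_Allele2).

    Assumes a diploid, fully called genotype; missing calls ('.') are
    not handled in this sketch.
    """
    # Allele index 0 is REF; 1, 2, ... are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    # Treat phased ('|') and unphased ('/') genotypes the same way.
    indices = [int(i) for i in genotype.replace("|", "/").split("/")]
    return tuple(alleles[i] for i in indices)
```

For example, REF=A, ALT=G,T with GT=1/2 gives the pair (G, T), while the common single-ALT heterozygous case REF=A, ALT=G with GT=0/1 gives (A, G).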
