Annotating variants with ANNOVAR, Oncotator and SnpEff
9.5 years ago
ivivek_ngs ★ 5.2k

Dear All,

I have mostly been annotating my exome variants with ANNOVAR. Lately, after tuning my VarScan2 variant calling for stringency and more confident calls, I found that ANNOVAR assigns far fewer of the variant positions to gene names. I was not sure I was heading the right way, so I tried Oncotator and SnpEff and found that they annotate more genes for the variants than ANNOVAR does. Has anyone faced a similar situation?

I would like suggestions on which of these three annotation tools to use. Is ANNOVAR incomplete, or does it have shortcomings I am not aware of? The increase was not several-fold, since I was only concerned with exonic-level genes, but it still reached about 30% in some of my tumor/normal samples. Is this a likely scenario, and if so, which annotation tool is considered the gold standard? I suspect there is no single gold standard, as all three are widely used, but I would like to know what other users think and which annotation they follow. Oncotator and SnpEff annotations are obviously much more detailed and can be used for other purposes, but even for annotating only exonic-level variants I find Oncotator and SnpEff performing better than ANNOVAR on my data. Any suggestions?

Thanks

Hello,

I suggest you read this paper: http://genomemedicine.com/content/6/3/26

If I remember correctly, ANNOVAR reports only the most damaging annotation.
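A minimal Python sketch of why reporting only the most damaging annotation can yield fewer genes than reporting every transcript. The severity order is an illustrative subset of Sequence Ontology terms in an assumed order, and the genes, transcripts and hits are hypothetical; this is not ANNOVAR's actual precedence logic.

```python
# Illustrative subset of Sequence Ontology consequence terms, ordered from
# most to least severe. This ordering is an assumption for the example only.
SEVERITY_ORDER = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
    "downstream_gene_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}


def all_annotations(hits):
    """Return every (gene, transcript, consequence) hit, one per transcript."""
    return list(hits)


def most_severe_only(hits):
    """Keep a single hit per variant: the one with the most severe consequence."""
    return min(hits, key=lambda hit: RANK[hit[2]])


# Hypothetical hits for one variant overlapping two genes and three transcripts.
hits = [
    ("GENE_A", "TX_A1", "intron_variant"),
    ("GENE_A", "TX_A2", "missense_variant"),
    ("GENE_B", "TX_B1", "downstream_gene_variant"),
]

print(len(all_annotations(hits)))  # 3
print(most_severe_only(hits))      # ('GENE_A', 'TX_A2', 'missense_variant')
```

Reporting all hits keeps both GENE_A and GENE_B, while the single most severe row keeps only GENE_A, which is one plausible contributor to a gap in gene counts between tools.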

Thanks a lot, arno.guille. This paper is great and clears up a lot of my doubts, but reading it raised some other questions, and I would be happy to hear any input you have. Do you know which transcript sets Oncotator and SnpEff use? Their manuals refer to UCSC gene IDs and transcripts, which suggests they mirror either the RefSeq or the Ensembl transcript set, but I could not find out exactly which set is used for the annotations. Also, do you know how I can annotate variants with ANNOVAR using the Ensembl transcript set instead of the default RefSeq set? If you have any idea, please share.

When you ask for input, I guess you want a VCF file? If so, you can download a VCF file for the well-known sample NA12878 from this link:

https://www.dropbox.com/s/6922bt78zmmfshf/NA12878_S1.vcf?dl=0

Oncotator uses UCSC (RefSeq), but SnpEff seems to use both RefSeq and Ensembl.

Yes, you can use the Ensembl annotation with ANNOVAR.

Take a look here: http://www.openbioinformatics.org/annovar/annovar_gene.html#knowngene

But if you prefer the Ensembl annotation, maybe you should try VEP.

Thanks a lot, arno.guille. I will try both VEP with Ensembl and ANNOVAR with the Ensembl transcript set, and then compare my results. Yes, Oncotator uses RefSeq, which I figured out last night, and SnpEff uses both, although that can be constrained with its canonical-transcript option when annotating. Anyway, I will try VEP and compare the results. Thank you everyone for the suggestions.

Cool! Just make sure that ANNOVAR uses the same version of the Ensembl gene set. With VEP you will be annotating your variants against the latest version of the Ensembl genes (and therefore GENCODE)!

Just one piece of information: the ANNOVAR ensgene transcript file I am using (ensgene.txt, which contains the Ensembl transcripts for ANNOVAR and was downloaded from its website) was last updated in Oct 2012, and it is the most recent one available there. VEP will use a more recently updated version, right? Can you say something on this point, Denise CS? I suspect the Ensembl version on the ANNOVAR website differs from the one in VEP.

Yes, if you use VEP, you will have a more up-to-date version of our Ensembl genes. If you are working on the previous human assembly (GRCh37), the latest annotation from Ensembl dates from Sep 2013 (GENCODE v19). For the latest assembly (GRCh38), the most recent annotation in Ensembl dates from Aug 2014 (GENCODE v21). You can run Ensembl VEP for both the old and the latest assemblies.

I will be using Ensembl VEP for the first time. My human assembly is GRCh37, so I will be using GENCODE v19. Thanks a lot, Denise CS. I will get back to you if I face any issues with VEP.

It should be easy! There is plenty of documentation on our website, but give us a shout if you get stuck. It might also be worth signing up to the Ensembl developers' mailing list or sending an email to the Ensembl helpdesk.

Hi Denise CS,

I was trying to use the online VEP (http://grch37.ensembl.org/Homo_sapiens/Tools/VEP). Under the transcript database to use, I see options for both Ensembl transcripts and GENCODE; since I selected GRCh37, the GENCODE set should be v19. Which transcript set should I select for annotating online on that page: Ensembl or GENCODE?

Hello! The GENCODE option you see in VEP is the basic set. This means you will get a representative subset of the transcripts annotated by the GENCODE consortium (of which Ensembl is part). The basic set contains full-length transcripts for every locus (unless a partial transcript is the only representative of that locus), so some annotation will be missing. The Ensembl set is equivalent to the GENCODE comprehensive set: it contains every annotated transcript in the human (and mouse) genome. For the majority of our users, the basic set will suffice. If you compare the basic and comprehensive sets for APOH in the Ensembl browser, the basic set has one transcript, whereas the comprehensive set has that one plus 3 others (all protein coding). It's difficult for me to tell you which one to choose, as this depends on what you are trying to find out, but all I can say is that I'd rather annotate my variants in VEP against Ensembl...
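One way to see the basic versus comprehensive difference locally is to filter a GENCODE GTF, in which transcripts belonging to the basic set carry a `tag "basic"` attribute. The Python sketch below assumes standard GTF attribute syntax; the two records are made-up, trimmed examples with hypothetical IDs, not real APOH annotation.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return a list of (key, value) pairs; 'tag' may appear several times."""
    return ATTR_RE.findall(field)


def transcripts(gtf_lines, basic_only=False):
    """Yield transcript IDs, optionally only those tagged as GENCODE basic."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        tags = {value for key, value in attrs if key == "tag"}
        if basic_only and "basic" not in tags:
            continue
        yield dict(attrs)["transcript_id"]


# Hypothetical, trimmed GTF records for one gene with two transcripts.
example = [
    'chr17\tHAVANA\ttranscript\t100\t900\t.\t-\t.\tgene_id "GENE1"; transcript_id "TX1"; tag "basic"; tag "CCDS";',
    'chr17\tHAVANA\ttranscript\t150\t700\t.\t-\t.\tgene_id "GENE1"; transcript_id "TX2";',
]

print(list(transcripts(example)))                   # ['TX1', 'TX2']
print(list(transcripts(example, basic_only=True)))  # ['TX1']
```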

Thanks a lot for the reply, Denise CS. Makes sense. I will go for the comprehensive set, which is the Ensembl option on the web page, as I am interested in all the transcripts involved and I want to compare those results with the ANNOVAR and Oncotator annotations. I am sure ANNOVAR will give fewer annotations than the others, since its algorithm ultimately outputs only the most damaging ones. Anyway, thanks a lot for the advice.

Denise CS,

Can you tell me the maximum number of variants a file can contain for the online version of VEP?

The size limit for file uploads to the online VEP is 5MB (>500K variants). If you register with Ensembl and are logged in, the limit goes up to 50MB.

Thanks for the reply. My files are well below that limit, but they contain some additional information in other columns. Is that the reason the online VEP is failing?

Which additional information are you talking about? The accepted file formats are described in our help pages. Are you using the main site or one of our mirrors? Perhaps you could send us an email and we will look into it more carefully. Make sure to include the ticket number for the VEP job.

I am providing the following input, which should be the pileup format accepted by the web page:

chr1 976506 A -GCGGGGGC
chr1 1141864 C +G
chr1 1417649 T -C
chr1 1454046 T -AC
chr1 2117710 A -C
chr1 2494360 C -T
chr1 2985791 A +G

This should work, right? I have already deleted my previous two jobs. I removed all the additional columns to get this format and am trying to run VEP with the format set to pileup. Please let me know if this is OK, as the website recognizes it as pileup.

This format does not work. Can anyone suggest how to make it VEP-compatible? I wrote a script that converts the text to VCF4 format, but that does not work either, probably because of the +/- signs. Any heads-up?
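A sketch of such a conversion in Python. It assumes VarScan2-style indel notation, where the reported position and reference base are the anchor base immediately before the indel, `+SEQ` means SEQ is inserted after it and `-SEQ` means SEQ is deleted after it. Under that assumption the anchor base becomes the first base of both REF and ALT, which is how VCF 4.x represents indels. The reference bases are taken from the input as-is and are not checked against the genome.

```python
def pileup_to_vcf(chrom, pos, ref, var):
    """Convert one VarScan-style call to (CHROM, POS, ID, REF, ALT).

    Assumption: pos/ref give the anchor base preceding an indel;
    '+SEQ' inserts SEQ after it, '-SEQ' deletes SEQ after it.
    """
    if var.startswith("+"):           # insertion: REF=anchor, ALT=anchor+SEQ
        return chrom, pos, ".", ref, ref + var[1:]
    if var.startswith("-"):           # deletion: REF=anchor+SEQ, ALT=anchor
        return chrom, pos, ".", ref + var[1:], ref
    return chrom, pos, ".", ref, var  # plain SNV


def to_vcf(lines):
    """Turn whitespace-separated 'chrom pos ref var' lines into minimal VCF text."""
    out = ["##fileformat=VCFv4.1",
           "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, ref, var = line.split()[:4]
        fields = pileup_to_vcf(chrom, int(pos), ref, var)
        # QUAL, FILTER and INFO are left as missing values ('.').
        out.append("\t".join(map(str, fields)) + "\t.\t.\t.")
    return "\n".join(out)


calls = ["chr1 976506 A -GCGGGGGC", "chr1 1141864 C +G", "chr1 1454046 T -AC"]
print(to_vcf(calls))
```

Keeping the anchor base in both alleles avoids the empty REF or ALT that a naive sign-stripping conversion would produce.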

It should work indeed. We are having problems with the online VEP and the jobs are not running. We are looking into fixing this as soon as we can and we are very sorry for the inconvenience caused.

It's working now, @vchris_ngs. Although VEP does accept pileup, this file format is now deprecated and has no standard documentation, so if you can, try to use VCF or the Ensembl default format instead.
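For the Ensembl default format, the VEP documentation describes whitespace-separated columns for chromosome, start, end, allele (REF/ALT) and strand, with an insertion written as start = end + 1. The Python sketch below converts the pileup-style lines above under the assumption that the reported position is the anchor base before each indel (VarScan2 convention); stripping the `chr` prefix is also an assumption, made only to match Ensembl chromosome naming.

```python
def pileup_to_ensembl(chrom, pos, ref, var):
    """Return one Ensembl-default-format line: chrom, start, end, allele, strand.

    Assumption: pos/ref are the anchor base before an indel; '+SEQ' is an
    insertion after it and '-SEQ' a deletion of the following bases.
    """
    # Assumed Ensembl-style chromosome name ("chr1" -> "1").
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    if var.startswith("+"):
        # Insertion between pos and pos + 1 is written with start = end + 1.
        start, end, allele = pos + 1, pos, "-/" + var[1:]
    elif var.startswith("-"):
        # Deletion covers the bases right after the anchor.
        seq = var[1:]
        start, end, allele = pos + 1, pos + len(seq), seq + "/-"
    else:
        # Single-base substitution.
        start, end, allele = pos, pos, ref + "/" + var
    return "\t".join([chrom, str(start), str(end), allele, "+"])


for line in ["chr1 1141864 C +G", "chr1 1454046 T -AC"]:
    chrom, pos, ref, var = line.split()
    print(pileup_to_ensembl(chrom, int(pos), ref, var))
```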

Yes, indeed. I got the results. Thanks a lot, Denise CS.

9.5 years ago
Denise CS ★ 5.2k

With Ensembl VEP you can get all sorts of variant annotation (exonic, missense, regulatory, downstream), all based on Sequence Ontology consequence terms. It can annotate against both the Ensembl gene/transcript set (which is GENCODE v21) and the RefSeq set.
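When VEP writes VCF output, the per-transcript annotations go into a CSQ INFO field whose sub-field order is declared in the `##INFO=<ID=CSQ,...>` header after `Format:`. A minimal Python sketch for tallying the Sequence Ontology terms from that field follows; the header and record are made-up, shortened examples with hypothetical gene and transcript names, and the real field list depends on the VEP version and options used, which is why the order is read from the header rather than hard-coded.

```python
import re
from collections import Counter


def csq_fields(header_line):
    """Extract the CSQ sub-field names from the VEP ##INFO header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    return match.group(1).split("|")


def consequences(info_field, fields):
    """Yield one dict per transcript-level annotation in the CSQ INFO value."""
    for item in info_field.split(";"):
        if item.startswith("CSQ="):
            for entry in item[len("CSQ="):].split(","):
                yield dict(zip(fields, entry.split("|")))


def count_terms(info_field, fields):
    """Count SO terms; one entry may hold several terms joined by '&'."""
    counts = Counter()
    for ann in consequences(info_field, fields):
        counts.update(ann["Consequence"].split("&"))
    return counts


# Hypothetical, shortened header and INFO value.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|Gene|Feature">')
info = ("DP=20;CSQ=T|missense_variant|GENE_X|TX_1,"
        "T|intron_variant&NMD_transcript_variant|GENE_X|TX_2")

fields = csq_fields(header)
print(count_terms(info, fields))
```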
