Number of functions is greater than number of variants in snpEff's output
7.1 years ago
reza ▴ 300

Hi everyone,

I annotated my VCF file (produced by samtools) with snpEff, and the output confused me. In the HTML summary, the "number of SNPs" is 2.5 million while the "number of effects" is 3.8 million. For indels the gap is even larger: the number of indels is 350,256, while the "number of effects" is 9 million. What happened? Is this a normal result? If I want the number of variants and the "number of effects" to be equal, what should I do?

Thanks in advance.

Tags: snp, snpEff, next-gen
Hi,

What do you mean by "number of functions"?

It is part of the snpEff results in HTML format.

Please enlighten us and answer Titus's question: show us an example of such a "number of functions". What does it mean? GO terms?

I am so sorry, it is "number of effects", not "number of functions". The results look like this:

Number of lines (input file): 2,559,765
Number of variants (before filter): 2,560,952
Number of not variants (i.e. reference equals alternative): 0
Number of variants processed (i.e. after filter and non-variants): 2,560,952
Number of known variants (i.e. non-empty ID): 0 (0%)
Number of multi-allelic VCF entries (i.e. more than two alleles): 1,187
Number of effects: 3,891,852
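The summary figures above can be cross-checked with a little arithmetic. The sketch below only uses the numbers posted in this reply. The idea that each multi-allelic entry adds exactly one extra variant (one per additional ALT allele) is an assumption that happens to match the counts, not something the snpEff summary states.

```python
# Figures copied from the snpEff HTML summary posted above.
lines_in_file = 2_559_765
variants = 2_560_952
multi_allelic_entries = 1_187
effects = 3_891_852

# Extra variants relative to input lines (assumed: one per extra ALT allele).
extra_variants = variants - lines_in_file
print(extra_variants)  # 1187

# Average number of effects reported per variant.
effects_per_variant = effects / variants
print(round(effects_per_variant, 2))  # 1.52
```

So on average each variant here received roughly one and a half effects.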

If I understand correctly, the question is why there are more effects than variants, isn't it? The thing is that a single gene can have multiple transcripts, and snpEff reports an effect for every transcript affected by the variant.
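A toy model of this per-transcript counting: if effects are computed once per (variant, transcript) pair, three variants in a gene with three overlapping isoforms can easily yield more effects than variants. The gene, transcript IDs and coordinates below are hypothetical, and the model ignores the intergenic, upstream and downstream effects that snpEff also reports.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    gene: str
    transcript_id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlapping(transcripts, pos):
    """Return every transcript whose span contains the variant position."""
    return [t for t in transcripts if t.start <= pos <= t.end]


# Hypothetical gene with three isoforms that share most of their span.
transcripts = [
    Transcript("GENE1", "TX1", 1_000, 5_000),
    Transcript("GENE1", "TX2", 1_000, 4_000),
    Transcript("GENE1", "TX3", 2_500, 5_000),
]
variant_positions = [1_500, 3_000, 4_500]

# One effect per (variant, transcript) pair, so effects can exceed variants.
effects = [
    (pos, t.transcript_id)
    for pos in variant_positions
    for t in overlapping(transcripts, pos)
]
print(len(variant_positions), len(effects))  # 3 7
```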

Yes, that is my question. Is there any way to make the number of variants and the number of effects equal?

If I remember correctly, there is no option for that kind of output. You could get close to it if you supply a list of transcripts (see http://snpeff.sourceforge.net/SnpEff_manual.html); the only condition is that none of the transcripts overlap at your variant positions. Another way is to use VEP, which is quite similar to snpEff (http://www.ensembl.org/info/docs/tools/vep/index.html) and outputs one annotation per line for each variant and transcript.
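The one-line-per-transcript layout that VEP produces can also be approximated from snpEff's own VCF output by splitting the annotation field. This sketch assumes the `ANN` INFO field written by snpEff 4.x (older versions wrote `EFF` with a different layout) and picks subfields by position (Annotation, Annotation_Impact, Gene_Name, Feature_ID). The example record is made up and its ANN entries are truncated for brevity. It does not reduce the number of effects; it only puts each one on its own row.

```python
def explode_ann(chrom, pos, ref, alt, info):
    """Yield one row per ANN entry: (chrom, pos, ref, alt, gene, transcript, effect, impact)."""
    ann = ""
    for item in info.split(";"):
        if item.startswith("ANN="):
            ann = item[len("ANN="):]
    for entry in ann.split(","):
        if not entry:
            continue
        parts = entry.split("|")
        # Assumed ANN layout: Allele|Annotation|Annotation_Impact|Gene_Name|Gene_ID|Feature_Type|Feature_ID|...
        yield (chrom, pos, ref, alt, parts[3], parts[6], parts[1], parts[2])


# Made-up record with two (truncated) ANN entries for the same variant.
info = (
    "DP=12;ANN="
    "A|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|protein_coding,"
    "A|intron_variant|MODIFIER|GENE1|GENE1|transcript|TX2|protein_coding"
)
for row in explode_ann("1", 12345, "C", "A", info):
    print("\t".join(map(str, row)))
```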

6.9 years ago
reza ▴ 300

I used only the longest transcript per gene to build the snpEff database, but my problem is still not solved (the number of effects is still larger than the number of variants). This has puzzled me greatly; can someone help me solve it?
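Keeping only the longest transcript per gene removes isoform duplication, but a variant can still be annotated against more than one gene, for example as upstream or downstream of neighbouring genes. The sketch below assumes a 5 kb flanking window (snpEff's `-ud` option sets this length; check the default for your version) and uses hypothetical gene coordinates. A position with no gene in range would, as far as I know, receive an intergenic annotation instead.

```python
# Assumed flanking window; snpEff's upstream/downstream length is set with -ud.
WINDOW = 5_000

# Hypothetical genes, one (longest) transcript each, as (start, end).
GENES = {
    "GENE_A": (10_000, 20_000),
    "GENE_B": (22_000, 30_000),
    "GENE_C": (33_000, 40_000),
}


def genes_in_range(pos, genes=GENES, window=WINDOW):
    """Genes whose body or flanking window contains pos.

    The window is applied on both sides, so strand does not matter for
    deciding whether a gene is hit (only for calling it upstream vs downstream).
    """
    return sorted(g for g, (s, e) in genes.items() if s - window <= pos <= e + window)


for pos in (21_000, 31_000, 50_000):
    print(pos, genes_in_range(pos))
# 21000 ['GENE_A', 'GENE_B']
# 31000 ['GENE_B', 'GENE_C']
# 50000 []
```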

5.9 years ago

Hello guys! Was this problem resolved? I am getting the same issue now.

That is not a problem; that is how annotation works. For example, suppose the DMD gene has 10 transcripts, producing 10 isoforms. A sequence variant that falls within all 10 transcripts will have 10 calculated functional consequences, one per isoform, because effect calculations consider every transcript of the gene.
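To see how the extra effects are distributed in your own file, you can tally how many ANN entries each variant received. This is a minimal diagnostic sketch, assuming snpEff 4.x `ANN` annotations in the INFO column (column 8) of a tab-separated VCF. The two records below are invented: the transcript IDs are placeholders, not real DMD transcripts, and the ANN entries are truncated.

```python
from collections import Counter


def ann_histogram(vcf_lines):
    """Map 'number of ANN entries' -> 'number of variants with that many entries'."""
    hist = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        ann = next((f[4:] for f in info.split(";") if f.startswith("ANN=")), "")
        hist[len([e for e in ann.split(",") if e])] += 1
    return hist


vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "X\t100\t.\tC\tA\t50\tPASS\tANN=A|missense_variant|MODERATE|DMD|DMD|transcript|TX1|protein_coding,"
    "A|missense_variant|MODERATE|DMD|DMD|transcript|TX2|protein_coding",
    "X\t200\t.\tG\tT\t50\tPASS\tANN=T|synonymous_variant|LOW|GENE2|GENE2|transcript|TX9|protein_coding",
]
print(sorted(ann_histogram(vcf).items()))  # [(1, 1), (2, 1)]
```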

5.9 years ago

Hello,

I guess this cannot be solved easily.

As already mentioned, you have multiple transcripts per gene. SnpEff annotates each of them unless you use the -canon option; then SnpEff uses only the canonical transcript.
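For reference, a `-canon` run looks roughly like `java -jar snpEff.jar -canon <genome> <input.vcf>`. The helper below only assembles that argument list from Python. The jar path, genome name and file names are placeholders, and actually running it requires Java and an installed snpEff database.

```python
import shlex


def snpeff_canon_cmd(genome, vcf_path, jar="snpEff.jar"):
    """Argument list for annotating against canonical transcripts only."""
    # Options go before the genome name and the input VCF.
    return ["java", "-jar", jar, "-canon", genome, vcf_path]


cmd = snpeff_canon_cmd("my_genome", "calls.vcf")  # placeholder names
print(shlex.join(cmd))  # java -jar snpEff.jar -canon my_genome calls.vcf

# To run it (needs Java + snpEff), e.g.:
#   import subprocess
#   with open("calls.canon.vcf", "w") as out:
#       subprocess.run(cmd, stdout=out, check=True)
```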

But that's still not enough. There are regions where different genes overlap. What should SnpEff do there? Which gene should it choose for annotation? And it's very likely that the effect in one gene differs from the effect in the other.
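If you really need exactly one effect per variant, you have to pick a rule yourself. One common choice, sketched here as an assumption rather than anything snpEff does for you, is to keep the entry with the most severe `Annotation_Impact` (HIGH > MODERATE > LOW > MODIFIER), taking the first listed entry on ties. In the hypothetical overlapping-gene example below, this rule silently drops the effect on the other gene.

```python
# Severity order of snpEff's Annotation_Impact values (lower = more severe).
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def most_severe(ann_value):
    """Return the ANN entry (split into subfields) with the most severe impact.

    min() keeps the first of several equally severe entries; unknown impact
    labels are ranked below MODIFIER.
    """
    entries = [e.split("|") for e in ann_value.split(",") if e]
    return min(entries, key=lambda p: IMPACT_RANK.get(p[2], len(IMPACT_RANK)))


# Hypothetical variant inside two overlapping genes (truncated ANN entries).
ann = (
    "G|intron_variant|MODIFIER|GENE_X|GENE_X|transcript|TX_X1|protein_coding,"
    "G|missense_variant|MODERATE|GENE_Y|GENE_Y|transcript|TX_Y1|protein_coding"
)
best = most_severe(ann)
print(best[3], best[1])  # GENE_Y missense_variant
```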

fin swimmer
