Question

Mutation count in multi-sample vcf file

0

Entering edit mode

5.5 years ago

LimMo ▴ 30

Hello all,

I have a one vcf file of 26 samples, what I need is to annotate this file and count the number of mutations for each gene for each sample of those 26.

I tried to use Annovar, but it seems that it doesn't support multi-sample vcf file.

Any suggestion or help, please?

VCF Annotate Mutations Count • 3.8k views

ADD COMMENT • link updated 2.7 years ago by YL ▴ 10 • written 5.5 years ago by LimMo ▴ 30

score 0 · Answer 1 · 2018-10-11

0

Entering edit mode

5.5 years ago

Pierre Lindenbaum 161k

I wrote groupbygene: http://lindenb.github.io/jvarkit/GroupByGene.html

your vcf must be annotated with VEP or SNPEFF.

$ curl -s -k "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
java -jar dist/groupbygene.jar |\
head | column  -t
#chrom  min.POS    max.POS    gene.name  gene.type         samples.affected  count.variations  M10475  M10478  M10500  M128215
chr10   52004315   52004315   ASAH2      snpeff-gene-name  2                 1                 0       0       1       1
chr10   52004315   52004315   ASAH2      vep-gene-name     2                 1                 0       0       1       1
chr10   52497529   52497529   ASAH2B     snpeff-gene-name  2                 1                 0       1       1       0
chr10   52497529   52497529   ASAH2B     vep-gene-name     2                 1                 0       1       1       0
chr10   48003992   48003992   ASAH2C     snpeff-gene-name  3                 1                 1       1       1       0
chr10   48003992   48003992   ASAH2C     vep-gene-name     3                 1                 1       1       1       0
chr10   126678092  126678092  CTBP2      snpeff-gene-name  1                 1                 0       0       0       1
chr10   126678092  126678092  CTBP2      vep-gene-name     1                 1                 0       0       0       1
chr10   135336656  135369532  CYP2E1     snpeff-gene-name  3                 2                 0       2       1       1

ADD COMMENT • link 5.5 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Just to make sure, here "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" it should be my vcf file after annotated by VEP or SnpEff, right?

ADD REPLY • link 5.5 years ago by LimMo ▴ 30

1

Entering edit mode

it should be my vcf file after annotated by VEP or SnpEff,

yes

ADD REPLY • link 5.5 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Could you please help me with the annotation, I'm new to that field. Are there clear steps for the annotation?

ADD REPLY • link 5.5 years ago by LimMo ▴ 30

1

Entering edit mode

see snpeff: http://snpeff.sourceforge.net/SnpEff_manual.html

ADD REPLY • link 5.5 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

After annotating with SnpEff and running GroupByGene, I got this error:

[SEVERE][GroupByGene]Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:119)
    at com.github.lindenb.jvarkit.util.vcf.VCFUtils.parseHeader(VCFUtils.java:184)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.read(GroupByGene.java:379)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.doWork(GroupByGene.java:694)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1145)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1303)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.main(GroupByGene.java:710)
[INFO][Launcher]groupbygene Exited with failure (-1)

What's could be the problem?

ADD REPLY • link 5.5 years ago by LimMo ▴ 30

0

Entering edit mode

Thanks for the help, I try to run GroupByGene using the same command and file just to see the output but it gives this error. Also when I tried my file (I was concerning if the problem from my file but that also happen for the provided file):

[SEVERE][GroupByGene]Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:119)
    at com.github.lindenb.jvarkit.util.vcf.VCFUtils.parseHeader(VCFUtils.java:184)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.read(GroupByGene.java:379)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.doWork(GroupByGene.java:694)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1145)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1303)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.main(GroupByGene.java:710)
[INFO][Launcher]groupbygene Exited with failure (-1)

Any suggestion or help, please?

ADD REPLY • link 5.5 years ago by LimMo ▴ 30

0

Entering edit mode

what was the input file ? how is it formatted ? what is the output of grep -m1 CHROM your_input.vcf ?

ADD REPLY • link 5.5 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

The input file was the one after annotated using SnpEff, This is what I got for grep -m1 CHROM myfile.ann.vcf :

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  LIBRARY_NAME=S2000  2:LIBRARY_NAME=S2000    3:LIBRARY_NAME=S2000    4:LIBRARY_NAME=S2000    5:LIBRARY_NAME=S2000    6:LIBRARY_NAME=S2000    7:LIBRARY_NAME=S2000    8:LIBRARY_NAME=S2000    9:LIBRARY_NAME=S2000    10:LIBRARY_NAME=S2000   11:LIBRARY_NAME=S2000   12:LIBRARY_NAME=S2000   13:LIBRARY_NAME=S2000   14:LIBRARY_NAME=S2000   15:LIBRARY_NAME=S2000   16:LIBRARY_NAME=S2000   17:LIBRARY_NAME=S2000   18:LIBRARY_NAME=S2000   19:LIBRARY_NAME=S2000   20:LIBRARY_NAME=S2000   21:LIBRARY_NAME=S2000   22:LIBRARY_NAME=S2000   23:LIBRARY_NAME=S2000   24:LIBRARY_NAME=S2000   25:LIBRARY_NAME=S2000   26:LIBRARY_NAME=S2000

ADD REPLY • link 5.5 years ago by LimMo ▴ 30

1

Entering edit mode

please post a fragment of your vcf, including the header in something like https://gist.github.com/ or whatever...

ADD REPLY • link 5.5 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

See here: /................./

ADD REPLY • link 5.5 years ago by LimMo ▴ 30

1

Entering edit mode

this works on my machine:

 java -jar dist/groupbygene.jar your.vcf

#chrom  min.POS max.POS gene.name   gene.type   samples.affected    count.variations    10:LIBRARY_NAME=S2000   11:LIBRARY_NAME=S2000   12:LIBRARY_NAME=S2000   13:LIBRARY_NAME=S2000   14:LIBRARY_NAME=S2000   15:LIBRARY_NAME=S2000   16:LIBRARY_NAME=S2000   17:LIBRARY_NAME=S2000   18:LIBRARY_NAME=S2000   19:LIBRARY_NAME=S2000   20:LIBRARY_NAME=S2000   21:LIBRARY_NAME=S2000   22:LIBRARY_NAME=S2000   23:LIBRARY_NAME=S2000   24:LIBRARY_NAME=S2000   25:LIBRARY_NAME=S2000   26:LIBRARY_NAME=S2000   2:LIBRARY_NAME=S2000    3:LIBRARY_NAME=S2000    4:LIBRARY_NAME=S2000    5:LIBRARY_NAME=S2000    6:LIBRARY_NAME=S2000    7:LIBRARY_NAME=S2000    8:LIBRARY_NAME=S2000    9:LIBRARY_NAME=S2000    LIBRARY_NAME=S2000
1   10402   10440   DDX11L1 ann_gene_name   2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENSG00000223972 ann_gene_id 2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENSG00000227232 ann_gene_id 2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000423562 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000438504 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000450305 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000456328 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000488147 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000515242 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000518655 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000538476 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000541675 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   WASH7P  ann_gene_name   2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0

ADD REPLY • link 5.5 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

This is more logical, I'm thinking that the command you provided not correct:

file.vcf |\
java -jar dist/groupbygene.jar

This is what I have been tried and give me the error.

ADD REPLY • link 5.5 years ago by LimMo ▴ 30

0

Entering edit mode

file.vcf |\
java -jar dist/groupbygene.jar

I didn't wrote this.

ADD REPLY • link 5.5 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Sorry but that what I understand from your post above with:

$ curl -s -k "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
java -jar dist/groupbygene.jar

ADD REPLY • link 5.5 years ago by LimMo ▴ 30

0

Entering edit mode

Hello Pierre,

I really appreciate this tool and it can be very useful for my project.

I wonder how it handles heterozygous vs homozygous variants. Will homozygous variants be counted as 2 variations with your tool?

Yiming

ADD REPLY • link 2.7 years ago by YL ▴ 10