VCF Statistics Given a GFF3 Annotation
10.2 years ago

Greetings,

Anyone know of a tool that summarizes where variants fall within a GFF3 file?

An ideal output would be something like:

number of variants within genes

number of variants within UTRs

number of variants within CDS

...

Otherwise I can munge some code together. Thanks.

gff3 vcf annotation • 4.0k views
10.2 years ago

snpEff is a variant effect annotator that can build its gene/transcript database from a GFF3 file using java -jar snpEff.jar build -gff3. See the section named "Building a database from GFF files" at this page. First, though, check that your genome annotation isn't already among the pre-built databases listed by java -jar snpEff.jar databases. Then run snpEff on the VCF using java -jar snpEff.jar eff to annotate each variant against all possible transcripts in your GFF3. Since you're not interested in variants flanking gene UTRs/CDS/introns, I recommend the options -no-downstream -no-upstream. Look over the documentation using java -jar snpEff.jar -h to see all available options.

Once you have an annotated VCF, it should be easier to write a wrapper that counts the variants in UTRs, CDS, introns, etc. Note that some variants may map to more than one gene/isoform in your GFF3.
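
A counting wrapper along those lines could look something like the rough sketch below. It assumes your snpEff version writes the ANN INFO field (older releases wrote an EFF field with a different layout, which this does not handle), where sub-fields are separated by "|", the second sub-field is the effect term, and multiple terms are joined by "&". The function name count_effects is made up for illustration. Each variant is counted at most once per effect term, so a variant hitting several isoforms with the same effect is not counted several times.

```python
import sys
from collections import Counter


def count_effects(vcf_lines):
    """Count variants per snpEff effect term (hypothetical helper).

    Assumes an uncompressed, tab-separated VCF whose INFO column carries
    snpEff's ANN field. A variant is counted once per distinct term,
    no matter how many transcripts report that term.
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        terms = set()
        for entry in fields[7].split(";"):
            if not entry.startswith("ANN="):
                continue
            # One comma-separated annotation per allele/transcript pair.
            for ann in entry[len("ANN="):].split(","):
                parts = ann.split("|")
                if len(parts) > 1:
                    # Combined effects are joined with '&'.
                    terms.update(t for t in parts[1].split("&") if t)
        for term in terms:
            counts[term] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for term, n in count_effects(handle).most_common():
            print(f"{term}\t{n}")
```

Saved under a name of your choosing, it would be run with the annotated VCF as its only argument and prints one effect term and its variant count per line. A variant with different effects in different genes/isoforms adds to each of those terms, so the totals can exceed the number of variants.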
