snpEff output gets zero results for synonymous SNPs
4.9 years ago
felipead66 ▴ 110

I have a VCF file that was annotated with snpEff. I want to extract the synonymous (and non-synonymous) sites from it, but when I run

grep -c "SYNONYMOUS" snpEffoutput.vcf

I get 0 results.

Does this mean there is something wrong with my file?

snpEff synonymous • 2.0k views

Can you show the command line you used, and part of your output file snpEffoutput.vcf?

Best


Try grep -c 'synonymous_variant' snpEffoutput.vcf instead. grep is case-sensitive, and snpEff writes its effect terms in lowercase.
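If it is unclear which terms to grep for, one option is to tally the effect terms the file actually contains. The sketch below assumes the default ANN annotation format (annotations separated by commas, subfields separated by "|", the effect in the second subfield, and combined effects joined with "&"). The function name list_effects is made up for illustration. Note that it counts annotations, so a variant annotated on several transcripts contributes more than once.

```shell
# list_effects FILE
# Tally every effect term found in the ANN field of a snpEff-annotated VCF.
# Assumes the ANN layout described above; this is a sketch, not a VCF parser.
list_effects() {
  grep -v '^#' "$1" \
    | grep -o 'ANN=[^;[:space:]]*' \
    | sed 's/^ANN=//' \
    | tr ',' '\n' \
    | cut -d '|' -f 2 \
    | tr '&' '\n' \
    | sort | uniq -c | sort -k1,1nr
}
```

Running list_effects snpEffoutput.vcf prints one line per effect term with its count, most frequent first, which shows the exact spelling to use in later grep calls.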


Yes, this gives me 81949 results. But again, when I do

grep -c 'non_synonymous' snpEffoutput.vcf

I still get 0 results.


You can add -csvStats when running snpEff:

java -jar snpEff.jar eff -csvStats snpEffoutput.csv snpEff_database snpEffinput.vcf > snpEffoutput.vcf

The CSV will then contain a section that counts each effect:

$ grep -A24 "# Count by effects" snpEffoutput.csv
# Count by effects

Type , Count , Percent
3_prime_UTR_variant , 92 , 0.147019%
5_prime_UTR_premature_start_codon_gain_variant , 9 , 0.014382%
5_prime_UTR_variant , 54 , 0.086294%
conservative_inframe_deletion , 9 , 0.014382%
conservative_inframe_insertion , 13 , 0.020774%
disruptive_inframe_deletion , 13 , 0.020774%
disruptive_inframe_insertion , 7 , 0.011186%
downstream_gene_variant , 15885 , 25.384726%
frameshift_variant , 167 , 0.266871%
initiator_codon_variant , 1 , 0.001598%
intergenic_region , 22782 , 36.406347%
intron_variant , 5008 , 8.00294%
missense_variant , 1667 , 2.663918%
splice_acceptor_variant , 12 , 0.019176%
splice_donor_variant , 8 , 0.012784%
splice_region_variant , 163 , 0.260479%
start_lost , 4 , 0.006392%
stop_gained , 28 , 0.044745%
stop_lost , 10 , 0.01598%
stop_retained_variant , 2 , 0.003196%
synonymous_variant , 1192 , 1.904853%
upstream_gene_variant , 15451 , 24.69118%
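
To pull a single number out of that section, something like the following could work. It is a minimal sketch that assumes the CSV layout shown above (a "# Count by effects" header line, comma-separated rows, and a following section that starts with another "#" line). The function name effect_count_from_csv is hypothetical.

```shell
# effect_count_from_csv EFFECT CSV
# Print the Count column for one effect from the "# Count by effects" section.
effect_count_from_csv() {
  awk -F',' -v effect="$1" '
    /^# Count by effects/ { in_section = 1; next }  # start of the section
    in_section && /^#/    { in_section = 0 }        # a later section begins
    in_section {
      gsub(/[ \t]/, "", $1)                          # strip padding around fields
      gsub(/[ \t]/, "", $2)
      if ($1 == effect) print $2
    }
  ' "$2"
}
```

For example, effect_count_from_csv missense_variant snpEffoutput.csv would print the missense count. Restricting the match to this one section avoids picking up the same term if it also appears elsewhere in the file.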

Thank you very much, that is very helpful. But, still, how do I get the non_synonymous sites?

Also, am I right to assume that the synonymous_variant sites are in coding regions?


Hi felipead66,

Please check the "Effect prediction details" section under "Input & output files" in the snpEff documentation. Starting from version 4.0, VCF output uses Sequence Ontology (SO) terms by default, so the classic NON_SYNONYMOUS effects (such as "NON_SYNONYMOUS_CODING") are now reported as "missense_variant", "initiator_codon_variant", and "stop_retained_variant". If you add -classic when running snpEff, you can still count them with grep -c 'NON_SYNONYMOUS'.

Hope it helps.
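
Without rerunning snpEff, the SO terms can also be counted directly. Here is a rough sketch, assuming the default ANN format where each effect term is delimited by "|" or "&". The function name count_records_with_effects is made up for illustration. Like grep -c, it counts VCF records (lines), so a variant with several matching annotations is counted once.

```shell
# count_records_with_effects VCF TERM [TERM...]
# Count non-header VCF lines whose annotations include any of the given terms.
count_records_with_effects() {
  vcf=$1
  shift
  pattern=$(printf '%s|' "$@")   # join the terms: "a|b|c|"
  pattern=${pattern%|}           # drop the trailing "|"
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  grep -v '^#' "$vcf" | grep -c -E "[|&](${pattern})[|&]" || true
}
```

With the three terms named above, the call would be count_records_with_effects snpEffoutput.vcf missense_variant initiator_codon_variant stop_retained_variant. Requiring a "|" or "&" on both sides keeps the match to whole effect terms rather than substrings.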


You mean the command line to create the snpEffoutput.vcf file?


Yes, but as @SMK said, you can run grep -c 'synonymous_variant' snpEffoutput.vcf.
