5.5 years ago
misterie
Hi,
I have 30 VCF files (one per chromosome) annotated with snpEff. I would like to compute summary statistics and compare the number of different effects between chromosomes (or between autosomes and sex chromosomes). I extracted the ANN field from my VCF file with a short Bash script, and the result looks like this:
Chr 10
3_prime_UTR_variant 8691
5_prime_UTR_premature_start_codon_gain_variant 517
5_prime_UTR_variant 2904
bidirectional_gene_fusion 2
conservative_inframe_deletion 17
conservative_inframe_insertion 27
conservative_inframe_insertion&splice_region_variant 1
disruptive_inframe_deletion 55
disruptive_inframe_insertion 27
non_coding_transcript_exon_variant 928
non_coding_transcript_variant 113
splice_acceptor_variant&conservative_inframe_deletion&splice_region_variant&intron_variant 2
splice_acceptor_variant&disruptive_inframe_deletion&splice_region_variant&intron_variant 2
...
start_lost 9
stop_gained&conservative_inframe_insertion 1
stop_gained&disruptive_inframe_deletion 1
stop_gained&disruptive_inframe_insertion 1
stop_gained 108
stop_lost 15
stop_lost&splice_region_variant 3
stop_retained_variant 9
synonymous_variant 5038
upstream_gene_variant 98805
There are many types of variants (and also many that appear only once). I would like to group these variants and compare the numbers of coding, intronic, flanking-sequence variants, etc.
How can I do this? And how should I group the remaining variants?
Thank you in advance.
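To make the kind of grouping I mean concrete, here is a minimal Python sketch. It assumes the input is exactly the "effect count" listing shown above, with a "Chr N" header line per chromosome. The category names, the term-to-category mapping, and the priority order used for combined annotations (terms joined with `&`) are my own assumptions, not something defined by snpEff, and the function names `classify` and `summarise` are just illustrative.

```python
from collections import Counter

# Assumed grouping of Sequence Ontology terms into broad categories.
# This mapping is illustrative only; extend or change it as needed.
GROUPS = {
    "coding": {
        "missense_variant", "synonymous_variant", "frameshift_variant",
        "stop_gained", "stop_lost", "start_lost", "stop_retained_variant",
        "initiator_codon_variant",
        "conservative_inframe_deletion", "conservative_inframe_insertion",
        "disruptive_inframe_deletion", "disruptive_inframe_insertion",
    },
    "splice": {
        "splice_acceptor_variant", "splice_donor_variant",
        "splice_region_variant",
    },
    "utr": {
        "3_prime_UTR_variant", "5_prime_UTR_variant",
        "5_prime_UTR_premature_start_codon_gain_variant",
    },
    "non_coding_transcript": {
        "non_coding_transcript_exon_variant", "non_coding_transcript_variant",
    },
    "intron": {"intron_variant"},
    "flanking": {"upstream_gene_variant", "downstream_gene_variant"},
    "intergenic": {"intergenic_region"},
}

# Assumed priority: for a combined annotation such as
# "stop_lost&splice_region_variant", the first matching category wins.
PRIORITY = ["coding", "splice", "utr", "non_coding_transcript",
            "intron", "flanking", "intergenic"]

TERM_TO_GROUP = {term: group
                 for group, terms in GROUPS.items()
                 for term in terms}


def classify(effect):
    """Map one (possibly '&'-joined) effect string to a single category."""
    groups = {TERM_TO_GROUP.get(term, "other") for term in effect.split("&")}
    for group in PRIORITY:
        if group in groups:
            return group
    return "other"


def summarise(lines):
    """Return {chromosome: Counter(category -> summed count)}."""
    per_chrom = {}
    chrom = "unknown"
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and placeholders such as "..."
        if parts[0] == "Chr":
            chrom = parts[1]  # header line, e.g. "Chr 10"
            continue
        if not parts[1].isdigit():
            continue
        counts = per_chrom.setdefault(chrom, Counter())
        counts[classify(parts[0])] += int(parts[1])
    return per_chrom


if __name__ == "__main__":
    example = [
        "Chr 10",
        "3_prime_UTR_variant 8691",
        "stop_gained 108",
        "stop_lost&splice_region_variant 3",
        "upstream_gene_variant 98805",
    ]
    for chrom, counts in summarise(example).items():
        for group, n in counts.most_common():
            print(chrom, group, n)
```

Anything not listed in the mapping (for example `bidirectional_gene_fusion`) falls into an "other" bucket, which is the part I am least sure how to handle. The resulting per-chromosome tables could then be compared between autosomes and sex chromosomes.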