How to count variant occurrences in .vcf
Asked 5.2 years ago by darceyc17

Hello Biostars,

I have done targeted NGS to discover novel variants associated with a trait of interest. I am now trying to prioritize the variants in the resulting .vcf so that I can genotype them in a larger population. Part of our prioritization is determining how many of the subjects, out of 183 in total, carry each variant of interest. Does anyone know how to do this without hand-counting each GT field, or have any suggestions?

Thank you!

Tags: vcf, Variant, SNP, NGS

Does it matter whether the genotype is homozygous or heterozygous? Or is the question just "how many samples have at least one allele with this variant"?

It is how many samples have at least one allele with an individual variant, and I have 7,531 variants I need to determine this for.

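For a start, for a single site:

 bcftools view -H input.vcf.gz chrxxx:12345-12345 | cut -f 10- | tr "\t" "\n" | cut -d ':' -f 1 | sort | uniq -c

This prints how many samples have each genotype (0/0, 0/1, 1/1, ./. and so on) at that position. Regions are written as chr:start-end and require an indexed file (bcftools index input.vcf.gz); -H suppresses the header so that only the sample genotype columns are counted.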
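Since the goal is a carrier count for all 7,531 variants rather than one site, a single pass over the whole file avoids running the command once per position. Below is a minimal sketch in plain awk, wrapped in a hypothetical shell function, count_carriers (not part of bcftools or vt). It assumes GT is the first FORMAT sub-field, and it counts a sample as a carrier if any called allele is non-reference. At multi-allelic sites it therefore does not distinguish between ALT alleles; splitting those sites first (for example with bcftools norm -m -any) would give per-allele counts.

```shell
#!/usr/bin/env bash
# count_carriers is a hypothetical helper, not part of bcftools or vt.
# For each variant, print CHROM, POS, REF, ALT, the number of samples
# carrying at least one non-reference allele (CARRIERS), and the number
# of samples with at least one non-missing allele (CALLED).
# Reads the file given as $1, or stdin if no argument is given.
count_carriers() {
  awk -F'\t' -v OFS='\t' '
    /^##/ { next }                                # skip meta-information lines
    /^#CHROM/ { print "CHROM", "POS", "REF", "ALT", "CARRIERS", "CALLED"; next }
    {
      carriers = 0; called = 0
      for (i = 10; i <= NF; i++) {                # sample columns start at field 10
        split($i, parts, ":")                     # parts[1] is GT (assumed first)
        n = split(parts[1], alleles, "[/|]")      # handles unphased and phased GTs
        has_call = 0; has_alt = 0
        for (j = 1; j <= n; j++) {
          if (alleles[j] == ".") continue         # skip missing alleles
          has_call = 1
          if (alleles[j] != "0") has_alt = 1      # any ALT allele index
        }
        called += has_call
        carriers += has_alt
      }
      print $1, $2, $4, $5, carriers, called
    }' "${1:--}"
}
```

For example, zcat input.vcf.gz | count_carriers > carriers.tsv, followed by tail -n +2 carriers.tsv | sort -t$'\t' -k5,5nr, would list variants from most to fewest carriers among the 183 subjects.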
5.2 years ago

If you only need to do this once and don't plan to work with the VCF further, converting it to a TSV (tab-separated) file may make the counting easier.

Galaxy has a tool for this under NGS: VCF Manipulation > VCFtoTab-delimited (Convert VCF data into TAB-delimited format).
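If you would rather do the conversion on the command line than in Galaxy, a minimal awk sketch follows. It is a swapped-in alternative, not the Galaxy tool itself: it keeps only the genotype (GT) for each sample, assuming GT is the first FORMAT sub-field, and the function name vcf_to_gt_tsv is hypothetical.

```shell
#!/usr/bin/env bash
# vcf_to_gt_tsv is a hypothetical helper, not the Galaxy VCFtoTab-delimited
# tool. It writes one row per variant: CHROM, POS, REF, ALT, then the GT
# sub-field of every sample; the header row carries the sample names.
# Reads the file given as $1, or stdin if no argument is given.
vcf_to_gt_tsv() {
  awk -F'\t' -v OFS='\t' '
    /^##/ { next }                        # skip meta-information lines
    {
      row = $1 OFS $2 OFS $4 OFS $5
      for (i = 10; i <= NF; i++) {        # sample columns start at field 10
        if ($1 == "#CHROM") {
          val = $i                        # header row: keep the sample name
        } else {
          split($i, parts, ":")           # data row: keep GT (assumed first)
          val = parts[1]
        }
        row = row OFS val
      }
      print row
    }' "${1:--}"
}
```

The resulting table has one genotype column per subject, so it can be opened in a spreadsheet or R and filtered or counted per row.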

Otherwise, I like the tool vt for summarizing VCF statistics.

If you are only looking at one variant, you can use a BED file to specify and extract that exact variant, or inspect it in a genome browser such as IGV or JBrowse.
