What are good settings to filter VCF files to merge SNP data with SNP array data?
8.0 years ago
devenvyas ▴ 740

I have VCF data for >100 individuals published by a completely separate group. I also have PLINK SNP data from the Human Origins array, and I want to merge these VCF data into the PLINK data.

What would be good settings for filtering SNPs? The VCF files have already been filtered for DP >= 10 and GQ >= 30. What other filters should I apply?

vcf SNP • 2.1k views
8.0 years ago
Brice Sarver ★ 3.8k

Those are good general-purpose filters. Depending on how the variants were called, you may be able to filter on other fields as well (say, if the VCF was generated by calling haplotypes in GATK). Hardy-Weinberg Equilibrium filtering is also helpful for identifying and removing potentially paralogous sites that result from misplaced reads, but it's usually an additional analysis done after you have a multisample VCF.
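
To make the Hardy-Weinberg idea concrete, here is a minimal sketch of a per-site test computed from genotype counts. It uses the chi-square approximation with one degree of freedom rather than the exact test that dedicated tools typically offer, and the function name and any p-value cutoff you apply to its output are my own assumptions, not something from the data's authors.

```python
import math


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Chi-square test for Hardy-Weinberg equilibrium at one biallelic site.

    Returns (chi2, p_value). The p-value uses the 1-degree-of-freedom
    chi-square distribution, which is an approximation; an exact test is
    preferable when genotype counts are small.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped individuals at this site")

    # Allele frequencies estimated from the observed genotypes.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p

    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)

    # Skip classes with zero expectation (monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A site with a strong excess of heterozygotes and a very small p-value is a candidate for removal; what counts as "very small" depends on sample size and on how many sites you test.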

I just need to filter down to the SNPs for which I have PLINK data and then determine which of these calls are good enough to use. Orthologs/paralogs are not particularly relevant to me. I am not looking at individual genes, just ancestry and Neanderthal introgression.
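
For the "filter down to the array SNPs" step, a rough sketch in Python could look like the following. It assumes the standard six-column PLINK .bim layout (chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2) and that the VCF and the array use the same genome build. The function names are hypothetical, and records whose alleles do not match the array are simply dropped here; strand flips are not resolved, and chromosome codes such as 23 versus X are not reconciled.

```python
BASES = {"A", "C", "G", "T"}


def normalize_chrom(chrom):
    """Strip an optional 'chr' prefix so '1' and 'chr1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def load_bim_sites(bim_lines):
    """Map (chrom, pos) -> set of the two array alleles from .bim lines."""
    sites = {}
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, _snp_id, _cm, pos, a1, a2 = fields[:6]
        sites[(normalize_chrom(chrom), int(pos))] = {a1.upper(), a2.upper()}
    return sites


def filter_vcf_to_sites(vcf_lines, sites):
    """Yield header lines plus records located at array sites.

    A record is kept only if it is a single-base, biallelic (or
    ALT-missing) site whose alleles are a subset of the array alleles.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # pass meta-information and header through
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        chrom, pos = fields[0], int(fields[1])
        ref, alt = fields[3].upper(), fields[4].upper()
        array_alleles = sites.get((normalize_chrom(chrom), pos))
        if array_alleles is None:
            continue  # position is not on the array
        vcf_alleles = {ref} if alt == "." else {ref, alt}
        # Multi-allelic ALTs ("G,T") and indels fail the BASES check.
        if all(a in BASES for a in vcf_alleles) and vcf_alleles <= array_alleles:
            yield line
```

In practice you would pass open file handles for the .bim and the (uncompressed) VCF and write the yielded lines to a new file before converting and merging.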
