Filtering on --max-missing multiple times removes more SNPs
3.3 years ago
QPaps04 ▴ 140

Please can someone explain to me why this happens?

Here are the steps I take

  1. I carry out a filter on max missing using:

    vcftools --vcf filename.vcf --max-missing 0.8 --recode --recode-INFO-all --out miss_80

  2. I then thin the vcf by using:

    vcftools --vcf miss_80.recode.vcf --thin 250 --recode --recode-INFO-all --out miss_80_thin

This VCF contains 2,573 SNPs.

  3. I then filtered again on --max-missing (as I am still having issues with the PCA) at 80% using:

    vcftools --vcf miss_80_thin.recode.vcf --max-missing 0.8 --recode --recode-INFO-all --out redo_miss

This further reduced the number of SNPs to 1,462.

What I don't understand is: if I have already filtered with --max-missing 0.8, why are more SNPs removed when I filter with the exact same threshold a second time? Surely all the SNPs that fail that threshold have already been removed?!
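One way to check this directly would be to count, outside of vcftools, how many sites in the thinned file fall below the threshold. The following is only a rough sketch: `sites_below_call_rate` is a hypothetical helper name, it assumes an uncompressed VCF with GT as the first FORMAT field, and it treats a genotype as missing if it contains any `.`, which may not match exactly how vcftools counts missing data.

```shell
# Hypothetical helper: list sites whose fraction of called genotypes is below a minimum.
# Usage: sites_below_call_rate FILE.vcf MIN_CALL_RATE
# Output columns: CHROM, POS, number of called genotypes, number of samples.
# Assumptions: uncompressed VCF, GT is the first FORMAT field, any "." in GT means missing.
sites_below_call_rate() {
  awk -F'\t' -v min="$2" '
    /^#/ { next }                       # skip header lines
    {
      total = NF - 9                    # sample columns start at field 10
      called = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")               # f[1] is the GT subfield
        if (f[1] !~ /\./) called++
      }
      if (total > 0 && called / total < min)
        printf "%s\t%s\t%d\t%d\n", $1, $2, called, total
    }' "$1"
}

# Example (file name taken from the steps above):
#   sites_below_call_rate miss_80_thin.recode.vcf 0.8 | wc -l
```

If this count is zero but the second vcftools run still drops sites, the difference would lie in how missingness is being counted rather than in the data itself. vcftools can also report per-site missingness itself with `--missing-site`, which writes a `.lmiss` file that can be compared against this output.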

Sorry if this is a basic question - I can't seem to get my head around what exactly is happening here. Thank you in advance for your advice! Happy Holidays to all :)

SNP filtering vcftools linux sequencing • 1.8k views