Please can someone explain to me why this happens?
Here are the steps I took:
First, I filtered on missing data using:
vcftools --vcf filename.vcf --max-missing 0.8 --recode --recode-INFO-all --out miss_80
I then thinned the VCF using:
vcftools --vcf miss_80.recode.vcf --thin 250 --recode --recode-INFO-all --out miss_80_thin
This VCF contains 2,573 SNPs.
I then filtered again with --max-missing at 0.8 (as I am still having issues with my PCA) using:
vcftools --vcf miss_80_thin.recode.vcf --max-missing 0.8 --recode --recode-INFO-all --out redo_miss
This further reduced the number of SNPs to 1,462.
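As a sanity check on the SNP counts, here is a minimal sketch of how the number of variant records in each file could be counted independently of the vcftools log. The function name count_sites is hypothetical, and it assumes an uncompressed VCF in which every non-header line is one site:

```shell
# Hypothetical helper: count variant records in an uncompressed VCF.
# Assumes every line that does not start with '#' is one site.
count_sites() {
  grep -vc '^#' "$1"
}
```

Running count_sites on miss_80_thin.recode.vcf and on redo_miss.recode.vcf should reproduce the two counts if they refer to sites kept.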
What I don't understand is this: if I have already filtered for a missingness of 80%, why are more SNPs removed when I filter at exactly the same threshold a second time? Surely all the SNPs with more than 80% missingness have already been removed?!
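To make the question easier to check, here is a minimal sketch of how per-site missingness could be computed directly from the thinned file, so the sites that the second filter drops can be inspected. As far as I can tell from the vcftools manual, --max-missing takes a value between 0 and 1, where 1 means no missing data is allowed, so 0.8 should keep sites with at most 20% missing genotypes. The function name site_missingness is hypothetical. The sketch assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT field, and it counts a genotype as missing whenever its GT contains a ".", which may not match vcftools' own accounting exactly:

```shell
# Hypothetical helper: print CHROM, POS and the fraction of missing
# genotypes for every site in an uncompressed VCF.
# Assumption: a genotype is "missing" if its GT subfield contains "."
# (for example ./. or .|.); vcftools may count this differently.
site_missingness() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      n = NF - 9                       # sample columns start at field 10
      miss = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")              # GT is the first subfield
        if (f[1] ~ /\./) miss++
      }
      printf "%s\t%s\t%.3f\n", $1, $2, (n > 0 ? miss / n : 0)
    }' "$1"
}
```

Running site_missingness on miss_80_thin.recode.vcf and looking for values above 0.2 would show whether any sites in that file really exceed the threshold. vcftools also has a --missing-site option that writes per-site missingness to a .lmiss file, which should give comparable numbers.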
Sorry if this is a basic question; I can't seem to get my head around what exactly is happening here. Thank you in advance for your advice! Happy Holidays to all :)