I am using vcftools to break down a large VCF file into smaller files using:
for i in $(seq 1 22); do vcftools --gzvcf ~/path_to_large.vcf.gz --chr "$i" --out ~/path_to_small_vcf_chr"$i" --recode; done
This is the output I got after running this command (using chromosome 22 as an example):
VCFtools - 0.1.15 (C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--gzvcf /path_to_large_vaf/large.vcf.gz
--chr 22
--out /path_to_small_vcr/
--recode
Using zlib version: 1.2.8
After filtering, kept 1000 out of 1000 Individuals
Outputting VCF file...
After filtering, kept 72353 out of a possible 2825214 Sites
Run Time = 987.00 seconds
I got the results split into separate chromosomes, but I noticed that a huge number of variants were filtered out of the original 2825214 sites (only 72353 remained). I did not specify any filtering criteria in the command, so what could be causing this filtering?
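A likely explanation (an inference, not something stated explicitly in this thread): vcftools reports "kept N out of a possible M Sites" relative to every site in the input file, on all chromosomes. With --chr 22 only the chromosome 22 records are kept, so the "filtered" sites would simply be the ones on the other chromosomes. A quick way to check is to count records per CHROM value in the input. The sketch below assumes a gzipped (or bgzipped) VCF and uses only gzip, awk and GNU sort; the helper name count_sites_per_chrom is hypothetical. If the explanation holds, the row for 22 should show 72353 and all rows together should add up to 2825214. The output also shows whether CHROM is written as "22" or "chr22", which matters for the --chr values in the loop.

```shell
# count_sites_per_chrom (hypothetical helper): print "<CHROM><TAB><record count>"
# for a gzipped VCF, skipping the ## meta lines and the #CHROM header line.
count_sites_per_chrom() {
  local vcf_gz="$1"
  gzip -cd "$vcf_gz" \
    | awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' \
    | sort -k1,1V   # version sort (GNU) orders 1, 2, ..., 10, ..., 22
}

# Usage, with the path from the question:
# count_sites_per_chrom ~/path_to_large.vcf.gz
```

Reading the file once with awk is much faster than 22 separate vcftools passes when all you want is the per-chromosome totals.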
A little more about the VCF file used:
##fileformat=VCFv4.2
##source=PLINKv1.90
Most programs will have default values for various program options (these would generally be listed in in-line help or in manuals). Perhaps one of the values is causing the filtering of the data here.
@genomax, thank you for the suggestion. I did read the vcftools manual, but it doesn't describe any default filtering values. I thought I was just breaking the large file into smaller files, so it shouldn't be doing any filtering. I read through the options carefully but didn't see anything. http://vcftools.sourceforge.net/man_latest.html
I only mentioned that because people tend to overlook it at times. It sounds like this observation still needs a logical explanation. Is there a log file (other than the message above) you can check to see why those SNPs were filtered?
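vcftools writes the same summary messages to <prefix>.log next to its output, but that log only repeats the counts; it does not list per-site reasons. A more direct check is to compare record counts: if the only "filtering" is the --chr selection, the records in the 22 per-chromosome outputs should add up to the chromosome 1 to 22 records in the input (a PLINK export may also contain X, Y or MT records, which a 1..22 loop skips). The sketch below assumes the outputs are named <prefix>_chr<N>.recode.vcf, as produced by a loop whose --out includes the chromosome number; count_records, sum_split_records and the prefix argument are hypothetical.

```shell
# count_records (hypothetical helper): number of non-header lines in a plain-text VCF.
count_records() {
  grep -vc '^#' "$1"
}

# sum_split_records (hypothetical helper): add up the records in
# <prefix>_chr1.recode.vcf ... <prefix>_chr22.recode.vcf (this naming is an assumption).
sum_split_records() {
  local prefix="$1" total=0 i f n
  for i in $(seq 1 22); do
    f="${prefix}_chr${i}.recode.vcf"
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    n=$(count_records "$f")
    total=$((total + n))
  done
  echo "$total"
}

# Usage (hypothetical prefix):
# sum_split_records ~/path_to_small_vcf
```

If the total matches the number of autosomal records in the input VCF, nothing was dropped beyond the chromosome selection itself.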
Hi Molly_K,
No need to delete a question after you received (helpful) answers!
Cheers,
Wouter
I realized it was a misunderstanding on my end, so I deleted the question because it didn't seem like a good one. But if someone has the same doubt when interpreting the results, this post may be helpful :P
That is exactly the idea behind not deleting posts once they have received comments or answers.