vcftools: unwanted filtering
6.1 years ago
Molly_K ▴ 60

I am using vcftools to break down a large VCF file into smaller files using:

for i in $(seq 1 22); do vcftools --gzvcf ~/path_to_large.vcf.gz --chr "$i" --out ~/path_to_small_vcf_chr"$i" --recode; done

This is the message I got after running the command (using chr22 as an example):

VCFtools - 0.1.15 (C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --gzvcf /path_to_large_vaf/large.vcf.gz
    --chr 22
    --out /path_to_small_vcr/
    --recode

Using zlib version: 1.2.8
After filtering, kept 1000 out of 1000 Individuals
Outputting VCF file...
After filtering, kept 72353 out of a possible 2825214 Sites
Run Time = 987.00 seconds

I got the results split into separate chromosome files, but I noticed that a huge number of variants were filtered out of the original 2825214 sites (only 72353 remained). I did not specify any filtering criteria in the command, so what could be causing this filtering?
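A minimal sketch for checking what the "out of a possible" figure refers to: tally the sites in the input per chromosome. It assumes a gzip/bgzip-compressed VCF whose header lines start with '#' and whose first tab-separated column is CHROM. The function names count_sites_per_chrom and count_total_sites are hypothetical helpers, not part of vcftools; gzip -dc is used instead of zcat for portability.

```shell
#!/usr/bin/env bash
# Sketch: tally variant sites (non-header lines) in a compressed VCF.
# Assumes: gzip/bgzip-compressed input, header lines start with '#',
# tab-separated columns with CHROM in column 1. Helper names are hypothetical.

# Print "CHROM<TAB>count" for each chromosome, in natural (version) order.
count_sites_per_chrom() {
  gzip -dc "$1" |
    awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' |
    sort -k1,1V
}

# Print the total number of sites across all chromosomes.
count_total_sites() {
  gzip -dc "$1" | awk '!/^#/ { total++ } END { print total + 0 }'
}
```

For example, count_sites_per_chrom ~/path_to_large.vcf.gz prints one line per chromosome. If the "out of a possible" number is simply the total number of sites read from the input, count_total_sites should return that number, and the 22 row should equal the number of sites kept for chr22.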

A little more about the VCF file used:

##fileformat=VCFv4.2
##source=PLINKv1.90


Most programs have default values for various options (these are generally listed in the in-line help or in the manual). Perhaps one of these defaults is causing the filtering here.


@genomax, thank you for the suggestion. I did read the vcftools manual, but it doesn't describe any default filtering values. I thought I was just breaking the large file down into smaller files, so it shouldn't be doing any filtering. I read through the options carefully but didn't see anything relevant. http://vcftools.sourceforge.net/man_latest.html


I only mentioned that since people tend to overlook it at times. Sounds like this observation still needs a logical explanation. Is there a log file (other than the message above) you can check to see why those SNPs were filtered?


Hi Molly_K,

No need to delete a question after you received (helpful) answers!

Cheers,
Wouter


I realized it was a misunderstanding on my end, so I deleted the question since it didn't seem like a good one. But if someone has the same doubt when interpreting the results, this post may be helpful :P


That is exactly the idea behind not deleting posts once they have received comments or answers.

6.1 years ago
Molly_K ▴ 60

I just realized that the larger number (2825214) is the total number of sites in the input file across all chromosomes. The sites that appear to be "filtered" are simply the ones not on the chromosome passed to --chr, so they are not relevant here lol. Thanks so much for thinking with me. I checked the chr22 output file against the original: if I just use awk to extract chr22, the row counts are the same. I will check a few other chromosomes too.
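The awk comparison described here can be sketched as a small shell function. This assumes the original file is gzip/bgzip-compressed, the per-chromosome output is the uncompressed .recode.vcf file that vcftools writes with --recode, and chromosome names in column 1 match the value passed to --chr (e.g. 22, not chr22). check_chrom_split is a hypothetical name; it only compares site counts and does not read any vcftools log.

```shell
#!/usr/bin/env bash
# Sketch: check that a per-chromosome vcftools output holds the same number
# of sites as that chromosome has in the original file.
# check_chrom_split is a hypothetical helper name.
check_chrom_split() {
  local original_gz="$1" split_vcf="$2" chrom="$3"
  local expected actual
  # Sites on $chrom in the original file, header lines skipped.
  expected=$(gzip -dc "$original_gz" |
    awk -F'\t' -v c="$chrom" '!/^#/ && $1 == c { n++ } END { print n + 0 }')
  # All data lines in the per-chromosome output file.
  actual=$(awk '!/^#/ { n++ } END { print n + 0 }' "$split_vcf")
  echo "chr$chrom: original=$expected split=$actual"
  # Exit status 0 only when the two counts agree.
  [ "$expected" -eq "$actual" ]
}
```

For example, check_chrom_split ~/path_to_large.vcf.gz chr22.recode.vcf 22, where chr22.recode.vcf is a placeholder for whichever output file vcftools wrote for chromosome 22. A zero exit status means the counts agree, which makes it easy to loop over the other chromosomes.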
