Get SNPs from specific individuals in a VCF
2.9 years ago

Hi,

I have a VCF called by GATK that contains 150 samples and around 20k SNPs in total.

I want to subset the VCF to 20 of the 150 samples and keep only the SNPs that are present in those 20 samples. I tried vcftools --keep as well as bcftools view -S. Both remove the other 130 samples from the VCF, but the number of SNPs remains exactly the same.

Is there a way to exclude the samples together with the SNPs that were called only in them?

Many thanks!

vcftools VCF GATK bcftools • 687 views

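If there are sites where the genotype is missing/unknown (./.) for all of the samples that remain, you can remove them with a filter expression: pipe the output of bcftools view -S into bcftools view -e 'COUNT(GT="mis")=20' to exclude sites where all 20 GTs are missing. Piping is the safer choice, because the bcftools documentation recommends doing sample subsetting and filtering as two separate steps, since the order of internal operations within a single command can influence the result.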
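
To make the logic of that two-step filter concrete, here is a minimal pure-Python sketch of the same idea: keep only the chosen sample columns, then drop any site where every remaining genotype is missing. It is an illustration only, not a replacement for bcftools. The function names `subset_vcf` and `is_missing` are hypothetical, the sketch assumes an uncompressed, tab-delimited VCF, and it treats a genotype as missing only when every allele is "." (half-missing calls such as ./1 are kept).

```python
def is_missing(gt):
    """True if every allele in a GT string such as './.' or '.' is missing."""
    alleles = gt.replace("|", "/").split("/")
    return all(a == "." for a in alleles)


def subset_vcf(lines, keep):
    """Yield VCF lines restricted to the samples in `keep`.

    Sites where all kept genotypes are missing are skipped.
    INFO fields such as AC/AN are NOT recalculated in this sketch.
    """
    keep_idx = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines pass through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at column 10 (index 9).
            keep_idx = [i for i in range(9, len(fields)) if fields[i] in keep]
            yield "\t".join(fields[:9] + [fields[i] for i in keep_idx])
            continue
        if keep_idx is None:
            raise ValueError("no #CHROM header line before the first record")
        kept = [fields[i] for i in keep_idx]
        fmt = fields[8].split(":")
        if "GT" in fmt:
            gt_pos = fmt.index("GT")
            gts = [sample.split(":")[gt_pos] for sample in kept]
            if all(is_missing(gt) for gt in gts):
                continue  # no kept sample has a call at this site
        yield "\t".join(fields[:9] + kept)
```

Called as `subset_vcf(open("in.vcf"), {"S2", "S3"})` (file and sample names are placeholders), it yields the header plus only those records with at least one non-missing genotype among the kept samples. Note that this only handles missing genotypes; sites where the kept samples are all homozygous reference would still pass through.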
