Filter VCF by sample?
7.6 years ago
cmdcolin ★ 3.8k

I'm trying to filter VCF files by sample using vcftools, testing on the 1000 Genomes datasets.

For example, to keep only the CEU samples, I can run this:

vcftools --gzvcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --recode --out CEU --keep CEU.tsv

Where CEU.tsv contains the sample IDs that are from the CEU population
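
For anyone who needs to build such a list: a minimal sketch, assuming a tab-separated panel file with a header line, the sample ID in column 1 and the population code in column 2 (the 1000 Genomes panel files appear to follow this layout, but check yours). The function name make_sample_list and the file name samples.panel are placeholders, not part of any tool.

```shell
# Sketch: print the sample IDs belonging to one population.
# Assumes a tab-separated panel file with a header row,
# sample ID in column 1 and population code in column 2.
make_sample_list() {
    pop="$1"      # population code, e.g. CEU
    panel="$2"    # path to the panel file (placeholder name)
    # Skip the header (NR > 1); print column 1 where column 2 matches.
    awk -F'\t' -v pop="$pop" 'NR > 1 && $2 == pop { print $1 }' "$panel"
}

# Usage (file names are placeholders):
# make_sample_list CEU samples.panel > CEU.tsv
```

The result is one sample ID per line, which is the format --keep expects.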

The problem is that the output still appears to include variants at which none of the kept samples carry a variant allele. I also tried setting --min-alleles, but that didn't seem to fix it.

This operation is also pretty slow. Are there any faster ways to do it?

vcftools 1000genomes • 4.8k views
7.6 years ago
trausch ★ 1.9k

bcftools should work:

bcftools view --force-samples -o ceu.vcf.gz -O z --samples-file CEU.tsv --min-ac 1 input.vcf.gz
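
As a rough sketch, the same command can be wrapped in a small function that also indexes the result. The function name subset_vcf and the file names are placeholders. It assumes a recent bcftools on the PATH. Here --force-samples lets bcftools continue with a warning if some IDs in the list are missing from the VCF, and --min-ac 1 drops sites with no non-reference allele among the kept samples.

```shell
# Sketch: subset a VCF to the samples in a list, drop sites where the
# kept samples carry no non-reference allele, then index the output.
# Assumes a recent bcftools; function and file names are placeholders.
subset_vcf() {
    in_vcf="$1"     # bgzipped input VCF
    samples="$2"    # one sample ID per line
    out_vcf="$3"    # bgzipped output VCF
    bcftools view --force-samples --samples-file "$samples" --min-ac 1 \
        -O z -o "$out_vcf" "$in_vcf" &&
    bcftools index --tbi "$out_vcf"   # tabix index, written only if view succeeded
}

# Usage (file names are placeholders):
# subset_vcf input.vcf.gz CEU.tsv ceu.vcf.gz
```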


Thanks again for this answer. I'm finding my own question in a Google search two years later. Note that a relatively recent version of bcftools should be used (e.g. the one distributed alongside htslib), simply because options like --min-ac don't exist in the old 0.1.19 release from the samtools package. If someone just wants a single sample, they can use bcftools view -s HG00096 --min-ac 1 1000genomes.vcf.gz, where --min-ac 1 ensures that every site in the output has at least one non-reference allele in that sample.
