Vcftools: Filtering By Multiple Regions (--Positions Flag?)
11.0 years ago
Matt W ▴ 250

Is there a way to extract multiple regions from my VCF files? I am trying to use vcftools, which gives me two options:

  1. --chr $(chrom) --from-bp $(start) --to-bp $(stop)

    The problem with this approach is I need multiple regions. So do I just reuse these flags multiple times? Specifically, there are 2192 regions I would like to extract.

  2. --positions pos.txt

    According to the docs, the input file requires a "chromosome and position", but I need regions rather than single positions. This would only work if I could specify ranges.

Am I misinterpreting how to use these flags? Or is there an easier way to extract multiple regions from VCF files?

Thanks!

vcftools snps filtering • 16k views
11.0 years ago

Use bedtools instead of vcftools: see http://bedtools.readthedocs.org/en/latest/content/tools/intersect.html, which works with VCF and BED inputs.

Code works for me :) bedtools v2.25.0

bedtools intersect -a myfile.vcf.gz -b myref.bed -header > output.vcf
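
A quick sanity check after a command like the one above is to count the variant records in the result. The sketch below assumes an uncompressed VCF; `count_variants` is a hypothetical helper name, not part of bedtools. One thing to watch for: if regions in the BED file overlap each other, `bedtools intersect` can report the same record once per overlapping region, and its `-u` option writes each record only once.

```shell
# Hypothetical helper: count variant records in an uncompressed VCF,
# i.e. every line that does not start with '#'.
count_variants() {
  grep -vc '^#' "$1"
}

# Example usage (file name taken from the command above):
# count_variants output.vcf
```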

Thank you very much, it works for me.

I don't actually have a second input file. I only have a list of regions that I would like to extract. Does bedtools support an input that isn't BED/GFF/VCF?

"I only have a list of regions" means you have a BED file (chrom/chromStart/chromEnd): https://genome.ucsc.edu/FAQ/FAQformat.html#format1
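
If the list is not already in three-column form, it can be converted. The sketch below assumes a hypothetical input of one `chrom:start-end` string per line with 1-based, inclusive coordinates; BED uses a 0-based start and an exclusive end, so only the start is shifted by one. The function name and file names are placeholders, and chromosome names that themselves contain `:` or `-` would need different parsing.

```shell
# Hypothetical helper: convert "chrom:start-end" lines on stdin
# (assumed 1-based, inclusive) into tab-separated BED
# (0-based start, exclusive end). Lines that do not split into
# exactly three fields are skipped.
regions_to_bed() {
  awk -F'[:-]' 'NF == 3 { printf "%s\t%d\t%d\n", $1, $2 - 1, $3 }'
}

# Example usage (file names are placeholders):
# regions_to_bed < regions.txt > regions.bed
```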

Ah, silly question. Thanks for the reply. I should have read the docs before making an assumption about the format. Thanks!

11.0 years ago
Erik Garrison ★ 2.4k

vcfintersect in vcflib will do this.

vcfintersect -b regions.bed variants.vcf

You can also use another VCF file, but you'll need a reference (it checks the haplotypes to be sure that alleles are the same even if they are aligned differently).

vcfintersect -f ref.fa -i known.vcf new.vcf >results.vcf

Note that intersecting variants will remove alleles which don't overlap even if they are at the same position as variants which do. The records are all adjusted to reflect the fact that an allele has been removed to maintain semantic consistency in the file. Specifically, all Number=A and Number=G fields in INFO and in the sample fields are adjusted.

11.0 years ago
Adam ★ 1.0k

You could use the --bed option in vcftools (or use bedtools as Pierre suggests).
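
A minimal sketch of the vcftools route, with placeholder file names. As far as I understand the vcftools manual, `--bed` expects the BED file to start with a header line, so the hypothetical helper below adds one when the first line looks like data. The vcftools call itself is left as a comment because the input files are assumptions.

```shell
# Hypothetical helper: make sure a BED stream starts with a header line.
# If the second field of the first line is not an integer, the line is
# treated as an existing header and passed through unchanged.
bed_with_header() {
  awk 'NR == 1 && $2 !~ /^[0-9]+$/ { print; next }
       NR == 1 { print "chrom\tchromStart\tchromEnd" }
       { print }'
}

# Example usage (file names are placeholders):
# bed_with_header < regions.bed > regions.hdr.bed
# vcftools --gzvcf variants.vcf.gz --bed regions.hdr.bed \
#          --recode --recode-INFO-all --out subset
# With --recode and --out subset, the kept sites are written to
# subset.recode.vcf.
```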
