Question: Extract SNPs from a VCF file located in genes based on GFF file information
I have a VCF file with SNPs and a subset of a GFF file that contains only gene features. How can I extract, in VCF format, the SNPs that are located within genes?

18 months ago Denis • 70 • updated 18 months ago finswimmer 11k
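Use bedtools. The -u option writes each SNP once even if it overlaps several gene records (with -wa alone, a SNP inside two overlapping genes is reported twice):

$ bedtools intersect -a input.vcf -b genes.gff -header -u > output.vcf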
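
To make explicit what the intersection computes, here is a minimal sketch in plain awk on toy data, with no bedtools required. The file names, the temporary directory and the records are all hypothetical. It assumes tab-delimited input and single-position records (SNPs), so it is an illustration of the coordinate logic, not a replacement for bedtools:

```shell
# Toy data (hypothetical): three SNPs and one gene spanning chr1:100-200.
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$workdir/toy.vcf"
printf 'chr1\t50\t.\tA\tG\t.\t.\t.\n'  >> "$workdir/toy.vcf"
printf 'chr1\t150\t.\tC\tT\t.\t.\t.\n' >> "$workdir/toy.vcf"
printf 'chr1\t200\t.\tG\tA\t.\t.\t.\n' >> "$workdir/toy.vcf"
printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1\n' > "$workdir/toy.gff"

# First file (GFF): remember every interval; GFF start/end are 1-based, inclusive.
# Second file (VCF): pass header lines through, then print a record once if its
# POS falls inside at least one interval on the same chromosome.
awk -F'\t' '
    NR == FNR { if ($0 !~ /^#/) { n++; chr[n] = $1; s[n] = $4 + 0; e[n] = $5 + 0 }; next }
    /^#/      { print; next }
    { pos = $2 + 0
      for (i = 1; i <= n; i++)
          if ($1 == chr[i] && pos >= s[i] && pos <= e[i]) { print; next } }
' "$workdir/toy.gff" "$workdir/toy.vcf" > "$workdir/toy_in_genes.vcf"

# Count the SNP records that were kept (header lines excluded).
snps_kept=$(grep -vc '^#' "$workdir/toy_in_genes.vcf")
echo "$snps_kept"
```

The SNP at position 200 is kept because the GFF end coordinate is inclusive; the one at position 50 lies outside the gene and is dropped.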


For (very) large VCF files it may be more efficient to compress the VCF with bgzip and index it with tabix, convert the GFF to BED, and then query the regions with tabix:

1. bgzip and index

$ bgzip -c input.vcf > input.vcf.gz
$ tabix -p vcf input.vcf.gz

2. gff to bed

E.g. with BEDOPS:

$ gff2bed < genes.gff > genes.bed
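
The point of this conversion is the coordinate shift: GFF is 1-based and inclusive, BED is 0-based and half-open. A minimal awk sketch of that shift, on a hypothetical one-line GFF record and keeping only the first three BED columns (gff2bed itself writes more columns than this):

```shell
# Hypothetical gene record: 1-based, inclusive coordinates 100..200.
gff_line=$(printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1')

# BED start = GFF start - 1; BED end = GFF end (half-open, so unchanged).
bed_line=$(printf '%s\n' "$gff_line" |
    awk -F'\t' 'BEGIN { OFS = "\t" } !/^#/ { print $1, $4 - 1, $5 }')

echo "$bed_line"
```

If the regions file is built by hand instead, forgetting the start - 1 step would drop a SNP that sits exactly on the first base of a gene.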

3. Query the regions

$ tabix -R genes.bed -h input.vcf.gz > output.vcf

fin swimmer

18 months ago finswimmer 11k

