vcftools: extract only selected chromosome regions from a 4 GB VCF file
4.3 years ago

Hi, I have a 4 GB *.vcf file and would like to keep only the chromosome regions that I need and write them to a new file.

For example, these regions:

 17    7571720    7590868
 3    10141635    10153670

I saved them to a *.bed file and tried this command:

vcftools --gzvcf /home/user/Documents/*.vcf --bed /home/user/Documents/list.bed --out /home/sentinel/Documents/test

It returns: No data left for analysis!

VCFtools - 0.1.17
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
--gzvcf /home/user/Documents/*.vcf
--out /home/user/Documents/test
--recode
--bed list.bed

Using zlib version: 1.2.11
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in FORMAT entry: ID=GQX,Number=1,Type=Integer,Description="Empirically calibrated genotype quality score for variant sites, otherwise minimum of {Genotype quality assuming variant position,Genotype quality assuming non-variant position}">
Warning: Expected at least 2 parts in FORMAT entry: ID=GQX,Number=1,Type=Integer,Description="Empirically calibrated genotype quality score for variant sites, otherwise minimum of {Genotype quality assuming variant position,Genotype quality assuming non-variant position}">
Warning: Expected at least 2 parts in FORMAT entry: ID=FT,Number=1,Type=String,Description="Sample filter, 'PASS' indicates that all filters have passed for this sample">
Warning: Expected at least 2 parts in FORMAT entry: ID=DPI,Number=1,Type=Integer,Description="Read depth associated with indel, taken from the site preceding the indel">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
After filtering, kept 1 out of 1 Individuals
Outputting VCF file...
Read 2 BED file entries.
After filtering, kept 0 out of a possible 41203829 Sites
No data left for analysis!
Run Time = 40.00 seconds

Any ideas?


@ dev.info.2021 Use bedtools intersect (https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html), but make sure that your VCF is well formatted.

4.3 years ago

It's difficult to say anything without knowing the content of your VCF file, but here are a couple of suggestions:

  1. Have you checked that your VCF and BED files refer to the same reference? A quick and dirty check is to make sure that both files either use or omit the "chr" prefix, although the proper check would be to get the reference information from the VCF header and build the BED file with the corresponding reference positions.

  2. You're using the --gzvcf option with a plain VCF file. I don't know whether vcftools can handle plain-text files with the --gzvcf option, but an easy check would be to use the plain --vcf option instead.
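
The quick check from point 1 can be sketched with standard Unix tools only. This is a minimal illustration, not a full reference check: demo.vcf and demo.bed are made-up stand-ins for the real files, and the BED file is assumed to be tab-separated. On a 4 GB file the grep pass reads every record, so it may take a while.

```shell
#!/bin/sh
# Sketch: compare the chromosome names used by a VCF and a BED file.
# demo.vcf and demo.bed are hypothetical stand-ins for the real files.
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr17\t7572000\t.\tA\tG\t50\tPASS\t.\nchr3\t10142000\t.\tC\tT\t50\tPASS\t.\n' > "$workdir/demo.vcf"
printf '17\t7571720\t7590868\n3\t10141635\t10153670\n' > "$workdir/demo.bed"

# Unique chromosome names in each file (VCF header lines start with '#').
grep -v '^#' "$workdir/demo.vcf" | cut -f1 | sort -u > "$workdir/vcf_chroms.txt"
cut -f1 "$workdir/demo.bed" | sort -u > "$workdir/bed_chroms.txt"

# BED chromosomes that never occur in the VCF; any output means a naming mismatch.
missing=$(comm -23 "$workdir/bed_chroms.txt" "$workdir/vcf_chroms.txt")
if [ -n "$missing" ]; then
  echo "BED chromosomes not found in VCF:"
  echo "$missing"
fi
```

If this prints anything, no region in the BED file can match, which is consistent with a "kept 0 out of ..." result.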

As an additional suggestion, I'd consider using bcftools. It's definitely much faster than vcftools, and with a 4 GB VCF file that will make a difference. These would be the commands to use, assuming that you're targeting all VCF files present (*.vcf) and that those files are not compressed:

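for file in *.vcf; do
  # compress with bgzip and index with tabix (region queries with -R need an index)
  bgzip -f "$file" && tabix -f -p vcf "$file.gz"
  # keep only the records that fall in the regions listed in file.bed
  bcftools view -R file.bed "$file.gz" > "${file%.vcf}.filtered.vcf"
done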
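
To make explicit what this kind of region filtering does, here is a rough sketch in plain awk. It is not how bcftools works internally and is no replacement for it on a 4 GB file; it only illustrates the logic. Assumptions: the demo files are made up, both files are tab-separated, BED intervals follow the usual convention (0-based start, end-exclusive), and a record is matched by its POS alone, ignoring how far an indel extends.

```shell
#!/bin/sh
# Sketch: keep VCF header lines plus records whose POS lies in a BED interval.
# demo.vcf and demo.bed are hypothetical stand-ins for the real files.
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t50\tPASS\t.\nchr17\t7571720\t.\tA\tG\t50\tPASS\t.\nchr17\t7572000\t.\tA\tG\t50\tPASS\t.\nchr17\t7590868\t.\tC\tT\t50\tPASS\t.\nchr3\t10142000\t.\tC\tT\t50\tPASS\t.\n' > "$workdir/demo.vcf"
printf 'chr17\t7571720\t7590868\nchr3\t10141635\t10153670\n' > "$workdir/demo.bed"

awk -F'\t' '
  # First file (BED): remember each interval.
  NR == FNR { n++; chrom[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; next }
  # Second file (VCF): always keep header lines.
  /^#/ { print; next }
  # Keep a record if its 1-based POS lies in any 0-based, end-exclusive interval.
  {
    pos = $2 + 0
    for (i = 1; i <= n; i++)
      if ($1 == chrom[i] && pos > lo[i] && pos <= hi[i]) { print; next }
  }
' "$workdir/demo.bed" "$workdir/demo.vcf" > "$workdir/filtered.vcf"

# Number of data records that survived the filter.
kept=$(grep -vc '^#' "$workdir/filtered.vcf")
echo "kept $kept records"
```

Note that the chromosome comparison is an exact string match, so "17" in the BED file never matches "chr17" in the VCF.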
4.3 years ago

I already found the solution:

1] My BED file was missing the "chr" prefix:

 chr17    7571720    7590868
 chr3    10141635    10153670

2] The command was missing --recode, which is needed to write the output VCF:

vcftools --gzvcf /home/user/Documents/*.vcf --bed /home/user/Documents/list.bed --out /home/sentinel/Documents/test.vcf --recode

It works perfectly.
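
For a longer region list, the prefix fix from step 1 does not have to be done by hand. A minimal sketch, assuming a tab-separated BED file; list.bed here is a small made-up copy and list.chr.bed is a hypothetical output name:

```shell
#!/bin/sh
# Sketch: prepend "chr" to the first BED column unless it is already there.
# The files below are hypothetical stand-ins for the real BED file.
workdir=$(mktemp -d)
printf '17\t7571720\t7590868\n3\t10141635\t10153670\n' > "$workdir/list.bed"

awk 'BEGIN { FS = OFS = "\t" } $1 !~ /^chr/ { $1 = "chr" $1 } { print }' \
  "$workdir/list.bed" > "$workdir/list.chr.bed"

cat "$workdir/list.chr.bed"
```

Running it twice is harmless, since lines that already start with "chr" are left unchanged.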
