[bcftools] [linux cluster] subset a vcf.gz file by SNP ID, keeping the header and all columns
4.0 years ago
mgois • 0

Hi there!

I have a .vcf.gz file that I can only access via the command line (on a Linux cluster). The file covers chr 2 only and is quite big, so I don't know how many columns there are. I want to extract all the column information for one SNP whose ID I know. I also need the output file to keep the same header and all the columns of the original, but with the data for only this SNP (so I can run other code on it).

So far, the only command that selected the SNP (but without keeping the header or the other columns) was:

bcftools query -i 'ID="snp id"' -f'[%SAMPLE\t%DS\t%REF\t%ALT\n]'  file.in.vcf.gz  > file.out.vcf.gz

I also tried:

bcftools view -i 'ID="snp id"'  <file.in.vcf.gz> -o <file.out.vcf.gz>

which returned the error:

-bash: syntax error near unexpected token `newline'

I also tried this one, with the same error:

bcftools query -i 'ID="snp id"' file.in.vcf.gz > file.out.vcf.gz

Hope you can help me figure this out. I am new to bcftools; I read the manual but couldn't find anything on this.

Thanks!

bcftools SNP variant calling software error • 1.7k views

-bash: syntax error near unexpected token `newline'

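This is a problem with how you're invoking bcftools from the shell, not with bcftools itself. There may be something in the context we cannot see from the snippet you provided. UNLESS... are you really typing the angle brackets in <file.in.vcf.gz>? The shell reads < and > as redirection operators, so the trailing > in -o <file.out.vcf.gz>, with nothing after it, produces exactly this error.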
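A minimal corrected version, assuming the goal is a real VCF rather than the custom text that query prints: use bcftools view and drop the angle brackets. The helper name extract_snp and the ID rs12345 are hypothetical placeholders; -i, -O z, -o and bcftools index -t are standard bcftools options, and the ID="..." expression is the same one that already worked with query in the question.

```shell
# extract_snp IN.vcf.gz OUT.vcf.gz ID   (hypothetical helper name)
# Keeps the full header and every column (all samples), but only the
# record(s) whose ID column equals ID. No angle brackets around file names.
extract_snp() {
  local in_vcf=$1 out_vcf=$2 snp_id=$3
  # view (not query) writes VCF; -O z = bgzip-compressed output, -o = output file
  bcftools view -i "ID=\"$snp_id\"" -O z -o "$out_vcf" "$in_vcf"
  # Optional: build a .tbi index so downstream tools can read the file by region
  bcftools index -t "$out_vcf"
}

# Usage (placeholder file names; rs12345 is a made-up ID):
# extract_snp file.in.vcf.gz file.out.vcf.gz rs12345
```

On many clusters bcftools has to be made available first (often through an environment module; the module name is site-specific). Because view keeps the header and all sample columns, the output can be passed directly to other VCF tools.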

4.0 years ago
arnstrm ★ 1.8k

If you have a gzipped VCF file, you can get the SNP you want with this simple bash command:

zcat input.vcf.gz | awk -F'\t' '/^#/ || $3 == "snp-id"' > outputfile.vcf

Here zcat streams the decompressed VCF and awk prints only the lines that either start with # (the header lines) or have exactly "snp-id" in the ID column (the 3rd tab-separated field). The matching line is printed whole, so the output is still a VCF with all of the original columns. I assume this is what you want to accomplish?
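To see what this filter keeps, here is a small self-contained run on a made-up three-record VCF (the contents, the sample names S1/S2, the file names and the ID rs2 are all hypothetical). The ID is passed in with awk -v instead of being hard-coded, and gzip -dc is used as a portable equivalent of zcat.

```shell
set -euo pipefail

# Hypothetical toy data: a tiny tab-separated, gzipped VCF with three SNPs.
workdir=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=2>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf '2\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n'
  printf '2\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t1/1\t0/1\n'
  printf '2\t300\trs3\tG\tA\t.\tPASS\t.\tGT\t0/0\t0/1\n'
} | gzip > "$workdir/toy.vcf.gz"

# Keep header lines (starting with #) and rows whose ID column equals $snp_id.
snp_id="rs2"
gzip -dc "$workdir/toy.vcf.gz" \
  | awk -F'\t' -v id="$snp_id" '/^#/ || $3 == id' > "$workdir/toy.rs2.vcf"

# Prints the 3 header lines followed by the single rs2 record (all 11 columns).
cat "$workdir/toy.rs2.vcf"
```

If the next step needs a bgzip-compressed, indexable file, the awk output can be piped through bgzip (from htslib), e.g. ... | bgzip > outputfile.vcf.gz, instead of being redirected to a plain .vcf file.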
