How to use Unix tools to convert VCF genotypes like '1|1' to '2'
3.2 years ago
curious ▴ 750

I want each row to be a variant ID and each column the genotype of a different sample, written as the number of alternate alleles (0|0 → 0, 1|0 → 1, 1|1 → 2):

id1 0 2 1
id2 0 2 1

I know I can do this in Python, but I'm trying to get better with Unix tools because they are usually faster.

bcftools query -f '%ID[\t%GT]\n'  my_vcf.vcf |  awk -F "|" '{for(i=1; i<=NF; i++) { print $i+$i }}'

I think this is almost there, but obviously I'm off somewhere. Any hints are greatly appreciated, thank you.

Tags: bcftools, unix

What does your output look like? I think you need two for loops: an outer one over the samples and, for each sample, an inner one over the alleles in its GT.
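A rough sketch of the two-loop idea in awk. It assumes tab-separated input where column 1 is the ID and every later column is one sample's GT, as produced by the bcftools query command in the question. The function name gt_to_dosage is only illustrative. Other assumptions worth checking against your data: GTs are split on either "|" or "/", every allele that is not "0" counts as alternate (so a multi-allelic 1|2 gives 2), and any "." allele makes the whole call ".".

```shell
# Sketch: turn GT columns (0|0, 1|1, 1|0, 0/1, ...) into alternate-allele counts.
# Assumes tab-separated input: column 1 = ID, every later column = one sample GT.
gt_to_dosage() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # Outer loop: one pass per sample column (column 1 is the ID).
    for (s = 2; s <= NF; s++) {
      # Inner loop: split the GT on the phased "|" or unphased "/" separator.
      n = split($s, alleles, "[|/]")
      dosage = 0
      missing = 0
      for (a = 1; a <= n; a++) {
        if (alleles[a] == ".") missing = 1
        else if (alleles[a] != "0") dosage++  # any non-reference allele counts as 1
      }
      # Missing calls (any "." allele) are written as "."
      $s = missing ? "." : dosage
    }
    print
  }' "$@"
}
```

Usage, with the command from the question: bcftools query -f '%ID[\t%GT]\n' my_vcf.vcf | gt_to_dosage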


Running just bcftools query -f '%ID[\t%GT]\n' my_vcf.vcf gives me this:

id1 0|0 1|1 1|0
id2 0|0 1|1 1|0

I hope to get this with awk or similar:

id1 0 2 1
id2 0 2 1

Right now I just get this:

2
0
0
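Part of the problem is likely the field separator. With -F "|" awk splits the whole line on the pipe, so the fields straddle sample columns (the ID and the first allele end up in the same field), and $i+$i then doubles a single leading number instead of adding the two alleles of one genotype. A small diagnostic helper (show_fields is a made-up name) prints the fields awk actually sees:

```shell
# Print every field awk sees when "|" is the field separator, one per line.
show_fields() {
  awk -F '|' '{ for (i = 1; i <= NF; i++) printf "field %d: [%s]\n", i, $i }' "$@"
}

# One row shaped like the bcftools query output (tab-separated).
printf 'id1\t0|0\t1|1\t1|0\n' | show_fields
```

For that row it reports four fields: "id1<tab>0", "0<tab>1", "1<tab>1" and "0", none of which is a single genotype.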
3.1 years ago
Zhilong Jia ★ 2.2k

cat 1.vcf

id1 0|0 1|1 1|0  
id2 0|0 1|1 1|0

sed -e 's/0|0/0/g' -e 's/1|1/2/g' -e 's/1|0/1/g' -e 's/0|1/1/g' 1.vcf

id1 0 2 1
id2 0 2 1
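The sed approach above only matches the phased separator "|". A sketch of a variant that also accepts unphased calls such as 0/1, assuming biallelic sites only; gt_to_dosage_sed is a hypothetical name, and missing calls such as ./. are left unchanged:

```shell
# Same idea as the sed answer, but [|/] matches either GT separator.
# The g flag replaces every matching sample on a line, not just the first.
gt_to_dosage_sed() {
  sed -E \
    -e 's#0[|/]0#0#g' \
    -e 's#1[|/]1#2#g' \
    -e 's#(0[|/]1|1[|/]0)#1#g' \
    "$@"
}
```

It reads a file argument or stdin, so it can sit at the end of the bcftools query pipe just like the original command.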
