add original filename using vcf-concat
9.8 years ago
dolevrahat ▴ 30

Hello

I need to concatenate several hundred VCF files. When viewing the concatenated file, I want to be able to tell which VCF file each row came from.

I thought about adding a column with the original file name, but:

  1. I'm not sure if it's possible.
  2. I have never seen it done.

I'm also rather new to working with VCF files and don't want to reinvent the wheel, so I would really appreciate any advice on the standard way to accomplish this.

Thanks in advance.

vcf • 1.7k views
9.8 years ago

Use awk to add an INFO definition to the header and prepend the file name to the INFO column of each record. The awk script should look something like this (not tested):

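    # Just before the #CHROM line, declare the new INFO key F.
    # Number, Type and Description are assumed here; only the key F is fixed by the code below.
    /^#CHROM/ {
        printf("##INFO=<ID=F,Number=1,Type=String,Description=\"Source file name\">\n");
    }

    # Pass all header lines through unchanged.
    /^#/ { print; next; }

    # Data lines: prepend F=<file name>; to the INFO column (field 8).
    {
        for (i = 1; i <= NF; ++i) {
            if (i > 1) printf("\t");
            if (i == 8) printf("F=%s;", FILENAME);
            printf("%s", $i);
        }
        printf("\n");
    }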

and then

awk -F '\t' -f script.awk in.vcf > out.vcf
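
For several hundred files, the same idea can be wrapped in a small shell function and run once per file before concatenating. The following is only a sketch under a few assumptions: `tag_vcf` is a hypothetical helper name, the inputs are uncompressed VCFs, and the file's base name (rather than the full path that `FILENAME` would give) is what you want in the `F` key. Unlike the script above, it also replaces an empty INFO column (`.`) instead of prepending to it.

```shell
#!/usr/bin/env bash
# Sketch: tag every record of one VCF with its source file name in INFO.
# tag_vcf is a hypothetical helper; the INFO key F follows the answer above.
tag_vcf() {
    awk -F '\t' -v OFS='\t' -v src="$(basename "$1")" '
        # Declare the new INFO key just before the column header line.
        /^#CHROM/ { print "##INFO=<ID=F,Number=1,Type=String,Description=\"Source file name\">" }
        # Header lines pass through unchanged.
        /^#/ { print; next }
        # Data lines: prepend F=<name>; an empty INFO (".") is replaced outright.
        { $8 = ($8 == "." ? "F=" src : "F=" src ";" $8); print }
    ' "$1"
}

# Example use (assumed directory layout, not run here):
#   mkdir -p tagged
#   for f in *.vcf; do tag_vcf "$f" > "tagged/$f"; done
#   vcf-concat tagged/*.vcf > all.vcf
```

File names containing spaces, semicolons, or equals signs would not give a valid INFO value, so it may be worth renaming such files first.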

It worked. Thanks!
