Attaching genotypes in a VCF file to individual IDs
6.0 years ago
Ana ▴ 200

I have a VCF file (the header and individual IDs have been removed from it) that contains only genotype information for 1000 positions. The first position looks like this:

head all_unique.vcf

    5   49444858    rs12513792  C   A   100 PASS    AC=3353;AF=0.669529;AN=5008;NS=2504;DP=17854;EAS_AF=0.5625;AMR_AF=0.7075;AFR_AF=0.7489;EUR_AF=0.672;SAS_AF=0.6431;AA=.|||;VT=SNP    GT  0|1 1|0 0|1 1|0 0|1 0|1 1|1 1|0 1|0 1|1 1|1 1|0 0|1 1|1 1|1 0|1 1|1 0|1 0|1 0|1 0|0 0|0 0|1 0|1 1|1 0|1 1|0 1|1 1|0 1|0 1|0 0|0 1|1 1|0 1|1 1|0 1|1 1|1 1|1 1|1 1|1 0|0 1|1 0|1 0|1 1|1 1|1 1|0 1|1 1|1 0|1 1|1 1|1 1|1 1|1 1|0 1|1 0|1 0|0 1|1 1|0 1|1 1|1 1|1 1|1 1|1 1|0 1|0 1|1 0|1 0|1 1|1 1|0 1|0 1|0 0|0 0|1 1|0 0|1 1|1 0|0 1|0 1|0 0|1 1|1 1|1 1|1 1|1 1|0 1|1 1|1 1|0 0|1 1|1 0|1 0|1 0|1 0|0 1|1 0|1 1|1 0|1 0|1 1|1 0|1

I have another file that contains the individual IDs:

head individual_id

HG00096
HG00097
HG00099
HG00100
HG00101
HG00102
HG00103
HG00105
HG00106
HG00107
HG00108
HG00109

The first individual in the ID file corresponds to the first genotype in the VCF file, and so on. For each position, I want to attach the ID to each genotype and save the result in a separate file, like this:

head desired_output_pos49444858
    HG00096      0|1
    HG00097      1|0
    HG00099      0|1
    HG00100      1|0
    HG00101      0|1

Does anyone have an idea how I can make these files? Thanks.

6.0 years ago

Assuming that the number of lines in individual_id is the same as the number of genotypes in the VCF:

    # One VCF record per iteration: column 2 is the position, columns 10 onward are the genotypes
    while read -r L; do
        P=$(echo "$L" | cut -f2)
        paste individual_id <(echo "$L" | cut -f10- | tr '\t' '\n') > "desired_output_pos${P}"
    done < all_unique.vcf
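
For comparison, the same split can be sketched as a single awk pass, which avoids starting several processes per record. This is only an illustration under a few assumptions: the function name split_genotypes and the output directory argument are made up here, the files are tab-separated as in the question, and blank lines, "#" header lines, and empty trailing fields are simply skipped.

```shell
# split_genotypes VCF IDS OUTDIR   (hypothetical helper, not part of any tool)
# Writes OUTDIR/desired_output_pos<POS> for every record in VCF.
split_genotypes() {
    vcf=$1
    ids=$2
    outdir=$3
    mkdir -p "$outdir"
    awk -F'\t' -v outdir="$outdir" '
        # First file (IDs): store the non-blank IDs in order.
        NR == FNR { if ($0 !~ /^[ \t\r]*$/) id[++n] = $0; next }
        # Second file (VCF): skip blank lines and any "#" header lines.
        /^[ \t\r]*$/ || /^#/ { next }
        {
            out = outdir "/desired_output_pos" $2
            # Genotypes start at column 10; stop when either list runs out.
            for (i = 10; i <= NF && i - 9 <= n; i++)
                if ($i != "") print id[i - 9] "\t" $i > out
            close(out)
        }
    ' "$ids" "$vcf"
}
```

It would be called as split_genotypes all_unique.vcf individual_id . to write the files into the current directory. As with the loop above, a position that occurs twice overwrites the earlier file.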

Yes, this works. The only problem is that I get a huge empty space after the final row. How can I get rid of this blank space? Thanks for your help.


"I get a huge empty space after the final row."

Where? In which file? Do you have empty lines in any of your files?


Yes, I get my desired output, but all of the output files that I get from running your code contain lots of empty lines!

ADD REPLY
0
Entering edit mode

It's because you have some trailing empty lines in individual_id or all_unique.vcf, and/or some extra tabs after the last column of all_unique.vcf.
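
A quick way to check for these causes might look like the sketch below. The two function names are made up for illustration, and the genotype count assumes the nine fixed VCF columns shown in the question (through the GT format column), with tabs as separators.

```shell
# count_blank_lines FILE: number of empty or whitespace-only lines in FILE.
# (grep -c still prints 0 when nothing matches, but exits non-zero.)
count_blank_lines() {
    grep -c -e '^[[:space:]]*$' "$1"
}

# genotype_counts VCF: for each non-empty record, print the position and the
# number of fields after the 9 fixed columns. Trailing tabs inflate this count.
genotype_counts() {
    awk -F'\t' 'NF { print $2 "\t" (NF - 9) }' "$1"
}
```

If count_blank_lines individual_id prints anything other than 0, or the counts from genotype_counts all_unique.vcf differ from the number of IDs (wc -l < individual_id), the inputs need cleaning before the loop is run.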


Add grep -v -e '^[[:space:]]*$' to remove those empty lines.
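
One way to apply that filter is sketched here. The wrapper name drop_blank_lines is made up for illustration, and where to put it (on the ID list before the loop, or on each output file) is a judgment call.

```shell
# drop_blank_lines: copy stdin to stdout, omitting empty or whitespace-only lines.
# "|| true" hides grep's non-zero exit status when every line is filtered out.
drop_blank_lines() {
    grep -v -e '^[[:space:]]*$' || true
}

# Possible uses (file names as in the question):
#   drop_blank_lines < individual_id > individual_id.clean
#   paste individual_id ... | drop_blank_lines > "desired_output_pos${P}"
```

Filtering the paste output also removes rows that contain only a tab, which is what a blank ID paired with a missing genotype produces.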
