How to merge VCF files for SNPs?
8.5 years ago
bingnas ▴ 10

Hi all

I called SNPs for six individuals separately, one file each, and I want to merge the files by chromosome and position. The goal is to then convert the genotypes to the integer codes 0, 1, 2.

The question is:

Could anyone please help me merge them as follows?

REF is hg19, ALT1 is the first patient, ALT2 the second patient, and so on up to ALT6 for the sixth patient.

#CHROM   POS     REF   ALT1   ALT2   ALT3   ALT4   ALT5   ALT6
chrM     3       T     C      G      A      C      T      C
chrM     4       C     A      C      T      A      G      C
chrM     150     T     C      T      C      C      G      A
chrM     195     C     T      C      T      C      A      T
chrM     410     A     T      T      C      C      T      C
chrM     711     G     A      C      T      T      G      G
chrM     1890    G     .      C      T      C      A      C
chrM     2354    C     T      T      C      A      G      C
chrM     2485    C     T      A      G      G      A      C,T
chrM     3457    T     C      G      A      G      A      C
chrM     4162    C     T      T      A      T      C,A    A
chrM     4217    T     C      G      T      A      G      T
chrM     4918    A     G      C      .      G      A      A
chrM     5581    C     T      G      A      A      G      .
chrM     8698    G     A      G      A      A      C      A
chrM     8702    G     A      G      C      G      C      A
chrM     9378    G     A      C      T      G      A      C
chrM     9541    C     T      C      T      C      T      C
chrM     10284   A     G      G      A      A      C      C
chrM     10399   G     A      G      A      A      G      T
chrM     10464   T     C      C      G      T      C      G
chrM     10820   G     A      G      T      .      C      A
chrM     10874   C     T      G      T      G      C,T    G
chrM     11018   C     T      C      T      A      C      C
chrM     11252   A     G      .      C      G      A      T
chrM     11723   C     T      .      A      C      T      T
chrM     11813   A     G      G      A      C      A      C

Is that possible? I put the periods in because I was told they are needed as placeholders at those positions.

Thank you in advance

Bing

SNP sequence next-gen-sequencing • 3.5k views

If I understand correctly, you want to recode SNPs from A/C/T/G to 0, 1, 2?

You can use plink. First convert the VCF into plink format, then run plink --recode12. If you are more comfortable working with VCF, you can convert the result back to VCF afterwards.


Thank you stolarek for your answer. Yes, that is what I want. I will try it.

Bing


Hi ebrown1955,

Thank you very much for your great answer. Here is what I got from the first command (CombineVariants):

#CHROM  POS  ID  REF  ALT  3395_167                       3395_1                         3395_341                        3395_343                       3395_49                        3395_60
chrM    3    .   T    C    ./.                            0/1:6:2,0,4,0:4:54,0,27:0      ./.                             ./.                            ./.                            ./.
chrM    4    .   C    A,G  ./.                            0/1:6:1,0,5,0:5:0              2/2:2:0,0,2,0:2:0               ./.                            ./.                            2/2:2:0,0,2,0:2:0
chrM    72   .   T    C    ./.                            ./.                            1/1:21:0,0,21,0:21:178,63,0:0   1/1:15:0,0,15,0:15:174,45,0:0  ./.                            ./.
chrM    73   .   G    A    ./.                            ./.                            1/1:21:0,0,21,0:21:179,63,0:0   1/1:15:0,0,15,0:15:173,45,0:0  ./.                            ./.
chrM    150  .   T    C    1/1:13:0,0,4,9:13:255,39,0:0   1/1:19:0,0,16,3:19:255,57,0:0  1/1:20:0,0,20,0:20:185,60,0:0   1/1:6:0,0,6,0:6:142,18,0:0     1/1:8:0,0,7,1:8:178,24,0:0     1/1:2:0,0,2,0:2:66,6,0:0
chrM    152  .   T    C    ./.                            ./.                            ./.                             ./.                            1/1:8:0,0,7,1:8:180,24,0:0     1/1:2:0,0,2,0:2:64,6,0:0
chrM    182  .   C    T    ./.                            ./.                            ./.                             ./.                            1/1:9:0,0,8,1:9:151,27,0:0     1/1:9:0,0,8,1:9:161,27,0:0
chrM    195  .   C    T    1/1:15:0,0,5,10:15:255,45,0:0  1/1:22:0,0,13,9:22:255,66,0:0  1/1:14:0,0,6,8:14:255,42,0:0    1/1:3:0,0,3,0:3:83,9,0:0       1/1:9:0,0,9,0:9:116,27,0:0     1/1:10:0,0,7,3:10:191,30,0:0
chrM    199  .   T    C    ./.                            ./.                            ./.                             ./.                            1/1:7:0,0,7,0:7:103,21,0:0     1/1:10:0,0,7,3:10:191,30,0:0
chrM    204  .   T    C    ./.                            ./.                            ./.                             ./.                            1/1:7:0,0,7,0:7:91,21,0:0      1/1:10:0,0,7,3:10:188,30,0:0
chrM    207  .   G    A    ./.                            ./.                            ./.                             ./.                            1/1:8:0,0,6,2:8:118,24,0:0     1/1:10:0,0,7,3:10:195,30,0:0
chrM    235  .   A    G    ./.                            ./.                            1/1:24:0,0,11,13:24:255,72,0:0  1/1:10:0,0,6,4:10:255,30,0:0   ./.                            ./.
chrM    250  .   T    C    ./.                            ./.                            ./.                             ./.                            1/1:19:0,0,6,13:19:255,57,0:0  1/1:18:0,0,15,3:18:243,54,0:0
chrM    410  .   A    T    1/1:13:0,0,3,10:13:255,39,0:0  1/1:27:0,0,19,8:27:255,81,0:0  1/1:26:0,0,19,7:26:255,78,0:0   1/1:12:0,0,11,1:12:226,36,0:0  1/1:5:0,0,2,3:5:148,15,0:0     1/1:6:0,0,3,3:6:119,18,0:0

And this is what I got from the second command (VariantsToTable):

CHROM  POS  QUAL    1nt.GT  49nt2.GT  60nt3.GT  167nt4.GT  341nt5.GT  343nt6.GT
chrM   3    24.03   T/C     ./.       ./.       ./.        ./.        ./.
chrM   4    22.12   C/A     ./.       G/G       ./.        G/G        ./.
chrM   72   145     ./.     ./.       ./.       ./.        C/C        C/C
chrM   73   146     ./.     ./.       ./.       ./.        A/A        A/A
chrM   150  222     C/C     C/C       C/C       C/C        C/C        C/C
chrM   152  147.03  ./.     C/C       C/C       ./.        ./.        ./.
chrM   182  118.02  ./.     T/T       T/T       ./.        ./.        ./.
chrM   195  222     T/T     T/T       T/T       T/T        T/T        T/T
chrM   199  70.07   ./.     C/C       C/C       ./.        ./.        ./.
chrM   204  58.07   ./.     C/C       C/C       ./.        ./.        ./.
chrM   207  85.03   ./.     A/A       A/A       ./.        ./.        ./.
chrM   235  222     ./.     ./.       ./.       ./.        G/G        G/G
chrM   250  222     ./.     C/C       C/C       ./.        ./.        ./.
chrM   410  222     T/T     T/T       T/T       T/T        T/T        T/T

Could you please tell me what I should do now? I would like to code dominant homozygous as 2, recessive homozygous as 0, and heterozygous as 1.

Thank you

Bing


You could write a Python program to do this for you. You'll have to parse each line one by one, split each genotype on "/", and check whether it is homozygous or heterozygous. I have a script that tells whether a genotype includes the alternative allele; it can be modified to do what you'd like.
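
The split-and-check step described here might look roughly like the sketch below. It is not the script mentioned in this reply. It assumes the code should be the number of non-reference alleles (0, 1 or 2), that the reference base for each position is known (the table shown earlier would need a REF column for this), and that uncalled genotypes get a placeholder code; the function name `encode_genotype` and the `missing` parameter are made up for illustration.

```python
def encode_genotype(gt, ref, missing=5):
    """Count non-reference alleles in a genotype string such as 'T/C'.

    `ref` is the reference base at that position. `missing` is a
    hypothetical placeholder code returned for uncalled genotypes ('./.').
    """
    # Accept both unphased ('/') and phased ('|') separators.
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return missing
    # 0 = homozygous reference, 1 = heterozygous, 2 = no reference allele
    return sum(allele != ref for allele in alleles)


# Genotypes as they appear at chrM:3 and chrM:150 (REF = T at both)
print(encode_genotype("T/C", "T"))  # heterozygous
print(encode_genotype("C/C", "T"))  # homozygous for the alternative allele
print(encode_genotype("./.", "T"))  # no call
```

Whether a "./." in a merged file should count as missing or as reference depends on how the individual files were called, so that choice is left to the caller here.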


Thank you ebrown1955 for your help.

Yes please, I would like to see that code if you do not mind!

To be honest, I am not familiar with bioinformatics. This is my first time dealing with SNP data, and I would like to convert the data to 0, 1, 2 and 5 so that I can use regression analysis.

Bing

8.5 years ago
ebrown1955 ▴ 320

Assuming you have 6 VCF files, you can use GATK.

java -jar GenomeAnalysisTK.jar \
-T CombineVariants \
-R reference.fasta \
--variant input1.vcf \
--variant input2.vcf \
--variant input3.vcf \
--variant input4.vcf \
--variant input5.vcf \
--variant input6.vcf \
-o output.vcf \
-genotypeMergeOptions UNIQUIFY

Then you can use VariantsToTable to turn output.vcf into a table as requested:

java -jar GenomeAnalysisTK.jar \
-R reference.fasta -T VariantsToTable \
-V output.vcf \
-F CHROM -F POS -F ID -F QUAL -GF GT \
-o output.table
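
If the end goal is integer codes rather than allele letters, one option is to read the merged VCF directly, since its GT field already stores allele indices (0 for REF, 1 and up for ALT). The following is only a sketch under assumptions: it expects the standard VCF layout with samples starting at the tenth column and GT as the first FORMAT subfield, it counts non-reference alleles, and it uses a placeholder code for "./.". The name `vcf_to_codes`, the `missing` parameter, and the sample names S1 and S2 in the example are hypothetical.

```python
def vcf_to_codes(lines, missing=5):
    """Yield (chrom, pos, codes) for each data line of a multi-sample VCF.

    Each code is the number of non-reference alleles in that sample's GT,
    or `missing` (a hypothetical placeholder) when the genotype is uncalled.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        codes = []
        for sample in fields[9:]:  # sample columns follow FORMAT
            gt = sample.split(":")[0]  # GT is the first subfield
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                codes.append(missing)
            else:
                codes.append(sum(a != "0" for a in alleles))
        yield chrom, pos, codes


# Two made-up data lines in standard VCF column order
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chrM\t3\t.\tT\tC\t24.03\t.\t.\tGT:DP\t./.\t0/1:6",
    "chrM\t150\t.\tT\tC\t222\t.\t.\tGT:DP\t1/1:13\t1/1:19",
]
for chrom, pos, codes in vcf_to_codes(example):
    print(chrom, pos, *codes, sep="\t")
```

For a real file, the same function can be given an open file handle, for example `vcf_to_codes(open("output.vcf"))`.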