Hi Everyone,
I have a VCF file of SNPs that I called using GATK, and now I need to build a multiple alignment from these SNPs for phylogenetic analysis.
For every sample, the genotype field is 1/1, 0/1, or 0/0. I understand that 0 means the reference allele and 1 means the alternate allele.
My question is: how can I build a multiple alignment when each sample carries two alleles at a position? In the example below, the first record is chr1 824523 T C; most samples are missing (./.), and the called ones are 1/1 or 0/0. If a genotype is 0/1, as in the second record (chr1 824632), which base should I use as the SNP: the alternate or the reference?
Thanks in advance.
This is how my VCF file looks:
chr1 824523 rs79497635 T C 15.41 LowQual AC=4;AF=0.500;AN=8;BaseQRankSum=1.026;DB;DP=15;Dels=0.00;FS=2.632;HaplotypeScore=0.2483;MLEAC=4;MLEAF=0.500;MQ=14.52;MQ0=4;MQRankSum=0.000;QD=7.71;ReadPosRankSum=-1.026;SB=-1.728e+01 GT:AD:DP:GQ:PL ./. ./. ./. ./. 0/0:1,0:1:3:0,3,28 ./. 1/1:0,1:1:3:28,3,0 ./. ./. ./. ./. ./. ./. ./. 1/1:0,1:1:3:28,3,0 ./. ./. ./. ./. ./. ./. ./. 0/0:1,0:1:3:0,3,34 ./. ./. ./.
chr1 824632 rs75185704 T C 12.56 LowQual AC=3;AF=0.500;AN=6;BaseQRankSum=-1.231;DB;DP=12;Dels=0.00;FS=3.522;HaplotypeScore=0.0000;MLEAC=3;MLEAF=0.500;MQ=17.41;MQ0=1;MQRankSum=-1.231;QD=3.14;ReadPosRankSum=-0.358;SB=-1.975e-02 GT:AD:DP:GQ:PL ./. ./. ./. ./. 1/1:0,2:2:3:28,3,0 0/0:1,0:1:3:0,3,38 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1:1,1:2:22:22,0,26 ./. ./. ./.
chr1 825069 rs4475692 G C 20.08 LowQual AC=4;AF=1.00;AN=4;BaseQRankSum=1.026;DB;DP=18;Dels=0.00;FS=3.010;HaplotypeScore=1.3411;MLEAC=4;MLEAF=1.00;MQ=9.42;MQ0=6;MQRankSum=1.026;QD=3.35;ReadPosRankSum=0.000;SB=-1.975e-02 GT:AD:DP:GQ:PL ./. ./. ./. ./. ./. 1/1:1,2:3:3:23,3,0 ./. 1/1:1,1:3:3:30,3,0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
chr1 825207 rs61768257 T C 48.26 LowQual AC=6;AF=1.00;AN=6;DB;DP=20;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=6;MLEAF=1.00;MQ=11.27;MQ0=7;QD=9.65;SB=-1.975e-02 GT:AD:DP:GQ:PL ./. ./. ./. ./. ./. 1/1:0,2:2:3:28,3,0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 1/1:0,1:1:3:27,3,0 ./. ./. ./. ./. ./. 1/1:0,2:2:3:28,3,0 ./. ./. ./.
Hi, thanks for the reply, but I didn't fully understand what you meant. What I actually want is to build a multiple alignment from the VCF file. How do I get a single value from the VCF when there are two alleles at the same position? Which allele should be used, and on what basis?
I had assumed that your sample was diploid, so yes, you do have two alleles at each position. At a given locus, 1/1 means the sample carries two alternate alleles, 0/1 means one reference and one alternate allele, and 0/0 means both are reference. If your sample is NOT diploid, then you may want to treat 0/0 as reference and 0/1 and 1/1 as variant.
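To make the genotype encoding concrete, here is a minimal Python sketch that turns a sample column into the pair of bases it stands for. The function name decode_genotype is made up for illustration, and it assumes a diploid GT sub-field that comes first in the FORMAT string (as in GT:AD:DP:GQ:PL above).

```python
def decode_genotype(ref, alt, sample_field):
    """Return the two bases encoded by a diploid GT, or None for a missing call."""
    gt = sample_field.split(":")[0]        # GT is the first sub-field of the sample column
    alleles = [ref] + alt.split(",")       # index 0 = REF, indices 1.. = ALT allele(s)
    indices = gt.replace("|", "/").split("/")  # treat phased (|) and unphased (/) alike
    if "." in indices:
        return None                        # './.' means no genotype was called
    return tuple(alleles[int(i)] for i in indices)


# Sample columns taken from the chr1 824632 record (REF=T, ALT=C)
print(decode_genotype("T", "C", "0/0:1,0:1:3:0,3,38"))    # both reference
print(decode_genotype("T", "C", "0/1:1,1:2:22:22,0,26"))  # one reference, one alternate
print(decode_genotype("T", "C", "1/1:0,2:2:3:28,3,0"))    # both alternate
print(decode_genotype("T", "C", "./."))                   # missing call
```

So 0/1 is not an ambiguous call: it says the sample really carries both T and C at that position.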
Thank you. My sample is diploid. I am still confused about which of the two alleles to use for the multiple alignment at a given position. If the genotype is 1/1, I think I can use the alternate allele, and if it is 0/0, the reference; but if it is 0/1, how do I represent a diploid sample in the alignment? Sorry if I am not conveying this clearly.
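One convention that could be used here (it is not proposed anywhere in this thread, so treat it as an assumption) is to write a heterozygous call as an IUPAC ambiguity code, for example Y for T/C, and a missing call (./.) as N. Below is a rough Python sketch of that idea. The function names, the sample names s1 to s3, and the two toy records are hypothetical; the records are modeled on the rows above but trimmed to three samples with the INFO field replaced by ".". It only handles single-base alleles, and whether the phylogenetic program downstream interprets ambiguity codes sensibly would need to be checked.

```python
# IUPAC ambiguity codes for two-base combinations
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def genotype_to_char(ref, alt, sample_field, missing="N"):
    """Collapse a diploid SNP genotype into one alignment character."""
    gt = sample_field.split(":")[0].replace("|", "/")
    alleles = [ref] + alt.split(",")      # index 0 = REF, 1.. = ALT
    indices = gt.split("/")
    if "." in indices:
        return missing                    # no call for this sample
    bases = frozenset(alleles[int(i)] for i in indices)
    if len(bases) == 1:
        return next(iter(bases))          # homozygous: 0/0 -> REF, 1/1 -> ALT
    return IUPAC.get(bases, missing)      # heterozygous: ambiguity code


def vcf_lines_to_alignment(lines, sample_names):
    """Build one sequence per sample, one column per VCF record."""
    seqs = {name: [] for name in sample_names}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                      # skip header and blank lines
        fields = line.split()
        ref, alt = fields[3], fields[4]
        for name, sample_field in zip(sample_names, fields[9:]):
            seqs[name].append(genotype_to_char(ref, alt, sample_field))
    return {name: "".join(chars) for name, chars in seqs.items()}


# Toy input: two records, three hypothetical samples
records = [
    "chr1 824632 rs75185704 T C 12.56 LowQual . GT:AD:DP:GQ:PL "
    "1/1:0,2:2:3:28,3,0 0/0:1,0:1:3:0,3,38 0/1:1,1:2:22:22,0,26",
    "chr1 825069 rs4475692 G C 20.08 LowQual . GT:AD:DP:GQ:PL "
    "./. 1/1:1,2:3:3:23,3,0 0/0:2,0:2:3:0,3,30",
]
alignment = vcf_lines_to_alignment(records, ["s1", "s2", "s3"])

# Print in FASTA format
for name, seq in alignment.items():
    print(f">{name}\n{seq}")
```

Note that nearly every call in the pasted rows is flagged LowQual, has a depth of 1 to 3 reads, and most samples are ./., so an alignment built from them would be mostly N; filtering on quality and missingness first is probably worth considering.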