I am reading the Mathematical Notes on SAMtools Algorithms to understand how mpileup works. I ran mpileup on each sample separately and on multiple samples together.
/share/bin/samtools-0.1.16/samtools mpileup -Euf 0.1.16/bcftools/vcfutils.pl varFilter -D100 > 1_Euf.flt.vcf
I also ran 1.sorted.bam and 2.sorted.bam together.
I compared 1_Euf.flt.vcf, 2_Euf.flt.vcf, and 1and2_Euf.flt.vcf with a Venn diagram, looking for overlapping and unique positions.
10176 SNPs 1_Euf.flt.vcf
10649 SNPs 2_Euf.flt.vcf
18000 SNPs 1and2_Euf.flt.vcf
3102 SNPs overlap among the three files (header lines excluded)
1728 unique_1_Euf.flt.vcf
181 unique_2_Euf.flt.vcf
2227 unique_3_Euf.flt.vcf
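For anyone who wants to reproduce the comparison, here is a minimal sketch of how the overlap and unique counts can be computed. It assumes that two records are "the same SNP" when chromosome and position match (alleles are ignored); the function names `read_positions` and `venn_counts` are made up for this illustration, and this is not necessarily how the numbers above were produced.

```python
def read_positions(vcf_path):
    """Return the set of (chrom, pos) pairs in a VCF file, skipping header lines."""
    positions = set()
    with open(vcf_path) as handle:
        for line in handle:
            # Header lines start with '#'; blank lines carry no record.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split(None, 2)[:2]
            positions.add((chrom, int(pos)))
    return positions


def venn_counts(first, second, third):
    """Count positions shared by all three sets and positions unique to each."""
    return {
        "all_three": len(first & second & third),
        "only_first": len(first - second - third),
        "only_second": len(second - first - third),
        "only_third": len(third - first - second),
    }


if __name__ == "__main__":
    # File names as used in this post.
    s1 = read_positions("1_Euf.flt.vcf")
    s2 = read_positions("2_Euf.flt.vcf")
    both = read_positions("1and2_Euf.flt.vcf")
    for label, count in venn_counts(s1, s2, both).items():
        print(label, count)
```

Matching on position alone means a site with different ALT alleles in two files still counts as shared; add the REF and ALT columns to the key if that distinction matters.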
unique_1_Euf.flt.vcf contains the positions found only in the first file; they did not show up in the VCF from the combined run, and I don't understand why. When I call variants from multiple sorted BAMs with mpileup, I don't see these 1728 positions. I really want to understand the main reasons and the algorithms. Could you explain, in basic terms, how mpileup works on multiple files versus a single file?
When I look at the together.vcf file, there are 2227 SNP positions that did not show up in either of the separate VCF files. This could be because the summed read counts from the two files exceed the threshold, so these positions appear only in the combined file, right? Are there any other explanations?
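To make that guess concrete, here is a toy sketch of what I mean. The cutoff `MIN_DEPTH = 2` and the per-sample read counts are made-up values for illustration, not taken from my command or data, and a plain depth cutoff is only my assumption about what might be happening, not a description of what samtools or bcftools actually does.

```python
MIN_DEPTH = 2  # hypothetical minimum read depth, chosen only for illustration


def passes_depth(depths, min_depth=MIN_DEPTH):
    """Return True if the summed read depth over the given samples reaches the cutoff."""
    return sum(depths) >= min_depth


# Made-up read counts covering one position in each sample.
sample1_depth = 1
sample2_depth = 1

print(passes_depth([sample1_depth]))                 # sample 1 alone: False
print(passes_depth([sample2_depth]))                 # sample 2 alone: False
print(passes_depth([sample1_depth, sample2_depth]))  # pooled: True
```

Under this assumption, a position with one read in each sample would be dropped from both single-sample files but kept in the combined one.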
chr16 27250658 . A T 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 27367972 . G A 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,2,0;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 27561584 . G A 24 . DP=4;AF1=1;CI95=0.5,1;DP4=0,0,2,1;MQ=20;FQ=-36;SF=0 GT:PL:GQ 1/1:56,9,0:15
chr16 27729524 . C T 26 . DP=10;AF1=0.5002;CI95=0.5,0.5;DP4=1,2,5,1;MQ=20;FQ=6.19;PV4=0.23,0.3,1,1;SF=0 GT:PL:GQ 0/1:56,0,32:35
chr16 28890941 . T G 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,2,0;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 28937275 . G A 3.64 . DP=3;AF1=0.5205;CI95=0.5,0.5;DP4=0,1,1,1;MQ=20;FQ=-16.1;PV4=1,1,1,1;SF=0 GT:PL:GQ 0/1:31,0,11:18
chr16 29777088 . C A 9.31 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,1,1;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:40,6,0:8
chr16 29808708 . C G 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 29822841 . A C 9.31 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,1,1;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:40,6,0:8
chr16 30102802 . T C 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,2,0;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 30457350 . C T 37.5 . DP=4;AF1=1;CI95=0.5,1;DP4=0,0,1,3;MQ=20;FQ=-39;SF=0 GT:PL:GQ 1/1:70,12,0:21
chr16 30759716 . T C 17.1 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,3,0;MQ=20;FQ=-36;SF=0 GT:PL:GQ 1/1:49,9,0:15
chr16 30926303 . G C 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 30960507 . T C 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,2,0;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 31044683 . A G 26 . DP=6;AF1=0.5032;CI95=0.5,0.5;DP4=0,2,2,2;MQ=20;FQ=-8.63;PV4=0.47,0.16,1,0.2;SF=0 GT:PL:GQ 0/1:56,0,19:22
chr16 31085470 . T C 6.02 . DP=4;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 33503129 . C T 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 33866630 . G T 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 33894371 . A G 15.1 . DP=4;AF1=1;CI95=0.5,1;DP4=0,0,3,0;MQ=20;FQ=-36;SF=0 GT:PL:GQ 1/1:47,9,0:14
chr16 33894379 . A T 8.93 . DP=4;AF1=0.543;CI95=0.5,1;DP4=1,0,3,0;MQ=20;FQ=-19;PV4=1,0.0068,1,1;SF=0 GT:PL:GQ 0/1:38,0,8:12
chr16 33897385 . T C 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,2,0;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 33901775 . A G,C 3.41 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,0,3;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:43,17,11,31,0,28:4
chr16 33922739 . A G 17.1 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,3,0;MQ=20;FQ=-36;SF=0 GT:PL:GQ 1/1:49,9,0:15
chr16 33922743 . A C 17.1 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,3,0;MQ=20;FQ=-36;SF=0 GT:PL:GQ 1/1:49,9,0:15
chr16 33946169 . C T 6.02 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,2,0;MQ=20;FQ=-33;SF=0 GT:PL:GQ 1/1:36,6,0:6
chr16 33950474 . A G 80.1 . DP=14;AF1=1;CI95=0.5,1;DP4=2,0,7,4;MQ=20;FQ=-28;PV4=1,0.3,1,0.17;SF=0 GT:PL:GQ 1/1: