Hello!
This is my first post here.
I am trying to analyze human whole-exome germline sequencing data. There are 120 samples split into two groups (two different conditions), and I would like to detect variants in them. This is my first time working with exome data, so sorry if this is a noob question.
I was given the files in VCF.gz format and uploaded them into Galaxy, intending to run an exome-seq pipeline. However, I can't, because Galaxy imported the files as tabular and the pipeline requires FASTQ input. I tried to convert the file format but couldn't.
I previewed one of the VCF files to see what was inside, and it looks like this (first two lines pasted):
1 2 3 4 5 6 7 8 9 10
chr1 861368 . CG C 1020.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.298;DP=146;FS=165.905;MLEAC=1;MLEAF=0.500;MQ=86.82;MQ0=0;MQRankSum=-1.499;QD=6.99;RPA=3,2;RU=G;ReadPosRankSum=2.010;SOR=5.577;STR GT:AD:DP:GQ:PL 0/1:54,67:145:99:1058,0,653
chr1 874544 . AG A 971.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.050;DP=60;FS=46.340;MLEAC=1;MLEAF=0.500;MQ=89.05;MQ0=0;MQRankSum=-2.496;QD=16.20;RPA=4,3;RU=G;ReadPosRankSum=-0.361;SOR=2.332;STR GT:AD:DP:GQ:PL 0/1:18,40:59:99:1009,0,175
...
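In case it helps, here is a minimal Python sketch of how I am reading the ten columns, assuming they correspond to the standard VCF fields (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, plus one sample column). The function name `parse_vcf_line` and the `SAMPLE` label are just my own placeholders, and splitting on any whitespace is a simplification, since real VCF files are tab-delimited.

```python
# Standard VCF column order; "SAMPLE" is a placeholder name for the
# single per-sample column (a real file names it in the #CHROM header line).
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT",
               "QUAL", "FILTER", "INFO", "FORMAT", "SAMPLE"]


def parse_vcf_line(line):
    """Split one VCF data line into a dict (hypothetical helper)."""
    # Assumption: fields contain no spaces, so any whitespace separates columns.
    fields = line.split()
    record = dict(zip(VCF_COLUMNS, fields))

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags (e.g. STR).
    info = {}
    for item in record["INFO"].split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    record["INFO"] = info

    # FORMAT names the ':'-separated values in the sample column.
    keys = record["FORMAT"].split(":")
    values = record["SAMPLE"].split(":")
    record["SAMPLE"] = dict(zip(keys, values))
    return record


line = ("chr1 861368 . CG C 1020.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.298;"
        "DP=146;FS=165.905;MLEAC=1;MLEAF=0.500;MQ=86.82;MQ0=0;MQRankSum=-1.499;"
        "QD=6.99;RPA=3,2;RU=G;ReadPosRankSum=2.010;SOR=5.577;STR "
        "GT:AD:DP:GQ:PL 0/1:54,67:145:99:1058,0,653")

rec = parse_vcf_line(line)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])  # chr1 861368 CG C
print(rec["SAMPLE"]["GT"], rec["INFO"]["DP"])            # 0/1 146
```

If I am reading it right, the first line would be a heterozygous (0/1) one-base deletion (CG to C) at chr1:861368.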
Are exome VCF files normally supposed to look like this?
If I need to analyze these files, can I skip the earlier processing steps and go straight to something like GATK, since the files are already in VCF format?
Thanks! Sorry for all the questions!