2.8 years ago
fcarolinebe
Hi,
I called SNPs from 12 RNA-seq samples using GATK and Freebayes. Now I need to combine the files into one single .vcf file, keeping the columns CHROM, POS, ID, REF, ALT and all the filter information for all samples.
Can somebody help me, please?
I used this script:
bcftools +missing2ref ffinal_tow_teste_vcf.gz | bcftools query -H -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' > final_all_2.vcf
but in the result every sample is printed with the REF allele instead of its own SNP:
# [1]CHROM [2]POS [3]ID [4]REF [5]ALT [6]20:GT [7]2:20:GT
LG1 43262 . G T G G
LG1 45686 . C T C C
LG1 46582 . G C G G
LG1 47066 . T C T T
LG1 48178 . A G A A
LG1 48716 . G A G G
LG1 48916 . C T C C
LG1 49053 . C T C C
LG1 49499 . G C G G
LG1 49563 . T C T T
LG1 49596 . G C G G
LG1 49702 . G A G G
LG1 50029 . T C T T
Are you familiar with bcftools? Why are you showing us the output of bcftools query when you want VCF output?
I get it! But my question is: how can I join the files, keeping the columns CHROM, POS, ID, REF, ALT, so that the SNPs of each sample appear against its own sequence? When I put the files together, the alternative SNPs look like the reference allele, as shown above.
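One possible explanation for the table above, offered as a guess rather than a diagnosis: `%TGT` prints each genotype translated into alleles, and the `+missing2ref` plugin replaces missing genotypes with the reference genotype first. If a sample has no call at a site, it would then show the REF allele. The Python sketch below only illustrates that interaction. The function names `translate_gt` and `missing_to_ref` are made up for this example and are not part of bcftools, and the genotypes are invented.

```python
import re


def translate_gt(gt, ref, alts):
    """Turn an index genotype such as '0/1' into alleles such as 'G/T'."""
    alleles = [ref] + alts
    out = []
    # The capture group keeps the '/' or '|' separators in the result.
    for part in re.split(r"([/|])", gt):
        if part in ("/", "|", "."):
            out.append(part)            # separator or missing allele: keep as-is
        else:
            out.append(alleles[int(part)])
    return "".join(out)


def missing_to_ref(gt):
    """Hypothetical stand-in for the plugin: missing allele '.' becomes 0 (REF)."""
    return gt.replace(".", "0")


# Site taken from the thread (REF=G, ALT=T); the genotypes are invented.
for gt in ("0/1", "1/1", "./."):
    before = translate_gt(gt, "G", ["T"])
    after = translate_gt(missing_to_ref(gt), "G", ["T"])
    print(gt, before, after)
```

In this sketch a real call such as `0/1` is unaffected, while a missing `./.` turns into `G/G`, which would look exactly like a homozygous reference sample in the final table.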
I'm sorry, I'm unable to understand your problem statement. Can you explain a little more and maybe show a couple of examples of what you need?
OK,
initially I tried to join the VCF files. The files were combined, but it was impossible to identify which sequence has the alternative SNP, as in the example:
PS: I know it is very confusing to understand the result presented this way, but I am unable to attach a screenshot to make it easier to understand.
Then I tried to use the query tool to separate the SNPs by sample. Even though there is an alternative SNP different from the reference, when I run the script all samples appear as if they carried the reference allele.
We are having some major communication difficulties here. Screenshots of plain text are counterproductive; you're doing the right thing by pasting the content directly. See this post for plain text formatting tips: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists
You should provide examples of your individual VCF entries, what you need as the final result (expected output), what you tried running and what you get (actual output). In my experience, when you prepare these details, you will often stumble upon the solution yourself. If that doesn't happen, it will at least make it extremely easy for others to help you.
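As a way of stating the expected output concretely, here is a minimal Python sketch of the kind of merged table the question seems to ask for: one row per site, one genotype column per sample, and `./.` where a sample has no call instead of silently filling in REF. The function `merge_calls`, the sample names and the genotypes are all assumptions made for illustration. With real files the usual route would be `bcftools merge` on bgzip-compressed, indexed VCFs rather than hand-written code.

```python
def merge_calls(per_sample):
    """Combine per-sample calls into one table.

    per_sample maps sample name -> {(chrom, pos, ref, alt): genotype}.
    Returns (header, rows); samples with no call at a site get './.'.
    """
    samples = sorted(per_sample)
    # Union of all sites seen in any sample, in sorted order.
    sites = sorted({site for calls in per_sample.values() for site in calls})
    header = ["CHROM", "POS", "REF", "ALT"] + samples
    rows = []
    for site in sites:
        genotypes = [per_sample[s].get(site, "./.") for s in samples]
        rows.append(list(site) + genotypes)
    return header, rows


# Hypothetical input: two samples, positions borrowed from the thread.
calls = {
    "sample_A": {("LG1", 43262, "G", "T"): "0/1"},
    "sample_B": {("LG1", 43262, "G", "T"): "1/1",
                 ("LG1", 45686, "C", "T"): "0/1"},
}

header, rows = merge_calls(calls)
print("\t".join(header))
for row in rows:
    print("\t".join(str(field) for field in row))
```

Keeping the missing calls visible makes it possible to tell "this sample is homozygous reference" apart from "this sample was not called here", which is the distinction that gets lost once missing genotypes are converted to REF.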