Consensus sequences with reference allele instead of missing
6.4 years ago
agathejouet ▴ 10

Hi all,

I am trying to use bcftools consensus (version 1.3.1) to build consensus sequences from a reference FASTA file and a multi-sample VCF. The problem is that where bases are missing (I can see the gaps in my BAM files), bcftools consensus prints the reference allele. I would like these positions to be reported as missing, because I want to be able to see structural variation (some genes will actually be absent from my samples). An example line from my VCF is:

LOC_Os08g14850_chr8_8938483..8952092_UTR-0      1       .       G       .       999     .       .       GT:DP   ./.:0   ./.:1   ./.:0   ./.:0   ./.:0   ./.:0   ./.:0   ./.:0   ./.:0   ./.:0   ./.:0   ./.:0   ./.:0   ./.:0   ./.:1   ./.:0   ./.:0   ./.:1   ./.:1   ./.:0   ./.:0   ./.:0   ./.:0   ./.:3   ./.:0   ./.:0   ./.:0   ./.:0   ./.:0   ./.:1   ./.:0   ./.:2   ./.:0   ./.:1   ./.:0

The first position in the gene appears not to have been sequenced in any of my samples. However, the reference allele (a G in this case) is still printed in my consensus FASTA files.

Any help appreciated.

Many thanks,

Agathe

bcftools consensus missing bases • 1.5k views

You might need to make an all-sites VCF, one that has a record for every position in the reference. The software is probably assuming that 'no news is good news' for loci with no VCF entry.
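Once you have a record for every position, one possible way to act on it is to collect, per sample, the positions with a missing genotype or no coverage and treat those as regions to mask. Below is a minimal Python sketch of that idea, not a tested pipeline. The function name `missing_sites_to_bed` and the `min_depth` parameter are invented for this example, and it assumes a tab-delimited, position-sorted VCF with `GT` and `DP` in the FORMAT column.

```python
def missing_sites_to_bed(vcf_lines, sample, min_depth=1):
    """Yield (chrom, start0, end) BED-style intervals where `sample`
    has a missing genotype or a depth below `min_depth`.

    Hypothetical helper: assumes tab-delimited, position-sorted records.
    """
    sample_col = None
    current = None  # open interval as [chrom, start0, end]

    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Header row: locate the column holding this sample
            sample_col = fields.index(sample)
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line seen before records")

        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        keys = fields[8].split(":")
        values = dict(zip(keys, fields[sample_col].split(":")))

        # A genotype such as "./." or "." counts as missing
        alleles = values.get("GT", ".").replace("|", "/").split("/")
        gt_missing = all(a == "." for a in alleles)

        # A missing or too-low DP also counts as "no data"
        dp = values.get("DP", ".")
        low_depth = dp == "." or int(dp) < min_depth

        if not (gt_missing or low_depth):
            continue

        start0 = pos - 1            # BED coordinates are 0-based, half-open
        end = start0 + len(ref)

        if current and current[0] == chrom and start0 <= current[2]:
            current[2] = max(current[2], end)   # extend adjacent interval
        else:
            if current:
                yield tuple(current)
            current = [chrom, start0, end]

    if current:
        yield tuple(current)
```

The intervals can be written out as a three-column BED file for each sample. bcftools consensus has a `--mask` option for replacing regions with N, so that file may be usable there together with `-s` for the sample, but check the usage text of your installed version first.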
