Variant calling - display nucleotides with 0 coverage
0
0
Entering edit mode
7.3 years ago
yarmda ▴ 40

I am calling variants compared to a given genome. I'm finding that using samtools mpileup or bcftools mpileup, I only get reports after there is some nonzero depth of coverage.

Further, I'm finding that if there is a mismatch within the first 5 or so nucleotides, that they will be skipped altogether.

I am hoping that if I can set a minimum depth of coverage to be 0, I will see these skipped regions. Further, I'm hoping that I will see the variant called in those skipped regions, too.

As far as I can tell, samtools/bcftools mpileup only allows you to set a maximum depth of coverage, not a minimum. Is there another tool I could use for this or a parameter I can set for either of these?

samtools bcftools mpileup vcf bcf • 1.9k views
ADD COMMENT
0
Entering edit mode

Can you clarify this a little? Variants cannot be called where there is 0 coverage, so I don't understand what you want to do. Also, I'm not sure what you mean by "if there is a mismatch within the first 5 or so nucleotides, that they will be skipped altogether". The first 5 nucleotides of what? And what will be skipped? Do you mean that variants only appearing in the first 5 bp of a read are ignored, or the first 5 bp of a contig?

ADD REPLY
0
Entering edit mode

Sorry, I'll clarify.

Variants can't be called when there is 0 coverage, but I do want to see those uncovered nucleotides, if possible. This is almost entirely because when one of the reads near the beginning of the genome has a mismatch in the first 5 or so nucleotides of the read - the variant calling won't begin until after the mismatch. Since, I'm interested in those mismatches, I'd like the variant caller to begin calling at the onset of the read rather than once it starts matching. Does that make more sense?

ADD REPLY
0
Entering edit mode

Hmmm... it's generally unsafe to call variants near the ends of reads because the alignment there is poor; you can't distinguish between substitutions and indels. But, you can use BBMap's variant-caller like this:

callvariants.sh in=mapped.sam ref=ref.fa out=variants.vcf border=0 clearfilters

"border=0" will include variants all the way to the end of the reads, and "clearfilters" will display all variants, even low-quality ones with depth 1. You can also set the various filters (including minreads, which is the minimum number of reads indicating a variant) manually if you want.

ADD REPLY
0
Entering edit mode

Thanks for showing me this, Brian.

I understand the risks of calling variants in these regions.

I downloaded your code and ran it as you listed. I'm only seeing variants here, not regions of the genome where my reads aligned perfectly. Is there another setting for this? I don't see such an option in the usage.

ADD REPLY
0
Entering edit mode

No... the variants file only contains differences. But you can produce a coverage file with pileup:

pileup.sh in=mapped.bam basecov=basecov.txt

That will produce one line per reference base, indicating its coverage. If there are no variants at a location in the VCF file, then all of the coverage would match the reference at that location. In other words, this coverage includes both matches and mismatches. Note that the positions from pileup are 0-based while VCF files are 1-based.

ADD REPLY

Login before adding your answer.

Traffic: 2870 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6