Sam Tools Pileup Format
2
2
Entering edit mode
12.1 years ago

I have a question about the output of the routine "samtool mpileup":

"In the final result there are not the reference bases without coverage"

I'd like to get the entire reference genome in the final pileup file.

Thanks in advance for the answer!

Best,

Vittorio Fortino

samtools mpileup format • 6.7k views
ADD COMMENT
2
Entering edit mode
12.1 years ago
Ian 6.0k

You need to run mpileup piped to bcftools. The latter differs from the variant calling example of samtools, by missing out the 'v' flag.

samtools mpileup -uf /path_to/genome.fa sorted.bam | bcftools view -cgb - > sorted.cons.raw.bcf'

You can then use bcftools view to vcfutils to output sequence as fastq format, in this case explicitly using positions where there is at least 10 read coverage, but less than 500 reads (misses out regions that have too high coverage due to artefacts). I usually set -D to be 3-5x the median depth of coverage for the genome.

bcftools view sorted.cons.raw.bcf | vcfutils.pl vcf2fq -d10 -D500 > sorted.mpileup.filtered_d10_D500.fq'

You can then use Seqtk to convert the fastq sequence to fasta, while filtering positions based on a phred quality score; 40 (roughly equiv. to P0.0001) appears to be an excepted minimum. Positions below your threshold are masked in the output.

seqtk fq2fa sorted.mpileup.filtered_d10_D500.fq > sorted.mpileup.filtered_d10_D500.fa

I hope that is clear. This is how i did it last (about six months ago). I think seqtk is packaged with samtools now...

ADD COMMENT
0
Entering edit mode

Good answer! Two small updates that might help others looking at this post:

This command :

 bcftools view -cg - > sorted.cons.raw.bcf'

should be

 bcftools view -cgb - > sorted.cons.raw.bcf'

Also, the current seqtk code syntax is

seqtk seq -a inFile.fq > outFile.fa
ADD REPLY
0
Entering edit mode

Thank you very much for highlighting my error. I have amended my post :)

ADD REPLY
1
Entering edit mode
12.1 years ago
Swbarnes2 ★ 1.6k

Instead of running a separate program to convert fastq to fasta, vcfutils.pl is easy enough to alter. It's a perl script, so it's a flat text file. Basically, you want the last few lines of vcf2fq to look like this:

  print "\>$chr\n"; &v2q_print_str($seq);
#  print "+\n"; &v2q_print_str($qual);

That'll drop the quality score, and the preceding "+", and put a > instead of a @.

ADD COMMENT
0
Entering edit mode

Nice! This works perfectly. No need to convert the .fq file.

ADD REPLY

Login before adding your answer.

Traffic: 2158 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6