I am aligning genomic reads to a set of 6 genes all saved in one file in fasta format. The sam and bam files look ok but the mpileup step returns empty bcf files, with only the header until:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1-thal.nodup.sorted.bam
This is the whole code used:
for f in /reads/*1P.fastq.gz
do
name=${f%%*/}
sname=${name%%.*}
bname=$(basename $sname)
bwa mem thal $sname.1P.fastq.gz $sname.2P.fastq.gz > $bname-thal.sam
samtools view -bt thal.fasta.fai $bname-thal.sam > $bname-thal.bam
samtools sort $bname-thal.bam > $bname-thal.sorted.bam
samtools rmdup $bname-thal.sorted.bam $bname-thal.nodup.bam
samtools sort $bname-thal.nodup.bam > $bname-thal.nodup.sorted.bam
samtools mpileup -uf thal.fasta $bname-thal.nodup.sorted.bam > $bname-thal.bcf
bcftools call -mv -Ov $bname-thal.bcf > $bname-thal.vcf
done
I used this code with a different set of genes and it worked, what can be going wrong?
this is what I see with samtools tview
two comments: * use defensive bash programming to test if something failed : http://www.davidpashley.com/articles/writing-robust-shell-scripts/ * use linux pipeline to avoid those numerous intermediate files.
Thanks @Pierre, I normally run the pipeline for some commands, some of those intermediate files I need for other tasks. But in the script above I am testing this way to see at what step it fails. So far up to the *.nodup.sorted.bam it works well, that's how I know that it is the bcf file that is not being processed. So the alignment is working, not the variant calling