Hello I have targeted enriched data from different plants that are, for some closely related and other a bit distant.
I have this idea to map then phase the alleles and call for the consensus for making matrices and at the end gene trees.
When I have a gap during the mapping, the phased alleles are calling the reference instead of having the gap; as shown in the . figure
Is there a way to overcome the this problem?
### Phasing alleles ###
samtools phase -AF -b $i.allele $i
### Phasing alleles ###
do echo -e "\n\nPhasing alleles"
echo -e "\n\n$i"
samtools phase -AF -b $i.allele $i;
### Sort phased bam files ###
echo -e "\n\nSort phased bam files..."
samtools sort -@ 32 $i.allele.0.bam -o $i.allele.0.sorted.bam
samtools sort -@ 32 $i.allele.1.bam -o $i.allele.1.sorted.bam
### mpileup and create fq files ###
echo -e "\n\nmpileup allele 0..."
samtools mpileup -u -f markers.fasta $i.allele.0.sorted.bam| bcftools call -c | vcfutils.pl vcf2fq > $i.allele.0.fq
echo -e "\n\nmpileup allele 1..."
samtools mpileup -u -f markers.fasta $i.allele.1.sorted.bam| bcftools call -c | vcfutils.pl vcf2fq > $i.allele.1.fq
### transform fq to fasta ###
echo -e "\n\nFastq to Fasta"
~/nobackup/seqtk/seqtk seq -a $i.allele.0.fq > $i.allele.0.fasta
~/nobackup/seqtk/seqtk seq -a $i.allele.1.fq > $i.allele.1.fasta
I think this is because you are aligning against a reference in which you included the bases that are deleted in at least some subject. When you align reads with the deletion you do not align anything, so there is no SNP and the assigned base is the reference. You should filter the results, for example if no (or very few) reads align in some parts of your target, instead of the reference allele you put a gap.