Phasing alleles, indel gap filled with reference.
1
0
Entering edit mode
6.8 years ago

Hello I have targeted enriched data from different plants that are, for some closely related and other a bit distant.

I have this idea to map then phase the alleles and call for the consensus for making matrices and at the end gene trees.

When I have a gap during the mapping, the phased alleles are calling the reference instead of having the gap; as shown in the figure. figure

Is there a way to overcome the this problem?

### Phasing alleles ###

samtools phase -AF -b $i.allele $i

### Phasing alleles ###
do echo -e "\n\nPhasing alleles"
echo -e "\n\n$i"

samtools phase -AF -b $i.allele $i;

### Sort phased bam files ###
echo -e "\n\nSort phased bam files..."
samtools sort -@ 32 $i.allele.0.bam -o $i.allele.0.sorted.bam
samtools sort -@ 32 $i.allele.1.bam -o $i.allele.1.sorted.bam

### mpileup and create fq files ###
echo -e "\n\nmpileup allele 0..."
samtools mpileup -u -f markers.fasta $i.allele.0.sorted.bam| bcftools call -c | vcfutils.pl vcf2fq > $i.allele.0.fq
echo -e "\n\nmpileup allele 1..."
samtools mpileup -u -f markers.fasta $i.allele.1.sorted.bam| bcftools call -c | vcfutils.pl vcf2fq > $i.allele.1.fq

### transform fq to fasta ###
echo -e "\n\nFastq to Fasta"
~/nobackup/seqtk/seqtk seq -a $i.allele.0.fq > $i.allele.0.fasta
~/nobackup/seqtk/seqtk seq -a $i.allele.1.fq > $i.allele.1.fasta
phasing allele SNP alignment samtools • 2.1k views
ADD COMMENT
0
Entering edit mode

I think this is because you are aligning against a reference in which you included the bases that are deleted in at least some subject. When you align reads with the deletion you do not align anything, so there is no SNP and the assigned base is the reference. You should filter the results, for example if no (or very few) reads align in some parts of your target, instead of the reference allele you put a gap.

ADD REPLY
0
Entering edit mode
6.8 years ago

Thank you for your reply I think I found a way to mask the gap from the final fasta file. The solution was posted in this discussion , however I grep the tag DP=0 to have the position without coverage (gap).

here the updated code

for i in $(cat list);

### Phasing alleles ###
do echo -e "\n\nPhasing alleles"


samtools phase -AF -b $i.allele $i"_L006mapped.bam";

### Sort phased bam files ###
echo -e "\n\nSort phased bam files..."
samtools sort -@ 32 $i.allele.0.bam -o $i.allele.0.sorted.bam
samtools sort -@ 32 $i.allele.1.bam -o $i.allele.1.sorted.bam


### Generating the vcf file ###
echo -e "\n\nGenerating the vcf file..."
samtools mpileup -uv -f markers.fasta $i.allele.0.sorted.bam > $i.allele.0.vcf
samtools mpileup -uv -f markers.fasta $i.allele.1.sorted.bam > $i.allele.1.vcf

### Generating the bed file ###
echo -e "\n\nGenerating the bed file..."
grep DP=0 $i.allele.0.vcf | awk '{OFS="\t"; if ($0 !~ /\#/); print $1, $2-1, $2}' > $i.allele.0.bed
grep DP=0 $i.allele.1.vcf | awk '{OFS="\t"; if ($0 !~ /\#/); print $1, $2-1, $2}' > $i.allele.1.bed


### Generating the fasta file ###
echo -e "\n\nGenerating the fasta file..."
bcftools call -c $i.allele.0.vcf | vcfutils.pl vcf2fq |~/nobackup/seqtk/seqtk seq -a - > $i.allele.0.fasta
bcftools call -c $i.allele.1.vcf | vcfutils.pl vcf2fq |~/nobackup/seqtk/seqtk seq -a - > $i.allele.1.fasta


### Mask indel from the fasta file ###
echo -e "\n\nMask indel from the fasta file..."
bedtools maskfasta -fi $i.allele.0.fasta -fo $i.indel.0.fasta -bed $i.allele.0.bed
bedtools maskfasta -fi $i.allele.1.fasta -fo $i.indel.1.fasta -bed $i.allele.1.bed
done ;
ADD COMMENT

Login before adding your answer.

Traffic: 2654 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6