Hello,
I have paired end data which I am mapping (using bwa - mem) to 17 viral references which are ~85% similar to one another (i.e I have one reference file that I give to bwa mem which has 17 different fastqs). So, obviously, I have most of my reads mapping to multiple references. Which is needed for my analysis.
However, I also want to get a count of uniquely mapped reads, i.e, if a read maps to 6/17 references, I want to count it only once. I have tried to filer my bam file based on quality or using grep on XA and XS tags but this removes all the 6 mapping.
samtools view -h sorted.bam | grep -v -e 'XA:Z:' -e 'SA:Z:' | samtools view -b > unique.bam
samtools view -h sorted.bam | grep -v -e 'XA:Z:' -e 'SA:Z:' | samtools view -b > unique.bam
Is there a way I can keep one instance of those 6 mappings?
Not answering your question directly but providing a suggestion to get what you need.
If you align with
bbmap.sh
you get the option of treating multi-mapping reads by placing them in only one of those 6 locations by usingambig=random
option.You could also grab the alignments via
make then non-redundant (leave one copy you want) and then add them back to your
unique.bam
.Thanks for your response!
So while I was trying to this, I came across this example below. It shows that a read has secondary alignments and supplementary alignments, but not all of them are seen in the bam file. Is this how secondary and supplementary alignments usually appear or does this mean that secondary ans supplementary alignments are marked but they are not shown as outputs?
To get secondary and supplementary reads in different lines, I also tried using -a and -M flags when running bwa-mem, but then these secondary and supplementary reads appear as * . A little confused here, am I missing something? Showing a few examples below -