Hi everybody,
I am having a problem understanding the output of tophat concerning the multi-mapped reads. I would like to create a barplot of the unique vs. multi-mapped vs. unmapped reads from tophat-mapped bam files. (BTW is there already a tool for doing that?)
Now I wanted to extract the multi-mapped reads from one bam file. lat us say MW6. this is the align_summary.txt file I have:
Reads:
Input : 26140314
Mapped : 25159791 (96.2% of input)
of these: 1027691 ( 4.1%) have multiple alignments (1832 have >20)
96.2% overall read mapping rate.
But when I try to count the reads manually I get this :
$samtools view bamFiles/MW6.sorted.bam | grep -w -v "NH:i:1" | wc -l
4118642
and when I use the flagstat option from samtools:
$samtools flagstat bamFiles/MW6.SE.sorted.bam
28250742 + 0 in total (QC-passed reads + QC-failed reads)
3090951 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
28250742 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
... (the rest is 0)
The total number of input reads is 26140314 (as it says in the align_summary
file). The total mapped reads from the bam file is 28250742, as says in the samtools flagstat
. I guess this conflict is due to reads which mapped multiple times, but how can I get the exact number of multiple reads?
is it just 28250742 - 26140314 = 2110428 ?
Would it be similar in paired-end and single-end mapping runs?
thanks
Assa