samtools stats result different from samtools flagstat
2
1
Entering edit mode
5.0 years ago
snishtala03 ▴ 70

I am using samtools v1.6 to get some stats on my bam file. Here, I merged my two paired end reads and ran bwa twice, once on merged reads and once on unmerged reads and then merged the two .bam files using samtools merge. I then used both samtools stats and samtools flagstat to get some stats but interestingly, I get different results. Can you help me understand why?

samtools flagstat Sample_sorted.bam 

60970 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
8465 + 0 supplementary
0 + 0 duplicates
45003 + 0 mapped (73.81% : N/A)
27856 + 0 paired in sequencing
13046 + 0 read1
14810 + 0 read2
0 + 0 properly paired (0.00% : N/A)
22138 + 0 with itself and mate mapped
1051 + 0 singletons (3.77% : N/A)
19562 + 0 with mate mapped to a different chr
4 + 0 with mate mapped to a different chr (mapQ>=5)


samtools stats Sample_sorted.bam |grep ^SN | cut -f 2-
raw total sequences:    52505
filtered sequences: 0
sequences:  52505
is sorted:  1
1st fragments:  37695
last fragments: 14810
reads mapped:   36538
reads mapped and paired:    22138   # paired-end technology bit set + both mates mapped
reads unmapped: 15967
reads properly paired:  0   # proper-pair bit set
reads paired:   27856   # paired-end technology bit set
reads duplicated:   0   # PCR or optical duplicate bit set
reads MQ0:  36523   # mapped and MQ=0
reads QC failed:    0
non-primary alignments: 0
total length:   7949149 # ignores clipping
bases mapped:   6010513 # ignores clipping
bases mapped (cigar):   5909812 # more accurate
bases trimmed:  0
bases duplicated:   0
mismatches: 484745  # from NM fields
error rate: 8.202376e-02    # mismatches / bases mapped (cigar)
average length: 151
maximum length: 293
average quality:    35.5
insert size average:    267.3
insert size standard deviation: 229.1
inward oriented pairs:  1005
outward oriented pairs: 134
pairs with other orientation:   3
pairs on different chromosomes: 9781

I also did a simple count on my bam file and it aligns with samtools flagstat results but not stats results.

samtools view Sample_sorted.bam | wc -l
60970
samtools • 14k views
ADD COMMENT
4
Entering edit mode
5.0 years ago
snishtala03 ▴ 70

I think I figured it out -

It looks like 60970 reads in flagstat includes the supplementary reads while stats does not count it. 52505 + 8465 = 60970

ADD COMMENT
2
Entering edit mode
3.7 years ago
adi.rotem ▴ 20

I have an example where raw total sequences: from the samstats is the same as QC-passed reads(flagstat) - secondary (flagstat), so note that it could be that raw total sequences(samstat) = QC-passed reads(flagstat) - secondary (flagstat) - supplementary (flagstat)

but I'm not sure. This is my example:

flagstat:

16233358 + 0 in total (QC-passed reads + QC-failed reads) 2194850 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 15871381 + 0 mapped (97.77% : N/A) 0 + 0 paired in sequencing 0 + 0 read1 0 + 0 read2 0 + 0 properly paired (N/A : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (N/A : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)

samts:

This file was produced by samtools stats (1.9+htslib-1.9) and can be plotted using plot-bamstats

This file contains statistics for all reads.

The command line was: stats rep1_local_aligned.sam

CHK, Checksum [2]Read Names [3]Sequences [4]Qualities

CHK, CRC32 of reads which passed filtering followed by addition (32bit overflow)

CHK 2c2dd00f 8a348144 a42ae467

Summary Numbers. Use grep ^SN | cut -f 2- to extract this part.

SN raw total sequences: 14038508 SN filtered sequences: 0 SN sequences: 14038508 SN is sorted: 0 SN 1st fragments: 14038508 SN last fragments: 0 SN reads mapped: 13676531 SN reads mapped and paired: 0 # paired-end technology bit set + both mates mapped SN reads unmapped: 361977 SN reads properly paired: 0 # proper-pair bit set SN reads paired: 0 # paired-end technology bit set SN reads duplicated: 0 # PCR or optical duplicate bit set SN reads MQ0: 394 # mapped and MQ=0 SN reads QC failed: 0 SN non-primary alignments: 2194850 SN total length: 1027717498 # ignores clipping SN total first fragment length: 1027717498 # ignores clipping SN total last fragment length: 0 # ignores clipping SN bases mapped: 1001586559 # ignores clipping SN bases mapped (cigar): 998852161 # more accurate SN bases trimmed: 0 SN bases duplicated: 0 SN mismatches: 2103263 # from NM fields SN error rate: 2.105680e-03 # mismatches / bases mapped (cigar) SN average length: 73 SN average first fragment length: 73 SN average last fragment length: 0 SN maximum length: 75 SN maximum first fragment length: 0 SN maximum last fragment length: 0 SN average quality: 34.2 SN insert size average: 0.0 SN insert size standard deviation: 0.0 SN inward oriented pairs: 0 SN outward oriented pairs: 0 SN pairs with other orientation: 0 SN pairs on different chromosomes: 0 SN percentage of properly paired reads (%): 0.0

ADD COMMENT

Login before adding your answer.

Traffic: 2478 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6