Difference between pair alignment rate and properly paired reads
8.5 years ago

I have some RNA-Seq data (paired-end reads) that I aligned with TopHat. The alignment summary looks like this:

Left reads:
          Input     :  25671258
           Mapped   :  22149823 (86.3% of input)
            of these:   2005259 ( 9.1%) have multiple alignments (200624 have >20)
Right reads:
          Input     :  25671258
           Mapped   :  21866868 (85.2% of input)
            of these:   1977056 ( 9.0%) have multiple alignments (199383 have >20)
85.7% overall read mapping rate.

Aligned pairs:  21161013
     of these:   1903746 ( 9.0%) have multiple alignments
                  801300 ( 3.8%) are discordant alignments
79.3% concordant pair alignment rate.

When I run

samtools flagstat accepted_hits.bam

This is the result:

66612729 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
66612729 + 0 mapped (100.00%:nan%)
66612729 + 0 paired in sequencing
33517595 + 0 read1
33095134 + 0 read2
34458604 + 0 properly paired (51.73%:nan%)
64054464 + 0 with itself and mate mapped
2558265 + 0 singletons (3.84%:nan%)
13192728 + 0 with mate mapped to a different chr
402500 + 0 with mate mapped to a different chr (mapQ>=5)

I don't understand why the pair alignment percentage reported by TopHat does not match the percentage of properly paired reads reported by samtools. Besides this, I find that the BAM file contains paired reads whose mates are mapped to different chromosomes. Could you please help me understand this?

RNA-Seq paired-end bam samtools
8.5 years ago
Kamil

I believe the two tools are reporting different quantities:

  • TopHat says you have 21,161,013 aligned pairs of reads.
  • samtools says your BAM file has 34,458,604 alignment records flagged as properly paired.

TopHat counts read pairs, whereas samtools counts alignment records. The two numbers do not agree because each mate of a pair is stored as its own record, and a single pair of reads can have more than one alignment in the BAM file (multi-mapped reads). So a pair that TopHat counts once can be counted several times by samtools.
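The percentages differ for a related reason: they are computed over different denominators. The short Python sketch below reproduces both figures using only the numbers printed in the question. It assumes TopHat's concordant rate is (aligned pairs - discordant pairs) / input pairs, which is consistent with the 79.3% it reports, and that flagstat's properly paired percentage is taken over the paired records (here equal to all 66,612,729 records, including every extra alignment of multi-mapped reads).

```python
# Numbers copied from the TopHat align summary and samtools flagstat output in the question.
input_pairs = 25_671_258       # TopHat: input read pairs
aligned_pairs = 21_161_013     # TopHat: aligned pairs (concordant + discordant)
discordant_pairs = 801_300     # TopHat: discordant pair alignments

paired_records = 66_612_729    # flagstat: records marked "paired in sequencing" (= total here)
proper_records = 34_458_604    # flagstat: records with the properly-paired flag (0x2)

# Assumed TopHat definition: concordant pairs per input pair.
concordant_rate = (aligned_pairs - discordant_pairs) / input_pairs

# flagstat: properly paired records per paired record; multi-mapped reads add records here.
proper_rate = proper_records / paired_records

print(f"TopHat concordant pair rate: {concordant_rate:.1%}")
print(f"flagstat properly paired:    {proper_rate:.2%}")
```

Because multi-mapped reads add extra records to the flagstat denominator, its percentage is not directly comparable to TopHat's per-pair rate.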

You might try checking the number of properly paired read pairs in your BAM file by counting unique read identifiers:

samtools view -f 2 file.bam | cut -f1 | sort -u | wc -l
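If you prefer a script, here is a sketch that does the same counting in plain Python (no pysam), reading the SAM text produced by `samtools view`. It reports both the number of properly paired records and the number of distinct read names among them, so you can see how many records each pair contributes. The script name `count_proper_pairs.py` is hypothetical, and like the one-liner it assumes both mates of a pair share the same read name (QNAME).

```python
import sys

PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned according to the aligner


def count_proper_pairs(sam_lines):
    """Return (properly paired records, distinct read names among them)."""
    records = 0
    names = set()  # holds every name in memory, like `sort -u` needs to
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & PROPER_PAIR:
            records += 1
            names.add(qname)
    return records, len(names)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   samtools view accepted_hits.bam | python count_proper_pairs.py /dev/stdin
    with open(sys.argv[1]) as handle:
        n_records, n_pairs = count_proper_pairs(handle)
    print(f"properly paired records:     {n_records}")
    print(f"distinct read names (pairs): {n_pairs}")
```

If the record count is much more than twice the number of distinct names, the extra records come from reads with more than one alignment.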