Different versions of TopHat give me different numbers of mapped reads
5.7 years ago
BioDH ▴ 10

I have paired-end RNA-seq files (Illumina, FASTQ).

I trimmed the reads with Trimmomatic:

java -jar <path>/trimmomatic-0.36.jar PE -threads 4 -phred33 1.fastq 2.fastq 1_trim1.fastq 1_unpaired1.fastq 2_trim2.fastq 2_unpaired2.fastq ILLUMINACLIP:<path>/TruSeq3-PE-2.fa:3:30:10 SLIDINGWINDOW:5:20 MINLEN:20

Around 14M read pairs survived trimming.

I aligned the trimmed FASTQ files to the genome with TopHat (old version = 2.1.0, new version = 2.1.1), using exactly the same arguments.

# This is the old version (2.1.0)
 tophat --num-threads 4 --read-mismatches 1 --read-edit-dist 2 --read-realign-edit-dist 1000 -a 8 -m 0 -i 30 -I 1000 -g 1 --min-segment-intron 30 --max-segment-intron 1000 --segment-mismatches 1 --segment-length 25 --library-type fr-secondstrand --max-insertion-length 3 --max-deletion-length 3 --no-coverage-search -r 100 --mate-std-dev 20 -o ./local_tophat_old_alignments <genome> 1_trim1.fastq 2_trim2.fastq

# This is the new version (2.1.1)

~/script/tophat-2.1.1/tophat --num-threads 4 --read-mismatches 1 --read-edit-dist 2 --read-realign-edit-dist 1000 -a 8 -m 0 -i 30 -I 1000 -g 1 --min-segment-intron 30 --max-segment-intron 1000 --segment-mismatches 1 --segment-length 25 --library-type fr-secondstrand --max-insertion-length 3 --max-deletion-length 3 --no-coverage-search -r 100 --mate-std-dev 20 -o ./local_tophat_new_alignments <genome> 1_trim1.fastq 2_trim2.fastq

I checked the number of reads in each output:

samtools view -c accepted_hits.bam

The old version gave me 6,910,198.

The new version gave me 27,645,322.

I don't know why the numbers of reads are so different.

Just in case, here are the align_summary.txt files:

#old version
Left reads:
          Input     :   3540517
           Mapped   :   3449572 (97.4% of input)
Right reads:
          Input     :   3540517
           Mapped   :   3460626 (97.7% of input)
97.6% overall read mapping rate.

Aligned pairs:   3380776
                 1025738 (30.3%) are discordant alignments
66.5% concordant pair alignment rate.

#new version
Left reads:
          Input     :  14189364
           Mapped   :  13781744 (97.1% of input)
Right reads:
          Input     :  14189364
           Mapped   :  13863578 (97.7% of input)
97.4% overall read mapping rate.

Aligned pairs:  13503616
                 4100383 (30.4%) are discordant alignments
66.3% concordant pair alignment rate.

Could anyone please explain it?

RNA-Seq alignment tophat • 1.6k views

All versions of TopHat are the old version. It should not be used any more - the authors themselves state this.

5.7 years ago

Obviously the "old" run omitted a lot of your reads: its align_summary.txt shows 3,540,517 input reads per mate, versus 14,189,364 for the new run. Figure out why. I'd start by simply rerunning it, in case it got randomly stopped mid-run.
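The figures posted in the question can be cross-checked with plain shell arithmetic. This is only a sketch: the numbers are copied from the two align_summary.txt files, and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Figures copied from the two align_summary.txt files in the question.
old_input=3540517;  old_left=3449572;  old_right=3460626
new_input=14189364; new_left=13781744; new_right=13863578

# Mapped left + mapped right, to compare with `samtools view -c accepted_hits.bam`.
old_total=$((old_left + old_right))
new_total=$((new_left + new_right))
echo "old mapped reads: $old_total"
echo "new mapped reads: $new_total"

# Share of the new run's input that the old run processed (rounded integer percent).
old_pct=$(( (100 * old_input + new_input / 2) / new_input ))
echo "old run input is about ${old_pct}% of the new run input"
```

The sums come out to 6,910,198 and 27,645,322, the same as the samtools counts, which is consistent with -g 1 limiting each read to one reported alignment. So the BAM files agree with the summaries, and the difference is entirely in how many reads each run took as input: the old run saw roughly a quarter of them.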


swbarnes2's answer is the correct answer to your question: somehow the run with TopHat 2.1.0 didn't read the entire FASTQ files.
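One way to confirm this is to count the reads in the trimmed FASTQ files and compare them with the "Input" lines in each align_summary.txt. A minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; count_fastq_reads is a hypothetical helper, not part of TopHat or samtools, and the file names are the ones from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of records in an uncompressed FASTQ file.
# Assumes four lines per record and a trailing newline on the last line.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Report both mates; files that are not present are skipped silently.
for fq in 1_trim1.fastq 2_trim2.fastq; do
    if [ -f "$fq" ]; then
        echo "$fq: $(count_fastq_reads "$fq") reads"
    fi
done
```

If both files report 14,189,364 reads, the FASTQ files are intact and the 2.1.0 run simply stopped reading early; if they report 3,540,517, the files themselves were different when the old run was started.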

However, jrj.healey's comment above is the correct answer to your needs. From the TopHat2 page:

Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality (i.e. spliced alignment of RNA-Seq reads), in a more accurate and much more efficient way.
