Bowtie / Tophat failing on unmapped reads
4.9 years ago
spitzer ▴ 10

I'm very new to TopHat and Bowtie. I'm trying to align a set of paired-end RNA-seq reads to a reference genome. According to tophat.log, left_kept_reads and right_kept_reads are successfully mapped to the transcriptome. However, when the TopHat pipeline resumes with the unmapped reads and Bowtie2 tries to map left_kept_reads.m2g_um to the reference genome, it logs a message: "[bam_header_read] EOF marker is absent. The input is probably truncated." A minute or so later, the process throws an error and terminates.

When I examine the files left_kept_reads.m2g_um.bam and right_kept_reads.m2g_um.bam, I find that both are missing the 28-byte block at the end that samtools recognizes as the EOF marker of a .bam file. I assume that is what causes the program to crash, but I don't know why the EOF block isn't being written, or what I can do about it.
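For anyone wanting to repeat this check, here is a minimal sketch that compares the last 28 bytes of each file against the BGZF end-of-file block described in the SAM/BAM specification. The function name `has_bgzf_eof` is made up for illustration, and the files to check are passed as arguments because the exact location of the intermediate files depends on the run.

```shell
# Hex of the 28-byte BGZF EOF block (per the SAM/BAM specification),
# written in 4-byte groups so it is easy to compare by eye.
BGZF_EOF_HEX="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# has_bgzf_eof FILE (hypothetical helper name)
# Exit status 0 if the last 28 bytes of FILE equal the BGZF EOF block.
has_bgzf_eof() {
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Report on every file given on the command line.
for f in "$@"; do
    if has_bgzf_eof "$f"; then
        echo "$f: EOF block present"
    else
        echo "$f: EOF block missing (file may be truncated)"
    fi
done
```

Saved under some name such as check_bam_eof.sh (also just an example), it could be run as `sh check_bam_eof.sh left_kept_reads.m2g_um.bam right_kept_reads.m2g_um.bam` from the directory holding those files.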


The tophat commands I'm running are:

module load samtools/1.8
module load boost/1.66.0_gcc5+
module load bowtie2/2.3.2
module load tophat/2.0.13

tophat -r 200 -G /project/bf528/project_2/reference/annot/mm9.gtf --segment-length=20 --segment-mismatches=1 --no-novel-juncs -o P0_2_tophat -p 16 /project/bf528/project_2/reference/mm9 P0_1_1.fastq P0_1_2.fastq

The tophat.log file shows:

  • [2019-06-05 13:12:56] Beginning TopHat run (v2.0.13)
  • [2019-06-05 13:12:56] Checking for Bowtie
        Bowtie version: 2.3.2.0
  • [2019-06-05 13:12:56] Checking for Bowtie index files (genome)..
  • [2019-06-05 13:12:56] Checking for reference FASTA file
  • [2019-06-05 13:12:56] Generating SAM header for /project/bf528/project_2/reference/mm9
  • [2019-06-05 13:12:58] Reading known junctions from GTF file
  • [2019-06-05 13:13:05] Preparing reads
        left reads: min. length=40, max. length=40, 21561496 kept reads (16066 discarded)
        right reads: min. length=40, max. length=40, 21347948 kept reads (229614 discarded)
  • [2019-06-05 13:18:56] Building transcriptome data files P0_2_tophat/tmp/mm9
  • [2019-06-05 13:19:13] Building Bowtie index from mm9.fa
  • [2019-06-05 13:31:32] Mapping left_kept_reads to transcriptome mm9 with Bowtie2
  • [2019-06-05 13:40:33] Mapping right_kept_reads to transcriptome mm9 with Bowtie2
  • [2019-06-05 13:49:22] Resuming TopHat pipeline with unmapped reads
  • [2019-06-05 13:49:22] Mapping left_kept_reads.m2g_um to genome mm9 with Bowtie2
  • [bam_header_read] EOF marker is absent. The input is probably truncated.
  • [2019-06-05 13:49:36] Retrieving sequences for splices
  • [2019-06-05 13:50:42] Indexing splices [FAILED]
  • Error: Splice sequence indexing failed with err =1

Thanks in advance!

RNA-Seq tophat bowtie alignment • 1.0k views

Unless there is a dire need for tophat, use a current aligner such as STAR where possible.

First of all, I agree with genomax: you should consider a more recent RNA-seq aligner.

Maybe the problem is with the SAMtools you are loading. From the release notes:

TopHat 2.0.13 release 10/2/2014

Version 2.0.13 is a maintenance release with the following changes:

  • removed SAMtools as an external dependency in order to avoid incompatibility issues with recent and future changes of SAMtools and its code library (an older, stable SAMtools version is now packaged with TopHat)

I would expect TopHat2 to use the bundled SAMtools preferentially, but you could try running without module load samtools/1.8 and see whether that helps.

Another thing to consider is a possible incompatibility between this particular TopHat version (from 2014) and the Bowtie2 version (from 2017). You could try updating both tools to their latest versions.
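Before changing anything, it may help to see which binaries the loaded modules actually put on your PATH. The sketch below only reports where each tool name resolves; the helper name `report_tool` is made up for illustration, and it cannot tell you whether TopHat internally uses its bundled SAMtools instead.

```shell
# report_tool NAME (hypothetical helper name)
# Print where NAME resolves on PATH, or note that it is not found.
report_tool() {
    if tool_path=$(command -v "$1"); then
        echo "$1 -> $tool_path"
    else
        echo "$1 -> not found on PATH"
    fi
}

# The three tools involved in this run.
for tool in samtools bowtie2 tophat; do
    report_tool "$tool"
done
```

Running it once with and once without module load samtools/1.8 shows whether that module changes which samtools is picked up. For the tools that are found, `samtools --version`, `bowtie2 --version` and `tophat --version` print the exact versions to compare.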
