I'm very new to working with TopHat and Bowtie. I'm trying to align a set of paired-end RNA-seq reads to a reference genome. According to tophat.log, left_kept_reads and right_kept_reads are successfully mapped to the transcriptome. However, when the TopHat pipeline resumes with the unmapped reads and Bowtie2 tries to map left_kept_reads.m2g_um to the reference genome, it logs the message "[bam_header_read] EOF marker is absent. The input is probably truncated." About a minute later, the process throws an error and terminates.
When I examine the files left_kept_reads.m2g_um.bam and right_kept_reads.m2g_um.bam, I find that both are missing the 28-byte block at the end that samtools recognizes as the EOF marker of a .bam file. I assume that's what is causing the program to crash, but I don't know why the EOF block isn't being written, or what I can do about it.
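For reference, the missing-EOF check described above can be done by hand. A BAM file is BGZF-compressed, and a complete one ends with a fixed 28-byte empty BGZF block defined in the SAM/BAM specification. Below is a minimal sketch, assuming bash plus the standard `tail`, `od`, and `tr` utilities; the helper name `has_bgzf_eof` is hypothetical, and the loop assumes you run it from the directory that holds the two intermediate files.

```shell
#!/usr/bin/env bash
# Sketch: report whether a file ends with the 28-byte BGZF EOF block
# that samtools expects at the end of every complete BAM file.
# has_bgzf_eof is a hypothetical helper, not part of samtools or TopHat.

has_bgzf_eof() {
  local expected actual
  # The empty BGZF block from the SAM/BAM specification, as hex bytes.
  expected=$(echo "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00" | tr -d ' ')
  # Dump the last 28 bytes of the file as hex, dropping spaces and newlines.
  actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
  [ "$actual" = "$expected" ]
}

# Example: check both intermediate files if they exist in the current directory.
for f in left_kept_reads.m2g_um.bam right_kept_reads.m2g_um.bam; do
  if [ -e "$f" ]; then
    if has_bgzf_eof "$f"; then
      echo "$f: EOF block present"
    else
      echo "$f: EOF block missing"
    fi
  fi
done
```

With samtools 1.8 loaded, `samtools quickcheck -v` on the same files should flag the problem without a custom helper; the sketch only makes the 28-byte comparison explicit.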
The tophat commands I'm running are:
module load samtools/1.8
module load boost/1.66.0_gcc5+
module load bowtie2/2.3.2
module load tophat/2.0.13
tophat -r 200 -G /project/bf528/project_2/reference/annot/mm9.gtf --segment-length=20 --segment-mismatches=1 --no-novel-juncs -o P0_2_tophat -p 16 /project/bf528/project_2/reference/mm9 P0_1_1.fastq P0_1_2.fastq
The tophat.log file shows:
- [2019-06-05 13:12:56] Beginning TopHat run (v2.0.13)
- [2019-06-05 13:12:56] Checking for Bowtie
  Bowtie version: 2.3.2.0
- [2019-06-05 13:12:56] Checking for Bowtie index files (genome)..
- [2019-06-05 13:12:56] Checking for reference FASTA file
- [2019-06-05 13:12:56] Generating SAM header for /project/bf528/project_2/reference/mm9
- [2019-06-05 13:12:58] Reading known junctions from GTF file
- [2019-06-05 13:13:05] Preparing reads
  left reads: min. length=40, max. length=40, 21561496 kept reads (16066 discarded)
  right reads: min. length=40, max. length=40, 21347948 kept reads (229614 discarded)
- [2019-06-05 13:18:56] Building transcriptome data files P0_2_tophat/tmp/mm9
- [2019-06-05 13:19:13] Building Bowtie index from mm9.fa
- [2019-06-05 13:31:32] Mapping left_kept_reads to transcriptome mm9 with Bowtie2
- [2019-06-05 13:40:33] Mapping right_kept_reads to transcriptome mm9 with Bowtie2
- [2019-06-05 13:49:22] Resuming TopHat pipeline with unmapped reads
- [2019-06-05 13:49:22] Mapping left_kept_reads.m2g_um to genome mm9 with Bowtie2
- [bam_header_read] EOF marker is absent. The input is probably truncated.
- [2019-06-05 13:49:36] Retrieving sequences for splices
- [2019-06-05 13:50:42] Indexing splices [FAILED]
- Error: Splice sequence indexing failed with err =1
Thanks in advance!