Question

TopHat runs through without errors but no reads are mapped

7.8 years ago

jrxu.bioinf ▴ 20

Hello,

I am a new user of TopHat. The version in use is v2.1.1 (tophat2).

I used default parameters except for --no-coverage-search to save time. The read length is 102. The run output looks fine (shown below), but 0% of the reads are mapped to the genome.

BTW, these exact same settings successfully mapped another set of RNA-seq data (read length = 30).

Thanks!

[2016-06-26 14:44:07] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-06-26 14:44:07] Checking for Bowtie
                  Bowtie version:        2.2.9.0
[2016-06-26 14:44:07] Checking for Bowtie index files (genome)..
[2016-06-26 14:44:07] Checking for reference FASTA file
[2016-06-26 14:44:07] Generating SAM header for ~/data/Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index/genome
[2016-06-26 14:44:58] Reading known junctions from GTF file
[2016-06-26 14:45:01] Preparing reads
         left reads: min. length=102, max. length=102, 36967894 kept reads (26707 discarded)
[2016-06-26 14:58:17] Building transcriptome data files ./tophat_out/tmp/genes
[2016-06-26 14:58:55] Building Bowtie index from genes.fa
[2016-06-26 15:05:33] Mapping left_kept_reads to transcriptome genes with Bowtie2
[2016-06-26 15:28:27] Resuming TopHat pipeline with unmapped reads
[2016-06-26 15:28:27] Mapping left_kept_reads.m2g_um to genome genome with Bowtie2
[2016-06-26 16:14:47] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/4)
[2016-06-26 16:34:49] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/4)
[2016-06-26 16:50:05] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/4)
[2016-06-26 17:00:49] Mapping left_kept_reads.m2g_um_seg4 to genome genome with Bowtie2 (4/4)
[2016-06-26 17:18:38] Searching for junctions via segment mapping
[2016-06-26 17:28:01] Retrieving sequences for splices
[2016-06-26 17:29:11] Indexing splices
Building a SMALL index
[2016-06-26 17:30:18] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)
[2016-06-26 17:37:09] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
[2016-06-26 17:44:09] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
[2016-06-26 17:50:15] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2016-06-26 17:58:30] Joining segment hits
[2016-06-26 18:05:44] Reporting output tracks
-----------------------------------------------
[2016-06-26 18:18:49] A summary of the alignment counts can be found in ./tophat_out/align_summary.txt
[2016-06-26 18:18:49] Run complete: 03:34:42 elapsed

The alignment summary (align_summary.txt) is below:

Reads:
          Input     :  36994601
           Mapped   :      5067 ( 0.0% of input)
            of these:      1534 (30.3%) have multiple alignments (3 have >20)
 0.0% overall read mapping rate.

RNA-Seq next-gen alignment software error • 2.6k views

ADD COMMENT • link updated 7.8 years ago by GenoMax 141k • written 7.8 years ago by jrxu.bioinf ▴ 20

Any time you see no alignment, or less than expected, the first thing to try is to take a random sample of reads (10-15) and BLAST them at NCBI. If the top hits are not from the genome you expect, you will have to start figuring out what went wrong. If the BLAST hits are only partial, you may have adapter contamination in your data (did you look at the data with FastQC before aligning?), and you would need to trim the reads before alignment.
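As a rough illustration of the sampling step, here is a minimal Python sketch that draws a small random sample from a FASTQ file and prints it as FASTA for pasting into the BLAST web form. The function names and the file name in the usage comment are hypothetical, and it assumes a plain, uncompressed FASTQ with exactly four lines per record.

```python
import random
from itertools import islice


def sample_fastq(handle, k=15, seed=0):
    """Reservoir-sample k records from a FASTQ stream (4 lines per record)."""
    rng = random.Random(seed)
    reservoir = []
    n = 0
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break  # end of file (or a truncated final record)
        n += 1
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            # Keep the new record with probability k/n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = record
    return reservoir


def to_fasta(records):
    """Turn sampled FASTQ records into FASTA text."""
    lines = []
    for header, seq, _plus, _qual in records:
        lines.append(">" + header[1:].strip())
        lines.append(seq.strip())
    return "\n".join(lines)


# Hypothetical usage:
# with open("reads.fastq") as fh:
#     print(to_fasta(sample_fastq(fh, k=15)))
```

Reservoir sampling is used so the file is read once and never held in memory, which matters with tens of millions of reads.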

ADD REPLY • link 7.8 years ago by GenoMax 141k

I should have used the "--split-files" option when converting SRA to FASTQ. Without it, the two mates of each paired-end read are concatenated into a single read, which caused the problem!
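Re-running the SRA conversion with the split option is the cleaner fix. If only the merged FASTQ is at hand, the sketch below shows one way to recover the mates. It assumes both mates have the same length (for example 102 = 51 + 51), which should be checked against the run metadata; the function and file names are hypothetical.

```python
def split_merged_record(header, seq, qual):
    """Split one concatenated read into two mates of equal length.

    Assumption: mate 1 is the first half and mate 2 the second half.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) % 2:
        raise ValueError("odd read length; cannot split evenly")
    half = len(seq) // 2
    name = header.split()[0]  # drop any description after the read ID
    mate1 = (name + "/1", seq[:half], qual[:half])
    mate2 = (name + "/2", seq[half:], qual[half:])
    return mate1, mate2


def split_fastq(in_path, out1_path, out2_path):
    """Write the two halves of every record to separate FASTQ files."""
    with open(in_path) as fin, \
            open(out1_path, "w") as f1, open(out2_path, "w") as f2:
        while True:
            header = fin.readline().rstrip("\n")
            if not header:
                break
            seq = fin.readline().rstrip("\n")
            fin.readline()  # '+' separator line
            qual = fin.readline().rstrip("\n")
            mates = split_merged_record(header, seq, qual)
            for out, (h, s, q) in zip((f1, f2), mates):
                out.write(f"{h}\n{s}\n+\n{q}\n")


# Hypothetical usage:
# split_fastq("merged.fastq", "reads_1.fastq", "reads_2.fastq")
```

The second mate is written as sequenced, not reverse-complemented, so the two output files can be passed to the aligner as an ordinary paired-end library.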

ADD REPLY • link 7.8 years ago by jrxu.bioinf ▴ 20

Use Kraken to screen the reads; it is faster than BLAST and lets you screen the whole dataset.

ADD REPLY • link 7.8 years ago by pld 5.1k

The default Kraken database only has bacterial, archaeal, and viral data, so it would not always provide a useful answer. Surely BLASTing 10-15 sequences (in this case, where almost no reads are aligning) would be much faster than running Kraken.

ADD REPLY • link 7.8 years ago by GenoMax 141k

You can add sequences to Kraken databases or alter them. Sure, BLASTing a few reads is quick, but it won't give you an idea of the degree of contamination.

ADD REPLY • link 7.8 years ago by pld 5.1k

I tested several reads. Each read maps perfectly to the correct transcript, BUT the first half of the read maps to the forward strand and the second half to the reverse strand. How should I handle this read format? Thanks.

ADD REPLY • link 7.8 years ago by jrxu.bioinf ▴ 20

Did you run FastQC on these?

ADD REPLY • link 7.8 years ago by pld 5.1k

That most likely indicates that you have short inserts (and thus read-through into, i.e. contamination with, Illumina adapters). You would need to trim these reads to get them to align.
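A quick way to test the read-through hypothesis before trimming is to count how many reads contain the start of the adapter. The sketch below assumes the common Illumina TruSeq adapter prefix AGATCGGAAGAGC, which may not match the kit actually used, and only finds exact matches. The function names are hypothetical, and a dedicated trimmer that tolerates mismatches and partial adapters at the read end is the better tool for real data.

```python
# Assumed adapter prefix (common to Illumina TruSeq adapters); adjust per kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(seqs, adapter=ADAPTER_PREFIX):
    """Fraction of reads containing an exact copy of the adapter prefix."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if adapter in s.upper())
    return hits / len(seqs)


def trim_adapter(seq, qual, adapter=ADAPTER_PREFIX):
    """Cut a read (and its qualities) at the first exact adapter match."""
    pos = seq.upper().find(adapter)
    if pos == -1:
        return seq, qual  # no adapter found; leave the read untouched
    return seq[:pos], qual[:pos]
```

A fraction near zero on a sample of reads suggests that adapter read-through is not the main cause and that something else, such as the read layout, is worth checking.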

ADD REPLY • link 7.8 years ago by GenoMax 141k