Splitting fastq files for mapping with tophat2
7.9 years ago
debitboro ▴ 260

Hi all,

In order to map RNA-Seq fastq files of about 40M reads to the Ensembl human genome (hg) with tophat2, I have the following idea (a command sketch follows the list):

  • Split the fastq files into smaller files of 10M reads each
  • Map the small files separately to hg and generate .bam files
  • Merge the generated .bam files into one big .bam file
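
Concretely, something like this minimal sketch (file names and the bowtie2 index prefix hg_bowtie2_index are placeholders; a fastq record is 4 lines, so 10M reads = 40M lines per chunk):

    # Minimal sketch of the split/map/merge idea; paths and index prefix are placeholders.
    # Each fastq record is 4 lines, so 10M reads = 40M lines per chunk.
    split -l 40000000 sample_1.fastq chunk_1_
    split -l 40000000 sample_2.fastq chunk_2_

    # Map each pair of chunks separately (both mates get the same split suffixes,
    # since the two input files contain the same number of reads).
    for r1 in chunk_1_*; do
        r2=chunk_2_${r1#chunk_1_}
        tophat2 -p 4 -o tophat_${r1#chunk_1_} hg_bowtie2_index "$r1" "$r2"
    done

    # Merge the per-chunk BAMs into one file.
    samtools merge merged.bam tophat_*/accepted_hits.bam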

I have no experience with the results of this method, which is why I am asking for help from people who have already tried it. In other words, does this method give the same result as performing the mapping without splitting?

Any help, advice, or suggestions?

RNA-Seq tophat2 fastq splitting

I do not know if it makes any difference, given that tophat2 can be run with multiple threads (--num-threads). If you are using a GTF file with tophat, it creates a transcriptome index for every alignment run. Instead you could create the transcriptome index beforehand and provide it to tophat (--transcriptome-index) to reduce the run time, for example as sketched below.
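
A minimal example (genes.gtf, hg_bowtie2_index, and the output paths are placeholders); the first run only builds the transcriptome index, and later runs reuse it without the GTF:

    # Build the transcriptome index once (no reads given; the index is written to disk).
    tophat2 -G genes.gtf --transcriptome-index=transcriptome_data/known hg_bowtie2_index

    # Reuse the prebuilt index in every later run; the GTF is no longer needed.
    tophat2 -p 8 --transcriptome-index=transcriptome_data/known -o tophat_out \
        hg_bowtie2_index reads_1.fastq reads_2.fastq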

It would be better to use STAR, as mentioned below, unless you have specific concerns about it.

7.9 years ago
tiago211287 ★ 1.4k

You would get the same result. But if you are trying to speed up the alignment, I suggest you use the STAR aligner, which will map 40M reads in 20-30 minutes; besides that, the alignment is better. The drawback is that you need a machine with a lot of RAM (about 64 GB minimum).
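
For example, a rough sketch (file names, thread count, and --sjdbOverhang, which should be read length minus 1, are placeholders to adapt to your data):

    # Build the STAR genome index once; this is the RAM-hungry step (paths are placeholders).
    STAR --runMode genomeGenerate --runThreadN 8 \
         --genomeDir star_index \
         --genomeFastaFiles hg.fa \
         --sjdbGTFfile genes.gtf --sjdbOverhang 100

    # Align all 40M reads in a single run, writing a coordinate-sorted BAM.
    STAR --runThreadN 8 --genomeDir star_index \
         --readFilesIn reads_1.fastq reads_2.fastq \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix sample_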

In addition, if you still need to use Tophat2, you would have to run the 10M-read pieces in parallel to actually get faster results; aligning them one after another would take even longer.
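
Something like this would run the chunks concurrently (a sketch; the chunk names and core counts are made up, and the total thread count should not exceed your machine's cores):

    # One background tophat2 job per 10M-read chunk (4 chunks x 2 threads = 8 cores here).
    for i in 0 1 2 3; do
        tophat2 -p 2 -o tophat_chunk_$i hg_bowtie2_index \
            chunk_${i}_1.fastq chunk_${i}_2.fastq &
    done
    wait   # block until all background jobs have finished

    # Then merge the per-chunk BAMs.
    samtools merge merged.bam tophat_chunk_*/accepted_hits.bam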


Thank you tiago,

With the splitting it takes 4 hours to complete the mapping with tophat2.
