Building a transcriptome from a large dataset
8 weeks ago
Raygozak ★ 1.4k

Hi, I'm trying to build a de novo transcriptome from quite a large dataset with multiple conditions for the same organism.

I tried assembling the entire dataset with SPAdes, but it runs out of memory at some point.

I have resorted to building the transcriptome for one replicate and then using the result as trusted contigs in the next sample's assembly, and so on, but it is going too slowly. Has anyone done this? Does it get faster with more samples, since the trusted contigs would act as a starting point?
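For concreteness, the iterative pattern I mean looks roughly like this (a sketch only; file names, thread counts, and the memory cap are placeholders for my actual setup):

    # assemble the first sample on its own
    spades.py -1 sample1_R1.fq.gz -2 sample1_R2.fq.gz -t 16 -m 250 -o asm1
    # feed its contigs into the next sample's assembly as trusted contigs
    spades.py -1 sample2_R1.fq.gz -2 sample2_R2.fq.gz \
        --trusted-contigs asm1/contigs.fasta -t 16 -m 250 -o asm2
    # ...and so on for each remaining sample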

Or, is there a way to build many small transcriptomes and then merge them, or would that amount to the same thing?

Would pre-merging paired reads help?

I'd appreciate any ideas on how to do this.

Thanks

de-novo transcriptome

Or, is there a way to build many small transcriptomes and then merge them, or would that amount to the same thing?

Intuitively it should be fine to build smaller sets (since you don't seem to have the infrastructure to build a single large one) and then use something to remove redundancy (CD-HIT or Clumpify from the BBMap suite). It may be a little tricky if you are working with eukaryotic data, but at least remove sequences that are fully identical over their entire length.
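A minimal sketch of the merge-then-deduplicate step using cd-hit-est (untested; input names, memory, and thread counts are placeholders):

    # pool the per-sample assemblies
    cat asm*/contigs.fasta > all_contigs.fasta
    # collapse sequences at 100% identity; at -c 1.0 shorter sequences
    # fully contained in longer ones are also removed
    cd-hit-est -i all_contigs.fasta -o nr_contigs.fasta -c 1.0 -n 10 -M 16000 -T 8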

Would pre-merging paired reads help?

Your reads would not normally merge (unless you have short inserts).
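If you want to check whether that applies to your libraries, bbmerge.sh (also from the BBMap suite) reports what fraction of pairs actually overlap; a quick sketch (file names are placeholders):

    # attempt to merge overlapping pairs; the log reports the join rate,
    # which tells you whether inserts are short enough for merging to matter
    bbmerge.sh in1=sample1_R1.fq.gz in2=sample1_R2.fq.gz \
        out=merged.fq.gz outu1=unmerged_R1.fq.gz outu2=unmerged_R2.fq.gz

If only a small fraction of pairs merges, pre-merging will not help the assembly.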
