Building a transcriptome from a large dataset
8 weeks ago
Raygozak ★ 1.4k

Hi, I'm trying to build a de novo transcriptome from quite a large dataset with multiple conditions for the same organism.

I tried assembling the entire dataset with SPAdes, but it runs out of memory at some point.

I have resorted to building the transcriptome for one replicate and then using the result as trusted contigs in the next sample's assembly, and so on, but it is going too slowly. Has anyone done this? Does it get faster with more samples, since the trusted contigs would act as a starting point?
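For concreteness, the iterative pattern I mean looks roughly like this (a sketch only; file names, thread counts, and the memory cap are placeholders for my actual setup):

    # assemble the first sample on its own
    spades.py -1 sample1_R1.fq.gz -2 sample1_R2.fq.gz -t 16 -m 250 -o asm1
    # feed its contigs into the next sample's assembly as trusted contigs
    spades.py -1 sample2_R1.fq.gz -2 sample2_R2.fq.gz \
        --trusted-contigs asm1/contigs.fasta -t 16 -m 250 -o asm2
    # ...and so on for each remaining sample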

Or, is there a way to build many small transcriptomes and then merge them, or would that amount to the same thing?

Would pre-merging paired reads help?

I'd appreciate any ideas on how to do this.

Thanks

de-novo transcriptome

Or, is there a way to build many small transcriptomes and then merge them, or would that amount to the same thing?

Intuitively it should be fine to build smaller sets (since you don't seem to have the infrastructure to build a single large one) and then use something to remove redundancy (CD-HIT or Clumpify from the BBMap suite). It may be a little tricky if you are working with eukaryotic data, but at least remove sequences that are fully identical over their entire length.
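A minimal sketch of the merge-then-deduplicate step using cd-hit-est (untested; input names, memory, and thread counts are placeholders):

    # pool the per-sample assemblies
    cat asm*/contigs.fasta > all_contigs.fasta
    # collapse sequences at 100% identity; at -c 1.0 shorter sequences
    # fully contained in longer ones are also removed
    cd-hit-est -i all_contigs.fasta -o nr_contigs.fasta -c 1.0 -n 10 -M 16000 -T 8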

Would pre-merging paired reads help?

Your reads would not normally merge (unless you have short inserts).
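If you want to check whether that applies to your libraries, bbmerge.sh (also from the BBMap suite) reports what fraction of pairs actually overlap; a quick sketch (file names are placeholders):

    # attempt to merge overlapping pairs; the log reports the join rate,
    # which tells you whether inserts are short enough for merging to matter
    bbmerge.sh in1=sample1_R1.fq.gz in2=sample1_R2.fq.gz \
        out=merged.fq.gz outu1=unmerged_R1.fq.gz outu2=unmerged_R2.fq.gz

If only a small fraction of pairs merges, pre-merging will not help the assembly.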
