Calculating alignment/mapping time
5.5 years ago
kumars.sv • 0

I am trying to assemble a plant genome with Velvet on AWS resources. The genome is huge (more than 10 times the size of the human genome) and coverage is around 30x. We are planning a de novo assembly with Velvet (threads enabled). I would like to know whether there is a calculator that can estimate the approximate assembly time from the relevant details. For example, if I supply the RAM, CPU (instance type), genome size, approximate coverage, sequencing type (PE or SE) and number of client nodes, it should return the number of hours or days the assembly would take (reference-based and/or de novo).
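Nobody in this thread offers a runtime formula, but you can at least size the input the question describes. The sketch below is plain arithmetic, not a runtime or memory estimator. It multiplies genome size by coverage to get the total number of sequenced bases, then divides by read length to get read counts. The human genome size (~3.1 Gbp) and the 2 x 150 bp paired-end layout are assumptions chosen for illustration. Only the "more than 10 times human" genome size and the ~30x coverage come from the question.

```python
# Rough sequencing-volume arithmetic for the assembly described in the question.
# ASSUMPTIONS (not from the original post): haploid human genome ~3.1 Gbp,
# paired-end reads of 150 bp each. Only "10x human" and 30x coverage are from the post.

HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def sequencing_volume(genome_bp: float, coverage: float, read_len: int, paired: bool = True):
    """Return (total bases, number of reads, number of read pairs or None for SE)."""
    total_bases = genome_bp * coverage      # coverage = total bases / genome size
    n_reads = total_bases / read_len        # individual reads of read_len bp
    n_pairs = n_reads / 2 if paired else None
    return total_bases, n_reads, n_pairs


genome_bp = 10 * HUMAN_GENOME_BP  # "> 10 times human genome", so this is a lower bound
total, reads, pairs = sequencing_volume(genome_bp, coverage=30, read_len=150)

print(f"genome size:       {genome_bp / 1e9:.0f} Gbp")
print(f"bases to assemble: {total / 1e9:.0f} Gbp")
print(f"reads:             {reads / 1e9:.1f} billion ({pairs / 1e9:.1f} billion pairs)")
```

Under these assumptions, a ~31 Gbp genome at 30x amounts to roughly 930 Gbp of sequence, or about 6.2 billion 150 bp reads (3.1 billion pairs). That shows the scale of data any assembler would have to handle, even before ploidy, heterozygosity or repeat content come into play.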

Tags: denovo, assembly, mapping, calculation, NGS, velvet

Since there was no reply, I have reposted this on Stack Overflow here: https://stackoverflow.com/questions/53314743/calculating-alignment-mapping-time


That question got downvoted, and h.mon suggested moving the query to Bioinformatics Stack Exchange, so I have moved it there. Thanks, h.mon.

5.4 years ago
h.mon 35k

There is no simple answer to your question. Other factors also influence time and memory use, such as genome ploidy, heterozygosity, repetitive element content and read quality, among others. I will illustrate with two examples:

  • In one case, I assembled four bacterial genomes with SPAdes, all different strains from the same genus, with similar sequencing coverage (100x) for all. Three of them finished in 2-4 hours; the last one took more than a day. The culprits were library insert size and sequencing quality, which were worse than for the other three.

  • In a second case, I assembled two insect genomes from sister species, with similar genome sizes and coverage (20x), using SGA. One took 3-4 days; the other took a month to complete. Although the genomes were similar in size, one had more repeats than the other, and apparently this threw SGA off track.

P.S.: Velvet is a good assembler and, in its time, it was among the best assemblers available. However, its development has stopped and it has been surpassed by others, especially in terms of time and memory use.


I understand that it is tricky to estimate assembly time and that several factors influence the assembly. It seems there is no such tool. Regarding your P.S., which assembler would you suggest for polyploid genomes, in terms of memory management and resource use? By the way, thanks for your time. A recent (2018) paper on assemblers shows that Velvet and ABySS are better assemblers for eukaryotic genomes (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5826002/ - A Comprehensive Study of De Novo Genome Assemblers: Current Challenges and Future Prospective).
