Question

Doubts using Velvet (transcriptome assembly) mainly in performance


6.4 years ago

pablo61991 ▴ 90

Hello community

I'm running Velvet right now and I have several questions, because I'm running into memory problems. I have tried to solve them, but without success.

I'm a novice with this software (and I'm not a bioinformatician). For context, I'm following the protocol uploaded here. I found it quite good and easy to follow, but I have never used Velvet before.

This software has two steps: first velveth, which uses different k-mer sizes to generate the graphs (sorry if I have misunderstood this), and then velvetg, which generates the contigs.

velveth dir 21,37,2 -shortPaired -fastq -separate ALL_R1.fq ALL_R2.fq 1>velveth_21-35.log 2>velveth_21-35.err
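To make the k-mer range in the command above concrete, here is a minimal sketch that lists the output directories such a run would be expected to create. It assumes that the upper bound in `21,37,2` is exclusive, which is consistent with the `21-35` in the log file names but is not confirmed here; the function name `velveth_output_dirs` is made up for illustration.

```python
def velveth_output_dirs(prefix, k_min, k_max, step):
    """List the directories expected from `velveth <prefix> k_min,k_max,step`.

    Assumption (not confirmed by the post): k_max is exclusive, so
    21,37,2 would cover k = 21, 23, ..., 35.
    """
    return [f"{prefix}_{k}" for k in range(k_min, k_max, step)]


if __name__ == "__main__":
    for name in velveth_output_dirs("dir", 21, 37, 2):
        print(name)
```

Comparing this list with the folders actually present on disk is a quick way to see which k-mer sizes have been processed so far.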

When I look at the log file, I can see several runs for some k-mer sizes, because I restarted the run several times (by mistake). I thought the software could resume from the point where a previous run stopped, but now I think it just reruns the velveth step with the same k-mer size and adds the result to the earlier ones. So my first question is: does velveth keep accumulating data in the output generated for, say, k-mer size 21, or does each new run delete the previous output?

In other words, should I delete the output folder (for example, dir_21), whose .log contains several runs, or does the output correspond only to the last run? I ask because I have very large files: 8 GB PreGraph, 46 GB Sequences, 30 GB Roadmaps (I give the numbers because I don't know what is considered "normal"), but maybe such sizes are common.
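One way to investigate this without guessing is to list the size and last-modification time of each file in the output folder. The sketch below uses only the Python standard library; the function name `summarize_directory` is hypothetical. If every file's timestamp matches the most recent run, that would suggest (but not prove) that the files were rewritten rather than appended to.

```python
import os
import time


def summarize_directory(path):
    """Return (name, size in GB, last-modified time) for each file in path."""
    rows = []
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        if entry.is_file():
            info = entry.stat()
            modified = time.strftime(
                "%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime)
            )
            rows.append((entry.name, info.st_size / 1e9, modified))
    return rows


if __name__ == "__main__":
    folder = "dir_21"  # folder name taken from the post
    if os.path.isdir(folder):
        for name, size_gb, modified in summarize_directory(folder):
            print(f"{name:<12} {size_gb:10.2f} GB  {modified}")
```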

Then I run:

velvetg dir_21 -read_trkg yes -ins_length 240 1>dir_21.log 2>dir_21.err

And here a warning appears about problems allocating data.

When I did the same with a larger k-mer, such as 81, both velveth and velvetg ran without warnings, and my output folder contains the contigs.fa file that is supposed to be generated.

Example of the message after a successful run:

[5510.628647] Concatenation over! 
[5511.696744] Writing contigs into > dir_71/contigs.fa... 
[5556.542090] Writing into stats file > dir_71/stats.txt... 
[5588.526595] Writing into graph file > dir_71/LastGraph... 
Final graph has 3304746 nodes and n50 of 186, max 8448, total 261280561, using 129042549/239912342 reads
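The final summary line can be parsed to compare runs at different k-mer sizes, for example to see what fraction of the reads ended up in the assembly. This is a minimal sketch that assumes the line always has exactly the format shown above; the function name `parse_final_graph` is made up for illustration.

```python
import re

# Pattern matching the summary line exactly as it appears in the log above.
SUMMARY = re.compile(
    r"Final graph has (\d+) nodes and n50 of (\d+), max (\d+), "
    r"total (\d+), using (\d+)/(\d+) reads"
)


def parse_final_graph(line):
    """Extract the numbers from a velvetg summary line, or return None."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    nodes, n50, longest, total, used, reads = map(int, match.groups())
    return {
        "nodes": nodes,
        "n50": n50,
        "max": longest,
        "total": total,
        "reads_used": used,
        "reads_total": reads,
        "fraction_used": used / reads,
    }


if __name__ == "__main__":
    line = (
        "Final graph has 3304746 nodes and n50 of 186, max 8448, "
        "total 261280561, using 129042549/239912342 reads"
    )
    stats = parse_final_graph(line)
    print(f"n50 = {stats['n50']}, reads used = {stats['fraction_used']:.1%}")
```

For the run shown here, a little over half of the reads were used.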

The other question is related to resource usage. I understand this is difficult to answer, but I'm running several jobs and I want to know how many resources I'm using in total, because I may be requesting too many CPUs or too much RAM.
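As a starting point for measuring this, the sketch below runs a command and reports wall time, CPU time, and peak memory using Python's standard `resource` module (Unix only). The wrapper name `run_and_measure` is hypothetical. Note that the peak-memory value is the largest seen for any child process waited on so far by this script, so each job should be measured in a separate invocation.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run cmd (a list of arguments) and return basic resource usage."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": proc.returncode,
        "wall_seconds": elapsed,
        "cpu_seconds": cpu,
        # Kibibytes on Linux, bytes on macOS; high-water mark over all
        # children of this script, not only the last command.
        "peak_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Pass the real command after the script name, for example:
    #   python measure.py velvetg dir_21 -read_trkg yes -ins_length 240
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(run_and_measure(command))
```

If CPU time is close to wall time, the job is effectively using a single core, which may indicate that fewer CPUs could be requested.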

Sorry if the questions aren't clear enough; I'll correct them if you consider that necessary.

My English is not so good either =(

Thank you for your time

Pablo

velvet assembly transcriptome de novo RNA-Seq

updated 6.4 years ago by GenoMax 141k • written 6.4 years ago by pablo61991 ▴ 90


Is something wrong with the way I asked the question, or is this just a complicated problem?

6.4 years ago by pablo61991 ▴ 90