Hello all,
I'm running SAVAGE (https://bitbucket.org/jbaaijens/savage/src/master/) to reconstruct the haplotypes of a large virus (its genome is almost 3x the size of HIV's), and it took over 72 hours to reach stage b of the SAVAGE pipeline.
The sequencing coverage is 20000x and the reference genome is 32 kb long.
I'm using 7 threads to run this analysis on a machine with 8 vCPUs, 52 GB RAM and a 10 TB disk.
I also tested HaploClique (https://github.com/cbg-ethz/haploclique) and PredictHaplo (http://bmda.cs.unibas.ch/software.html) on this analysis, but both ran for over 72 hours without finishing.
Is there another tool that can run the whole analysis in less than 72 hours per virus under the conditions/specifications I mentioned above?
Thank you all in advance for any tips or help you may give me,
Is the ultra-high sequencing depth causing this issue? Do you really need that much coverage?
I'm not sure that much coverage is necessary, but it's not unusual for viral haplotype reconstruction. Do you think 10000x is enough?
Someone else will have to comment on the reconstruction part. You could try subsampling the reads to different depths (starting with 1000x) and see whether the results change much as you go up.
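If it helps, here is a rough sketch of how the subsampling could be done in plain Python. It assumes uncompressed FASTQ files with exactly 4 lines per record and takes the 20000x figure from the question as the starting depth; the function names and file names are made up for illustration and are not part of SAVAGE or any other tool. Existing utilities such as seqtk can do the same job.

```python
import random


def keep_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep to go from the current mean depth to the target."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    return min(1.0, target_coverage / current_coverage)


def subsample_fastq(in_path, out_path, fraction, seed=0):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Returns (kept, total). Assumes an uncompressed, well-formed FASTQ file.
    """
    rng = random.Random(seed)
    kept = total = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            total += 1
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept, total


# Example (hypothetical file names), stepping up from 1000x:
# for depth in (1000, 2000, 5000, 10000):
#     frac = keep_fraction(20000, depth)
#     subsample_fastq("reads.fastq", f"reads_{depth}x.fastq", frac, seed=42)
```

For paired-end data, running it on both files with the same seed and fraction should keep matching pairs, provided the two files list the reads in the same order. The resulting depth is only approximate, since each read is kept or dropped independently.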
That's a great idea! I will try that! Thank you.