VDL · 4.8 years ago
I'm trying to use cd-hit to cluster the BLAST NR database at a 0.9 sequence-identity cutoff.
Here's what I'm running:
cd-hit -i nr -o nr90 -c 0.9 -M 1000
But even though I'm passing -M 1000, the command gradually uses up all the available RAM (8 GB) and then crashes. Any idea how to fix this?
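For what it's worth, cd-hit's -M flag takes its value in megabytes, so -M 1000 caps the program at roughly 1 GB. A minimal sketch of sizing the limit for an 8 GB machine; the 6 GB cap, the -n 5 word size, and the -T 4 thread count are illustrative choices, not values from this thread:

```shell
# cd-hit's -M flag is in megabytes, so -M 1000 means ~1 GB, not 1000 GB.
# Leave headroom for the OS: cap cd-hit at ~6 GB on an 8 GB machine.
MEM_GB=6
MEM_MB=$((MEM_GB * 1024))   # 6144 MB

# Print the command rather than running it here; "nr" is the input FASTA.
# -n 5 is the word size cd-hit recommends for a 0.9 identity threshold,
# and -T 4 uses four threads (both hypothetical sizing for this sketch).
echo "cd-hit -i nr -o nr90 -c 0.9 -n 5 -M ${MEM_MB} -T 4"
```

Even with a correctly sized limit, a memory cap only controls how much RAM cd-hit is allowed; it doesn't shrink the minimum the job actually needs.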
More RAM. If you want to cluster all of nr, you're going to need much more than 8 GB. This strikes me as an XY problem, though: what are you actually trying to achieve?
I'm trying to replicate a result for a protein prediction problem that used this database. My understanding was that the -M flag was supposed to limit the amount of RAM the program uses. So it doesn't work?

The program needs at least a certain amount of RAM to run; you can't make a program that needs X GB of RAM run on less than X.
It's probably hitting your 1 GB limit and then crashing. RAM usage can be a somewhat complex thing to monitor, and I'm not sure why it appears to climb to the full 8 GB when you specify 1, but regardless, I'd pretty much guarantee that NR is far, far too big to cluster with even 8 GB.
Maybe you want to use UniRef?
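If UniRef fits the replication, UniRef90 is already clustered at 90% sequence identity, so no cd-hit run is needed at all. A sketch of fetching it; the FTP path reflects UniProt's layout at the time of writing, so verify it on the UniProt downloads page before relying on it:

```shell
# UniRef90 is UniProt pre-clustered at 90% identity; download it directly
# instead of clustering NR yourself. Path is an assumption -- check UniProt.
URL="https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz"
FILE="${URL##*/}"   # strip the directory part, keeping just the filename
echo "$FILE"
# wget "$URL" && gunzip "$FILE"   # commented out: this is a very large download
```

Note the trade-off: UniRef90 is built from UniProt, not from NR, so the resulting sequence set will differ from an nr90 produced by cd-hit.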
I'll probably use it if I can't get the other option to work; thanks for pointing that out.