I'm trying to calculate expression levels for a set of about 250 aligned SAM files (about 2 GB each). I plan to write a script to run Cufflinks on all of them, then another script to merge all of the resulting transcripts.gtf output files. How long will this take, and is it likely to crash the machine (12 cores, 16 GB memory)?
I also want to run Cuffdiff on all 250 of these. Would that much data cause a crash?
I would expect a 12-core machine to have at least 24 GB of RAM (2 GB per core), so 16 GB looks a little odd. The best answer is to try running the job; if you have SGE (Sun Grid Engine), you can pass the parameter -l h_vmem=15G, which tells the system to kill your job if it exceeds 15 GB of RAM. Write a shell script that runs cufflinks (and the merge) on a user-supplied input file, then launch 4 jobs at a time (one per file) and give cufflinks -p 4. That way the machine is used efficiently and each file gets four cores.
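A minimal sketch of that submission loop, assuming SGE's qsub is available. It is a dry run by default (it only prints the qsub commands); the wrapper script name run_cufflinks.sh and the parallel environment name smp are assumptions you would replace with your site's own:

```shell
#!/bin/bash
# Submit one cufflinks job per SAM file, 4 cores each, 15 GB memory cap.
# run_cufflinks.sh is a hypothetical wrapper that would contain something like:
#   cufflinks -p 4 -o "out_$1" "$1"
# Set DRYRUN=0 to actually submit via qsub instead of printing commands.
DRYRUN=${DRYRUN:-1}

submit_all() {
    local sam
    for sam in "$@"; do
        # -l h_vmem=15G: kill the job if it exceeds 15 GB of RAM
        # -pe smp 4: request 4 slots (the "smp" environment name is site-specific)
        local cmd="qsub -l h_vmem=15G -pe smp 4 run_cufflinks.sh $sam"
        if [ "$DRYRUN" -eq 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

submit_all sample1.sam sample2.sam
```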
To automate this further, write another script that checks the output of qstat, which reports job status. The easiest approach is an if statement on the qstat line count (skipping the 2-line header): if qstat | wc -l is less than 6, fewer than 4 jobs are running, so schedule another one, and repeat until the last file.
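The qstat check could look like this (a sketch only; the 2-line header and the limit of 4 concurrent jobs come from the description above, and qstat output can vary between sites). The scheduling loop is left as comments because next_file and submit_job are hypothetical helpers over your own list of SAM files:

```shell
#!/bin/bash
# Count currently queued/running SGE jobs by stripping qstat's 2-line header.
running_jobs() {
    # qstat prints a 2-line header followed by one line per job;
    # with no jobs queued it prints nothing at all, so wc -l yields 0.
    qstat 2>/dev/null | tail -n +3 | wc -l
}

# Hypothetical polling loop: whenever fewer than 4 jobs are active,
# submit the next file, then wait a minute before checking again.
# while more_files_remain; do
#     if [ "$(running_jobs)" -lt 4 ]; then
#         submit_job "$(next_file)"
#     fi
#     sleep 60
# done
```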
For me, cufflinks took about an hour and a half per file with 8 cores, so in your case I would estimate 2.5-3.5 hours per file.