I have a large global recarray holding about 30 GB of data in a programme running via qsub on a cluster with 256 GB of RAM. I am currently the only user on this cluster, so nothing else is competing for RAM. When I loop over this recarray, the system appears to push the object out to disc rather than keep it in RAM, which slows the loop more than fivefold. I have tried using mmap on the object in the following ways, and received the following errors.
m = mmap.mmap(myrecarray, 0)
MMAP TypeError: only length-1 arrays can be converted to Python scalars
m = mmap.mmap(myrecarray.fileno(), 0)
AttributeError: record array has no attribute fileno
Is it possible to use mmap to hold a recarray object in RAM, or is this a total misuse of mmap? Can it only be used with other object types, like strings or files?
Many thanks
This is a hardcore programming question. There are definitely Python cracks around here, but I think you would be better served by asking this on Stack Overflow. Also, if you want to keep this question open here, please construct a plausible-sounding ;) connection to bioinformatics.
Thanks for your response. I have already asked it on Stack Overflow but have received no response so far, so I thought I'd try Biostars, as we often deal with large amounts of data and someone may have come across a similar issue before. To relate it to biology: this recarray is a local genome build, to which I am mapping millions of mutations and carrying out quantitative genetic analyses. I hope that clears things up.
So here is the link to the cross post: http://stackoverflow.com/questions/21637414/is-it-possible-to-mmap-a-recarray-in-python-2-7 It is always a good idea to provide this information from the beginning.
Thanks Michael, hopefully someone out there has some experience with locking objects into RAM.
AFAIK, the POSIX mmap system call can map files and devices into memory, but what you get back is just a pointer to a vector of bytes, nothing high-level like a Python data structure.
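To illustrate that point, here is a minimal sketch using only the standard library. It assumes a made-up record layout (an int32 position and a float64 score) and a temporary file, neither of which comes from the original question. mmap.mmap() wants a file descriptor, which is why passing the recarray itself, or calling .fileno() on it, fails. Once the file is mapped you only get bytes back, so the record structure has to be layered on top, here with struct.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout (an assumption for illustration only):
# little-endian int32 position followed by a float64 score, no padding.
RECORD = struct.Struct("<id")

records = [(101, 0.5), (202, 1.25), (303, -2.0)]

# Write the records to a temporary file as raw bytes.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))

    # mmap needs a file descriptor, not an in-memory Python object.
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # The mapping is just a byte buffer; decode each record ourselves.
            n_records = len(m) // RECORD.size
            decoded = [RECORD.unpack_from(m, i * RECORD.size)
                       for i in range(n_records)]
        finally:
            m.close()
finally:
    os.remove(path)

print(decoded)
```

For a recarray stored on disc, NumPy's numpy.memmap plays a similar role by giving an array view over a mapped file. As far as I know, though, mapping a file does not by itself pin the pages in RAM; the operating system can still page them out, so this may not cure the slowdown described in the question.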