manage huge arrays in R
8.1 years ago

Hi all, I'm working in R and need to merge 24 big arrays (on average 2.5 million points each, stored as RData files) in order to compute overall statistics such as the mean, median, and percentiles. Loading everything into memory is not feasible, so I was wondering whether you could suggest a strategy for this problem. I have read about the ff package, but I cannot find usage examples that fit my case.

Thank you for any hints or suggestions.
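
(As a minimal sketch of the kind of ff-based workflow being asked about: the idea is to append each RData file to a single on-disk ff vector, holding only one file in RAM at a time. The file and object names below are hypothetical, and the sketch assumes each RData file stores exactly one numeric vector.)

```r
library(ff)

files <- sprintf("array_%02d.RData", 1:24)   # hypothetical file names

combined <- NULL   # will become one on-disk ff vector holding all values
for (f in files) {
  x <- get(load(f))                 # assumes exactly one object per RData file
  if (is.null(combined)) {
    combined <- ff(x, vmode = "double")            # first chunk creates the ff file
  } else {
    old <- length(combined)
    length(combined) <- old + length(x)            # grow the on-disk vector
    combined[(old + 1):(old + length(x))] <- x     # append this chunk
  }
  rm(x); gc()
}

## The overall mean can be accumulated chunk-wise over the on-disk vector;
## exact medians/percentiles still need access to all values, which is why
## the combined vector is kept on disk rather than in RAM.
s <- 0
for (i in chunk(combined)) s <- s + sum(combined[i])
s / length(combined)
```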

R ff memory usage • 2.7k views

Hi Nicola,

could you please make the connection to bioinformatics explicit? I can guess there are a million reasons why a bioinformatician would need this, but we do have that requirement on this site. Otherwise, you might find a solution on Stack Overflow already.

8.1 years ago
Ahill ★ 1.9k

If memory limitation is the issue for your 24 × 2.5 million points, you may want to take a look at the HDF5Array package. The HDF5Matrix object in particular is advertised to support standard matrix operations such as rowSums and colSums over large on-disk matrices.
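
(A minimal sketch of that approach, assuming each of the 24 RData files stores one numeric vector of equal length; the file names, and the pointer to DelayedMatrixStats for percentiles, are assumptions rather than anything stated in the answer above.)

```r
library(HDF5Array)   # also attaches DelayedArray

files <- sprintf("array_%02d.RData", 1:24)   # hypothetical file names

## Write each vector to disk as a single-column HDF5-backed matrix,
## then bind them into one 2.5M x 24 DelayedMatrix (no data kept in RAM).
cols <- lapply(files, function(f) {
  x <- get(load(f))                     # assumes exactly one object per RData file
  writeHDF5Array(matrix(x, ncol = 1))   # returns an on-disk HDF5Matrix
})
big <- do.call(cbind, cols)

colMeans(big)              # per-file means, computed block by block
sum(big) / length(big)     # overall mean

## Column-wise percentiles, if the DelayedMatrixStats package is installed:
# DelayedMatrixStats::colQuantiles(big, probs = c(0.25, 0.5, 0.75))
```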

8.1 years ago

There's the bigmemory package. You could also read the data in chunks (as in the mainframe days). There are algorithms to compute running statistics. Alternatively, do the work in a memory-efficient language like C.
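
(A minimal sketch of the chunked / running-statistics idea and of a file-backed bigmemory matrix, assuming each RData file stores one numeric vector of the same length, here 2.5 million values; file names are hypothetical.)

```r
library(bigmemory)

files <- sprintf("array_%02d.RData", 1:24)   # hypothetical file names

## Running totals for the overall mean: only one file in memory at a time.
n <- 0; s <- 0
for (f in files) {
  x <- get(load(f))    # assumes exactly one object per RData file
  n <- n + length(x)
  s <- s + sum(x)
  rm(x); gc()
}
overall_mean <- s / n

## Medians/percentiles need all the values, so park them in a file-backed
## big.matrix (one column per input file) instead of RAM.
bm <- filebacked.big.matrix(nrow = 2.5e6, ncol = length(files),
                            type = "double",
                            backingfile    = "alldata.bin",
                            descriptorfile = "alldata.desc")
for (j in seq_along(files)) {
  bm[, j] <- get(load(files[j]))
}

## Quantiles can then be taken column by column (each column is only ~20 MB)
## without ever holding all 24 vectors in memory at once.
quantile(bm[, 1], probs = c(0.25, 0.5, 0.75))
```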
