Question

How to parallelize a single function in R?

0

8.9 years ago

na.cna30 • 0

My question has two parts:

1) I want to run my function in parallel regardless of the code inside. Is that possible? For example:

data("iris")
x.train <- iris[1:100,1:4]
y.train <- iris[1:100,5]
x.test <- iris[101:150,1:4]
y.test <- iris[101:150,5]

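myfun <- function(x.train, y.train, x.test, y.test) {
  library("e1071")
  # Fit a C-classification SVM and store its predictions globally
  model1 <- svm(x.train, y.train, type = "C-classification")
  predc <<- predict(model1, x.test)
  # Fit a nu-classification SVM and store its predictions globally
  model2 <- svm(x.train, y.train, type = "nu-classification")
  prednu <<- predict(model2, x.test)
}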

I want to parallelize this part:

myfun(x.train,y.train,x.test,y.test)

2) I also want to run the above function multiple times:

for i=1:10
  myfun(x.train,y.train,x.test,y.test)

Can you tell me how I can do these two parts in parallel in R?

PS: My original data is an immense set of genome reads and I run over 10 classifiers, so I really need to do it in parallel.

R • 6.1k views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.9 years ago by na.cna30 • 0

3

This question is better suited to StackOverflow, since it is about R programming. Also, the code you have there is nowhere near enough to provide you with an answer. For instance, the for loop syntax is not even R, and the function returns nothing, as far as I can tell. An internet search returns plenty of tutorials that should help you get started with parallelizing R functions: 1, 2, 3, 4. Good luck.

ADD REPLY • link 8.9 years ago by A. Domingues ★ 2.7k

0

Thanks for the information. The function passes its results to the workspace via the <<- operator.

ADD REPLY • link 8.9 years ago by na.cna30 • 0

1

And you should post the first few lines of your original data. No matter how large the data, you can always run head on it and paste a few lines. What is the role of i other than running the same code 10 times over?

ADD REPLY • link 8.9 years ago by komal.rathi ★ 4.1k

0

Cool, I did not know about "<<-". Learned something today.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by A. Domingues ★ 2.7k

4

Because it's bad practice and unsafe to use the 'global assignment' operator. Parallel (or even looped) calls to this function will overwrite each other's result. The function needs to be rewritten with a normal return value.
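A minimal sketch of what such a rewrite could look like, assuming the e1071 package is installed. The name `fit_two_svms` and the odd/even train/test split are illustrative choices, not taken from the original post; the split is only there so that every class appears in the training data.

```r
library(e1071)  # provides svm(); assumed to be installed

# Hypothetical rewrite of myfun: fit both SVM types and return the
# predictions in a list instead of assigning them globally with <<-.
fit_two_svms <- function(x.train, y.train, x.test) {
  model_c  <- svm(x.train, y.train, type = "C-classification")
  model_nu <- svm(x.train, y.train, type = "nu-classification")
  list(
    predc  = predict(model_c, x.test),
    prednu = predict(model_nu, x.test)
  )
}

data("iris")
train_idx <- seq(1, nrow(iris), by = 2)  # illustrative split: odd rows for training
x.train <- iris[train_idx, 1:4]
y.train <- iris[train_idx, 5]
x.test  <- iris[-train_idx, 1:4]

# Each call gets its own result object, so repeated or parallel calls
# cannot overwrite one another.
preds <- fit_two_svms(x.train, y.train, x.test)
str(preds, max.level = 1)
```

Because the result is returned rather than written to the workspace, the caller decides where each run's output is stored.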

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by karl.stamm 4.1k

0

Yes, I read the linked entry from Advanced R, and it looks like something one should not use unless really needed. Well, the first time I saw it was in an Advanced R book, and that says it all :)

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by A. Domingues ★ 2.7k

0

haha, perhaps learned another way of R allowing one to shoot themselves in the foot.

ADD REPLY • link 8.9 years ago by Istvan Albert 100k

6

8.9 years ago

andrew.j.skelton73 6.5k

I asked this question on StackOverflow a while back and got some useful answers.

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.9 years ago by andrew.j.skelton73 6.5k

Ram · Accepted Answer · 2015-06-17

5

8.9 years ago

Michael 54k

Do you know the R package parallel?

A parallel version of apply and friends is a good example for the class of problems that can be easily parallelized ("embarrassingly parallel"). The problem can be broken down into fully independent steps, like aligning N fastq sequences or applying a function to rows of a matrix. All functions compatible with apply can be used like this. Other problems are more difficult, if they need to synchronize at one point (e.g. k-means).
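As a small illustration of the "rows of a matrix" case, here is a sketch using `parApply` from the parallel package, which mirrors the interface of `apply`. The toy matrix and the worker count of 2 are assumptions made for the example.

```r
library(parallel)  # ships with base R

m <- matrix(1:20, nrow = 5)  # toy data: 5 rows, 4 columns

cl <- makeCluster(2)                 # two local worker processes (assumed count)
par_sums <- parApply(cl, m, 1, sum)  # parallel counterpart of apply(m, 1, sum)
stopCluster(cl)                      # always release the workers

serial_sums <- apply(m, 1, sum)      # serial version, for comparison
print(par_sums)
print(serial_sums)
```

For a computation this small the cost of starting the workers outweighs any gain; the pattern only pays off when each row takes a meaningful amount of time.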

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.9 years ago by Michael 54k

1

I've had good experiences with the parLapply function of the parallel package. For a single desktop with 8 cores, it's easy to take apart a MC simulation, feed the dataset toward 8 worker nodes, and let them all have at it. I don't have to specify workload distribution, because parLapply gives each iteration to another node for me.
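A rough sketch of that pattern, using a made-up simulation step. The function `one_run`, the worker count of 2, and the seed are placeholders chosen for the example, not part of any real analysis.

```r
library(parallel)

# Hypothetical stand-in for one simulation run: the mean of n random draws.
one_run <- function(i, n) mean(rnorm(n))

cl <- makeCluster(2)                  # assumed worker count; adjust to your machine
clusterSetRNGStream(cl, iseed = 42)   # separate random number stream per worker
results <- parLapply(cl, 1:10, one_run, n = 1000)  # ten independent runs
stopCluster(cl)

run_means <- unlist(results)          # one value per run, in input order
print(run_means)
```

The same shape covers "run my function 10 times": the index vector `1:10` plays the role of the loop counter, and extra arguments are passed through after the function name.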

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by karl.stamm 4.1k

0

So I need to break my function down into two parts, each including one SVM operation, right?

ADD REPLY • link 8.9 years ago by na.cna30 • 0

0

No, I don't think so. The two svm steps need to be synchronized, because you can use the svm to predict only after it has finished training, or did you mean the two different svm's trained in one run? That you could do.
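A sketch of that second reading, with each SVM type trained and used for prediction on its own worker. It assumes e1071 is installed on the machine; the helper name `fit_and_predict` and the odd/even split are illustrative and not from the original question.

```r
library(parallel)

data("iris")
train_idx <- seq(1, nrow(iris), by = 2)  # illustrative split, not the asker's
x.train <- iris[train_idx, 1:4]
y.train <- iris[train_idx, 5]
x.test  <- iris[-train_idx, 1:4]

# Hypothetical helper: train one SVM of the given type, then predict.
# Training and prediction stay together, so they run in order on one worker.
fit_and_predict <- function(type, x.train, y.train, x.test) {
  model <- e1071::svm(x.train, y.train, type = type)
  predict(model, x.test)
}

svm_types <- c("C-classification", "nu-classification")

cl <- makeCluster(2)  # one worker per SVM type
preds <- parLapply(cl, svm_types, fit_and_predict,
                   x.train = x.train, y.train = y.train, x.test = x.test)
stopCluster(cl)
names(preds) <- svm_types
```

The data is passed as explicit arguments so that each worker receives its own copy; with very large inputs that copying can become the bottleneck.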

Also, you need to convert your input data into a nested list, because the functions in package parallel work on lists only. There are parallel apply functions in package snow, but I think that package needs a cluster of some sort.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by Michael 54k