Question: How to parallelize a single function in R?
0

My question has two parts:

1) I want to run my function in parallel regardless of the code inside. Is that possible? E.g.:

data("iris")
x.train <- iris[1:100,1:4]
y.train <- iris[1:100,5]
x.test <- iris[101:150,1:4]
y.test <- iris[101:150,5]

myfun <- function(x.train, y.train, x.test, y.test) {
  library("e1071")
  model1 <- svm(x.train, y.train, type = "C-classification")
  predc <<- predict(model1, x.test)
  model2 <- svm(x.train, y.train, type = "nu-classification")
  prednu <<- predict(model2, x.test)
}

I want to parallelize this part:

myfun(x.train,y.train,x.test,y.test)  

2) I also want to run the above function multiple times:

for i=1:10
  myfun(x.train,y.train,x.test,y.test)  

Can you tell me how I can do these two parts in parallel in R?

PS: My original data is an immense set of genome reads and I run over 10 classifiers, so I really need to do it in parallel.

4.6 years ago na.cna30 • 0 • updated 4.6 years ago andrew.j.skelton73 5.7k
3

This question is better suited to StackOverflow since it is about R programming. Also, the code you have there is nowhere near enough to provide you with an answer. For instance, the for loop syntax is not even R, and the function returns nothing, as far as I can tell. An internet search returns plenty of tutorials that should help you get started with parallelizing R functions: 1, 2, 3, 4. Good luck.

4.6 years ago
A. Domingues
♦ 2.1k
0

Thanks for the information. The function passes its results to the workspace with the <<- operator.

4.6 years ago
na.cna30
• 0
1

You should also post the first few lines of your original data. No matter how large the data, you can always run head() on it and paste a few lines. What is the role of i other than running the same code 10 times over?

4.6 years ago
komal.rathi
♦ 3.4k
0

Cool, I did not know about "<<-". Learned something today.

4.6 years ago
A. Domingues
♦ 2.1k
4

Because it's bad practice and unsafe to use the 'global assignment' operator. Parallel (or even looped) calls to this function will overwrite each other's result. The function needs to be rewritten with a normal return value.
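
A minimal sketch of such a rewrite, assuming the e1071 package is installed. The name myfun2 and the random train/test split are illustrative and not from the original post:

```r
library(e1071)  # assumed to be installed

# Hypothetical rewrite of myfun: same two classifiers,
# but the predictions are returned in a list instead of via <<-
myfun2 <- function(x.train, y.train, x.test) {
  model1 <- svm(x.train, y.train, type = "C-classification")
  model2 <- svm(x.train, y.train, type = "nu-classification")
  list(predc  = predict(model1, x.test),
       prednu = predict(model2, x.test))
}

data("iris")
set.seed(1)
idx <- sample(nrow(iris), 100)  # random split so all classes appear in training

# Single call: the result is an ordinary value
res <- myfun2(iris[idx, 1:4], iris[idx, 5], iris[-idx, 1:4])

# Repeated calls: each run keeps its own result, nothing is overwritten
runs <- lapply(1:10, function(i) {
  myfun2(iris[idx, 1:4], iris[idx, 5], iris[-idx, 1:4])
})
```

Because every call now hands back its own list, the same function can later be passed to a parallel apply function without the runs clobbering each other.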

4.6 years ago
karl.stamm
3.5k
0

Yes, I read the linked entry from Advanced R, and it looks like something one should not use unless really needed. Well, the first time I saw it was in an Advanced R book, and that says it all :)

4.6 years ago
A. Domingues
♦ 2.1k
0

haha, perhaps learned another way of R allowing one to shoot themselves in the foot.

4.6 years ago
Istvan Albert
80k
4

Do you know the R-package parallel https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf ?

A parallel version of apply and friends is a good example for the class of problems that can be easily parallelized ("embarrassingly parallel"). The problem can be broken down into fully independent steps, like aligning N fastq sequences or applying a function to rows of a matrix. All functions compatible with apply can be used like this. Other problems are more difficult, if they need to synchronize at one point (e.g. k-means).
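
As a rough illustration of the embarrassingly parallel case, here is a sketch that applies a function to every row of a matrix, once serially and once with mclapply from the parallel package. The toy matrix and the row_stat function are made up for this example, and mclapply relies on forking, so the sketch falls back to one core on Windows:

```r
library(parallel)

# Toy data: 1000 rows, each row is an independent unit of work
set.seed(42)
m <- matrix(rnorm(1000 * 20), nrow = 1000)

# Hypothetical per-row computation; no row depends on any other row
row_stat <- function(i) sum(m[i, ]^2)

# Forking is not available on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial       <- unlist(lapply(seq_len(nrow(m)), row_stat))
parallel_res <- unlist(mclapply(seq_len(nrow(m)), row_stat, mc.cores = n_cores))

# Both versions should give the same numbers
all.equal(serial, parallel_res)
```

Swapping lapply for mclapply is the whole change, which is what makes this class of problem easy to parallelize.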

4.6 years ago Michael Dondrup 46k
1

I've had good experiences with the parLapply function of the parallel package. For a single desktop with 8 cores, it's easy to take apart a MC simulation, feed the dataset toward 8 worker nodes, and let them all have at it. I don't have to specify workload distribution, because parLapply gives each iteration to another node for me.
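
A sketch of that pattern, assuming a local machine and using only the parallel package. The Monte Carlo task (estimating pi), the function name mc_chunk, and the choice of 2 workers and 8 chunks are placeholders, not the simulation described above:

```r
library(parallel)

# Hypothetical chunk of a Monte Carlo simulation:
# estimate pi from n random points in the unit square
mc_chunk <- function(n) {
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

cl <- makeCluster(2)                  # 2 local workers; adjust to your machine
clusterSetRNGStream(cl, iseed = 123)  # independent random streams per worker

chunks <- rep(1e5, 8)                 # 8 chunks of 100,000 draws each
estimates <- parLapply(cl, chunks, mc_chunk)  # chunks are spread over the workers

stopCluster(cl)

pi_hat <- mean(unlist(estimates))
```

The worker function has to be self-contained, since each worker is a fresh R session that does not see objects from the calling workspace unless they are passed as arguments or exported.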

4.6 years ago
karl.stamm
3.5k
0

So I need to break down my function into 2 parts, each including one SVM operation, right?

4.6 years ago
na.cna30
• 0
0

No, I don't think so. The two steps of one SVM need to be synchronized, because you can use the SVM to predict only after it has finished training. Or did you mean the two different SVMs trained in one run? That you could do.

Also, you need to convert your input data into a nested list because the functions in package parallel work on lists only. There are also parallel apply functions in package snow, but I think this package needs a cluster of some sort.
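
One way this could look, assuming e1071 is installed: the two SVM types go into a list, and each list element is trained and used for prediction on its own worker. The names tasks and fit_and_predict and the random split are illustrative only:

```r
library(parallel)

data("iris")
set.seed(1)
idx <- sample(nrow(iris), 100)  # random train/test split for illustration

# Each list element describes one independent task (here: one SVM type)
tasks <- list(c_svm = "C-classification", nu_svm = "nu-classification")

# Hypothetical worker function: training and prediction stay together,
# because prediction can only start once training has finished
fit_and_predict <- function(type, x.train, y.train, x.test) {
  model <- e1071::svm(x.train, y.train, type = type)
  predict(model, x.test)
}

cl <- makeCluster(2)  # local cluster with 2 workers
preds <- parLapply(cl, tasks, fit_and_predict,
                   x.train = iris[idx, 1:4],
                   y.train = iris[idx, 5],
                   x.test  = iris[-idx, 1:4])
stopCluster(cl)

# Fraction of correct predictions per SVM type
sapply(preds, function(p) mean(p == iris[-idx, 5]))
```

The training data is passed as extra arguments to parLapply, so it is shipped to every worker. With very large inputs that copying cost is worth measuring before adding more workers.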

4.6 years ago
Michael Dondrup
46k
