Performing fast bootstrap in R using ape package
6.0 years ago
User000 ▴ 690

Dear all,

I would like to build a neighbour-joining (NJ) tree with 1000 bootstrap replicates from my SNP data. I have around 5K SNPs and I am using the R package ape:

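library(ape)
snp <- as.matrix(objt)  # genotype matrix: samples in rows, SNPs in columns
stree <- nj(dist.gene(snp))  # NJ tree from pairwise allele-difference distances
myBoots <- boot.phylo(stree, snp, function(xx) nj(dist.gene(xx)), B = 1000, mc.cores = 6)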

It has been running for 3 days so far and is still not finished. Is there any way to make it faster, if that is possible at all?

R bootstrap ape

Are you sure that it is actually using the 6 cores that you specify? Is your parallel package loaded correctly?

Also, aren't 1000 bootstrap replicates too many? 250 would be fine.

Yes, it says "Running parallel bootstraps..." and it is using 6 cores. Do you think it should take this long for 5000 SNPs, or is something going wrong?

Clustering is a data-intensive technique and doing it 1000 times for 5000 SNPs is going to take a long time, even with 6 cores.

Why not first try it with 6 bootstrap replicates on 6 cores (1 replicate per core) and see how long that takes? That will give you an idea of the timing.

I still believe that 1000 bootstrap replicates are far too many.
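A pilot run of the kind suggested above could look like this. It is a minimal sketch, assuming the same `nj(dist.gene(...))` pipeline as in the question; the random 0/1/2 matrix is hypothetical toy data standing in for the real `snp` object, and `n_cores`, `pilot_B` and `full_B` are illustrative names. The extrapolation assumes run time grows linearly with the number of replicates and that the full run uses the same number of cores as the pilot.

```r
library(ape)

set.seed(42)
# Hypothetical stand-in for the question's `snp` matrix: 20 samples x 500 SNPs
# coded 0/1/2. Replace with your own matrix (samples in rows, SNPs in columns).
snp <- matrix(sample(0:2, 20 * 500, replace = TRUE), nrow = 20,
              dimnames = list(paste0("sample", 1:20), NULL))
stree <- nj(dist.gene(snp))

n_cores <- 1L     # raise to the number of cores you actually have (e.g. 6)
pilot_B <- 6L     # small pilot: a handful of replicates
full_B  <- 1000L  # size of the planned full run

# Time the pilot bootstrap (elapsed wall-clock seconds)
pilot_sec <- system.time(
  pilot <- boot.phylo(stree, snp, function(xx) nj(dist.gene(xx)),
                      B = pilot_B, mc.cores = n_cores, quiet = TRUE)
)[["elapsed"]]

# Assumption: run time scales linearly with the number of replicates
est_hours <- pilot_sec * full_B / pilot_B / 3600
cat(sprintf("Pilot: %.2f s for %d replicates; estimated %.2f h for %d replicates\n",
            pilot_sec, pilot_B, est_hours, full_B))
```

If the estimate comes out at several days, reducing `B` (for example to 250, as suggested earlier in the thread) shrinks it proportionally.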

I am also running it on a cluster with 10 cores (I don't know exactly how many cores I am allowed to use), and it has also been running for 3 days. Without bootstrapping it takes around 1-2 hours. Thanks a lot for the advice; I am running the test now, let's see.

Okay, I think you may have just answered your own question. If a single run takes even 1 hour (on a single core), then 1000 bootstrap replicates spread across 10 cores will take ~100 hours, or just over 4 days. Time is precious! Make the most of it.
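The estimate above can be written out as follows, assuming each bootstrap replicate costs about as much as the single run, t1, and that the B replicates divide perfectly over c cores with no parallel overhead. The second figure uses the upper end of the reported 1-2 hours:

```latex
T_{\text{boot}} \approx \frac{B\,t_1}{c}
  = \frac{1000 \times 1\,\text{h}}{10} = 100\,\text{h} \approx 4.2\ \text{days},
\qquad
\frac{1000 \times 2\,\text{h}}{10} = 200\,\text{h} \approx 8.3\ \text{days}.
```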

How many samples do you have? I do almost the same thing with the phangorn package for 13 samples and 7.5K SNPs, and 2000 bootstrap replicates took just 1-2 minutes.

Hello, as a follow-up to the bootstrapping: how can I then draw my actual tree?

Does the boot.phylo function update the NJ tree saved in stree? In other words, will boot.phylo generate bootstrap trees and update the consensus tree in the variable stree? If so, I could apply ggtree to it. Is this correct?

Or does the boot.phylo function only allow me to label the previously generated NJ tree? If so, is there an alternative way to generate a bootstrap consensus tree that I can plot later?

Thanks
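On the follow-up question: as far as I understand the ape documentation, `boot.phylo()` does not modify `stree`. It returns, for each internal node of the tree you pass in, how many bootstrap trees contain that clade, and with `trees = TRUE` it also returns the replicate trees. The sketch below illustrates both options on hypothetical random toy data (the matrix and `B <- 100` are illustrative): labelling the original NJ tree with percentage support, and building a majority-rule consensus with `consensus()`.

```r
library(ape)

set.seed(1)
# Hypothetical toy data standing in for the real SNP matrix:
# 10 samples (rows) x 200 SNPs (columns) coded 0/1/2
snp <- matrix(sample(0:2, 10 * 200, replace = TRUE), nrow = 10,
              dimnames = list(paste0("sample", 1:10), NULL))
stree <- nj(dist.gene(snp))

B <- 100
# boot.phylo() returns support values and leaves `stree` unchanged.
# With trees = TRUE the result is a list: $BP (clade counts) and $trees (replicate trees).
res <- boot.phylo(stree, snp, function(xx) nj(dist.gene(xx)),
                  B = B, trees = TRUE, quiet = TRUE)

# Option 1: annotate the original NJ tree with percentage support and plot it
stree$node.label <- round(100 * res$BP / B)
plot(stree)
nodelabels(stree$node.label, frame = "none", cex = 0.7)

# Option 2: majority-rule (p = 0.5) consensus of the bootstrap replicate trees
cons <- consensus(res$trees, p = 0.5)
plot(cons)
```

Option 1 keeps the NJ topology and only adds support values as node labels, so the annotated `stree` is still an ordinary `phylo` object that other phylo-aware plotting tools (such as ggtree) should be able to take as input. Option 2 builds a different tree whose topology comes from the replicates themselves.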

Hey, you may want to open a new question for this. User000 has not logged in for more than 11 months.
