Biostar Beta. Not for public use.
Question: When is it better to pool samples?

Hi All,

I am planning a pipeline, but I am unsure how to set up one of its steps. I'll keep the question as broad as possible because I am also interested in the math behind it and in other applications.

I have a lot of samples that all have to go through the same process and, in the end, I want to pool all of them together. Which is the better approach?

a. Pool them and send the pool through the process.
b. Send each sample through the process and pool the results.

What are the benefits and drawbacks of each?

Thank you

21 months ago by ste.lu • 40 • updated 19 months ago by Biostar • 20

Without knowing the full details of what you are trying to do, b would be better, since you can parallelize the processing and get through everything faster than with a.

21 months ago by genomax • 68k

I completely agree with your answer in terms of computational power. But what about the math/statistics behind it: are the two approaches exactly the same? Or does it always depend on the task I am talking about?

21 months ago by ste.lu • 40

"Does it always depend on the task I am talking about?"

Likely. If the operations you are doing are independent of each other (e.g. splitting a billion-sequence file into 100 chunks and starting 100 alignments against the same reference genome, as opposed to one alignment job), then doing b would always be preferable/faster (as long as you have the resources available), as the resulting alignments can be merged later. But if an operation depends on the content as a whole (e.g. an assembly job where a sample was sequenced on multiple lanes/flowcells), then the data would have to be pooled before starting the job to avoid biases.
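As a toy illustration of that distinction, here is a small Python sketch. It is not a real alignment or assembly: `per_record` and `content_dependent` are hypothetical stand-ins I made up (sequence length and median length), and the sample data is invented. It only shows that a step which looks at one record at a time gives the same pooled result in either order, while a step that depends on everything it is given may not.

```python
from statistics import median

def per_record(seq):
    """Hypothetical independent step: looks at one record only."""
    return len(seq)

def content_dependent(seqs):
    """Hypothetical dependent step: the result depends on the whole input."""
    return median([len(s) for s in seqs])

# Two made-up samples of short sequences.
samples = [["ACGT", "AC"], ["A", "ACGTACGT", "ACGTAC"]]
pooled = [s for sample in samples for s in sample]

# Independent step: option b (process, then pool) matches option a (pool, then process).
process_then_pool = [per_record(s) for sample in samples for s in sample]
pool_then_process = [per_record(s) for s in pooled]

# Dependent step: combining per-sample results need not match the pooled result.
combined = median([content_dependent(sample) for sample in samples])
pooled_result = content_dependent(pooled)

print(process_then_pool == pool_then_process)
print(combined, pooled_result)
```

With this data the per-record results are identical in both orders, but the median of the per-sample medians differs from the median of the pooled data, so for that kind of step the order of pooling matters.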

I can't comment on the theoretical implications of a vs b, but someone else may be able to.

21 months ago by genomax • 68k
