Sorry guys, I know this question has been asked before and I've read the prior material and I'm still a little unsure.
What do we consider biologic replicates in the context of human blood samples for RNA-seq? Project question: are there differentially expressed miRNAs associated with event x? The main populations of interest are those who developed event x and those who did not.
For the sake of clarity, I'm going to be using the number 3 as the chosen number of replicates. I know more can be better, but 3 seems to be widely acceptable as adequate.
I think between those 2 groups, if I had blood samples from 3 people who had the event and 3 from people who did not, they would be considered biologic replicates.
However, say I want to adjust for other factors--age, sex, stage of disease--would I only have biologic replicates if I had say 3 samples with event, with the same age, sex, stage of disease? As opposed to having say 10 samples with event, but differing age, sex, stage of disease?
I think the answer is yes, for biologic replicates I need the same conditions both in outcome and adjusted variables but I would love to hear your thoughts.
The other question I've struggled with is what do we consider 3 samples of blood/plasma from the same patient drawn at the same time? I think arguably they're technical replicates, but sometimes people talk about three separate biopsy sites from an organ as biologic replicates which suggests these could be as well.
Thanks for the help in advance.
Thanks! That's along the lines of what I was thinking for blood but it's nice to have another opinion.
My populations are from a matched case-control group so they're artificially similar. I've read that ideally even with case-control matching, you should adjust for the matched variables because the matching process introduces a bias into your sample group, but there's only so much I can do with a small sample size.
And I agree, I certainly don't have enough samples for proper adjusting, as much as I would like to. Are there any rules of thumb out there for the number of adjustable covariates? A lot of the packages like DESeq2 and edgeR are based on generalized linear models, and for regular linear regression, I've heard 20 samples or data points per covariate. Would that be a reasonable application here? It's a higher number than I would like given how expensive RNA-seq still is, despite the cost reduction over time.
It's hard to say and really depends on how strong an effect you expect to see and how sensitive you want the analysis to be. You need at least a handful of samples (2 at the absolute minimum) in each covariate group to reliably adjust for them. Determining sample number can be tough. If you only want the big movers, 3-5 samples per group may yield what you desire. If you want to see genes with a robust, yet modest change in expression, they may well be lost at that sample size. 20 samples would obviously go a lot further, but budget is a concern for everyone (plenty of consortia have probably wanted 2000 samples when they only had 1000, etc). Do what you can afford, verify with qPCR when possible.
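To make the "how many covariates can I adjust for" point concrete, here is a rough sketch that counts how many coefficients an additive design would need and how many residual degrees of freedom are left over. The sample sheet, the column names, and the helper functions are all made up for illustration, and it assumes the design matrix is full rank (no covariate is a perfect combination of the others). It is not how DESeq2 or edgeR check a design, just the arithmetic behind it.

```python
from collections import Counter

def residual_df(samples, factors, numeric=()):
    """Samples minus coefficients for an additive design (assumes full rank)."""
    n_coef = 1  # intercept
    for f in factors:
        levels = {s[f] for s in samples}
        n_coef += len(levels) - 1  # one column per non-reference level
    n_coef += len(numeric)  # one slope per numeric covariate
    return len(samples) - n_coef

def smallest_group(samples, factor):
    """Size of the least-populated level of a categorical covariate."""
    return min(Counter(s[factor] for s in samples).values())

# Hypothetical 3-vs-3 sample sheet; every value here is invented.
samples = [
    {"event": "yes", "sex": "F", "stage": "I",   "age": 54},
    {"event": "yes", "sex": "M", "stage": "II",  "age": 61},
    {"event": "yes", "sex": "F", "stage": "III", "age": 67},
    {"event": "no",  "sex": "F", "stage": "I",   "age": 55},
    {"event": "no",  "sex": "M", "stage": "II",  "age": 60},
    {"event": "no",  "sex": "F", "stage": "III", "age": 66},
]

# Event only: intercept + 1 event column.
df_event_only = residual_df(samples, ["event"])
# Event + sex + stage + age: intercept + 1 + 1 + 2 + 1 columns.
df_full = residual_df(samples, ["event", "sex", "stage"], numeric=["age"])

print("residual df, event only:", df_event_only)
print("residual df, full model:", df_full)
print("smallest sex group:", smallest_group(samples, "sex"))
```

With only 6 samples, adjusting for all three covariates uses up every degree of freedom, so nothing is left to estimate variability from. That is the practical reason a 3-vs-3 design can really only support the event comparison itself.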
You have a better understanding of the experimental setup and event effect than any of us, so we can't really answer that part for you. If you have a general idea of the effect sizes you expect to see and the sensitivity you want, you can try doing power calculations to see how many samples would be necessary to capture x% of the true expression changes.
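As a very rough starting point for that kind of power calculation, here is a sketch using the textbook normal-approximation formula for comparing two group means, applied to log2 expression. The function name and the example effect size and standard deviation are assumptions for illustration only. It ignores count-based (negative binomial) modelling and multiple testing, and the normal approximation is optimistic at small n, so treat the result as a lower bound and use a dedicated RNA-seq power tool for real planning.

```python
from math import ceil
from statistics import NormalDist

def samples_per_group(log2_fc, sd, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison.

    log2_fc: smallest log2 fold change you want to detect (assumed)
    sd:      between-subject standard deviation of log2 expression (assumed)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sd / log2_fc) ** 2
    return ceil(n)

# Made-up inputs: a 2-fold change vs a ~1.4-fold change, same variability.
n_big_movers = samples_per_group(log2_fc=1.0, sd=0.5)
n_modest = samples_per_group(log2_fc=0.5, sd=0.5)

print("per group, 2-fold change:", n_big_movers)
print("per group, 1.4-fold change:", n_modest)
```

Halving the detectable log2 fold change roughly quadruples the required samples per group, which matches the point above: a handful of samples finds the big movers, while modest changes need far more. Passing a much smaller alpha is a crude way to mimic a multiple-testing correction across many miRNAs.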