Hi all,
I am trying to design an experiment and could use some input. My goal is to generate a predictive gene signature that separates two groups, BUT I have a number of confounding variables. My starting point is RNA-seq data.
- quantify gene expression from RNA-seq (smaller number of samples)
- build a model on the RNA-seq training data (Elastic Net regression)
- test the model on the held-out test data
=> pre-select a gene panel
- use a targeted method (e.g. qPCR or NanoString) to measure expression of the panel genes (larger number of samples)
- build a model on the training data from the targeted method (Elastic Net regression)
- test the model on the held-out test data
=> final panel
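For concreteness, the two-stage workflow above could be sketched roughly like this in Python with scikit-learn. It is only a sketch under several assumptions: the data are simulated (the first 10 of a hypothetical 500 genes differ between groups), "Elastic Net regression" is taken to mean elastic-net penalised logistic regression because the outcome is two groups, the penalty settings (`C=0.5`, `l1_ratio=0.5`) are fixed rather than tuned by nested cross-validation, the panel is capped at a hypothetical `MAX_PANEL = 20` genes, and confounders are ignored.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
N_GENES = 500    # hypothetical number of genes in the discovery data
MAX_PANEL = 20   # hypothetical cap on panel size for the targeted assay


def simulate(n_samples):
    """Simulated log-expression; the first 10 genes differ between groups."""
    y = rng.integers(0, 2, size=n_samples)
    X = rng.normal(size=(n_samples, N_GENES))
    X[:, :10] += 1.5 * y[:, None]
    return X, y


def fit_enet(X, y):
    """Elastic-net penalised logistic regression on standardised features."""
    clf = LogisticRegression(penalty="elasticnet", solver="saga",
                             l1_ratio=0.5, C=0.5, max_iter=5000)
    return make_pipeline(StandardScaler(), clf).fit(X, y)


def held_out_auc(X, y):
    """Split, fit on the training part, return (model, AUC on the test part)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)
    model = fit_enet(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return model, auc


# Stage 1: RNA-seq discovery cohort (few samples, all genes)
X_seq, y_seq = simulate(80)
stage1, auc_seq = held_out_auc(X_seq, y_seq)
coefs = stage1[-1].coef_.ravel()
ranked = np.argsort(-np.abs(coefs))[:MAX_PANEL]  # largest |coefficient| first
panel = np.sort(ranked[coefs[ranked] != 0])      # pre-selected gene panel

# Stage 2: targeted-assay cohort (many samples, panel genes only)
X_tgt, y_tgt = simulate(400)
stage2, auc_tgt = held_out_auc(X_tgt[:, panel], y_tgt)
final_panel = panel[stage2[-1].coef_.ravel() != 0]  # genes kept in stage 2

print(f"RNA-seq test AUC: {auc_seq:.2f}, panel size: {panel.size}")
print(f"Targeted test AUC: {auc_tgt:.2f}, final panel size: {final_panel.size}")
```

The model is refit in stage 2 rather than reused from stage 1 because qPCR or NanoString measurements are on a different scale from RNA-seq, so stage-1 coefficients would not transfer directly.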
Does this make sense? Or would it be better to use only RNA-seq, but with a larger number of samples? Or would it make sense to compute DEGs from the RNA-seq data, select the top genes (or pre-select them some other way), and then build the prediction model using only the targeted method?
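The third option (rank genes in the RNA-seq cohort, then model only on the targeted data) might look like the sketch below. Assumptions: the data are simulated, a per-gene Welch t-test on log-expression stands in for a proper count-based DE tool (in practice something like DESeq2, edgeR or limma-voom), and `top_k = 15` is a hypothetical panel size. Because the genes are chosen in a cohort that is independent of the targeted cohort, the cross-validated AUC on the targeted data is not inflated by the gene selection step.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)


def simulate(n_samples, n_genes=500, informative=10):
    """Simulated log-expression; the first `informative` genes differ between groups."""
    y = rng.integers(0, 2, size=n_samples)
    X = rng.normal(size=(n_samples, n_genes))
    X[:, :informative] += 1.5 * y[:, None]
    return X, y


# Discovery cohort (RNA-seq): rank genes by a per-gene Welch t-test
X_seq, y_seq = simulate(40)
_, p = stats.ttest_ind(X_seq[y_seq == 1], X_seq[y_seq == 0],
                       axis=0, equal_var=False)
top_k = 15
panel = np.argsort(p)[:top_k]  # indices of the top_k smallest p-values

# Independent targeted cohort: only the panel genes are measured
X_tgt, y_tgt = simulate(300)
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),
)
cv_auc = cross_val_score(model, X_tgt[:, panel], y_tgt, cv=5, scoring="roc_auc")
print(f"5-fold CV AUC on targeted data: {cv_auc.mean():.2f}")
```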
Can anyone also give general advice on sample size requirements for building predictive markers?
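One empirical way to gauge sample size from pilot data (not a formal power calculation) is a learning curve: refit the model on increasing fractions of the training data and watch whether cross-validated performance is still climbing. The sketch below assumes simulated pilot data (150 samples, 200 genes, 10 informative), the same hypothetical elastic-net settings as above, and scikit-learn's `learning_curve`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Simulated pilot data (hypothetical): 150 samples, 200 genes, 10 informative
y = rng.integers(0, 2, size=150)
X = rng.normal(size=(150, 200))
X[:, :10] += 1.0 * y[:, None]

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=0.5, max_iter=5000),
)

# Fit on 20%, 40%, ..., 100% of each training fold; score on the held-out fold
sizes, train_scores, test_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
    shuffle=True, random_state=0,
)
for n, scores in zip(sizes, test_scores):
    print(f"n_train={n:4d}  mean CV AUC={scores.mean():.2f}")
```

If the AUC is still rising at the largest training size, more samples are likely to help; a plateau suggests diminishing returns for this effect size.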
Thank you so much!