Differential expression with leave one out?
5.9 years ago
Eugene A ▴ 180

Hi everyone, I have a general question concerning RNAseq analysis.

I'm running an analysis comparing 17 vs 39 biological replicates, and the goal is to identify biomarkers for these two groups. To identify DE genes I'm using DESeq2. I filter the resulting list by log2 fold change (> 1 or < -1; the genes will be used for qRT-PCR, so the difference has to be detectable by that method), by AUC > 0.7 (to keep the genes that best separate the two classes), and by TPM (to drop lowly expressed genes).
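For readers who want to see the fold-change and AUC filters spelled out, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the fold change is a naive ratio of group means with a pseudocount (not DESeq2's estimate), the AUC is the pairwise Mann-Whitney form, and the function names and the `min_mean` expression cutoff are hypothetical stand-ins for the TPM filter.

```python
import math


def auc(group_a, group_b):
    """Fraction of (a, b) pairs where b > a; ties count as 0.5."""
    wins = 0.0
    for b in group_b:
        for a in group_a:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Naive log2 ratio of group means (assumption: NOT DESeq2's estimate)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


def passes_filters(group_a, group_b, min_abs_lfc=1.0, min_auc=0.7, min_mean=1.0):
    """Apply the three filters to one gene's expression values."""
    lfc = log2_fold_change(group_a, group_b)
    a = auc(group_a, group_b)
    separation = max(a, 1.0 - a)  # direction-agnostic class separation
    overall_mean = (sum(group_a) + sum(group_b)) / (len(group_a) + len(group_b))
    return abs(lfc) > min_abs_lfc and separation > min_auc and overall_mean >= min_mean
```

Taking `max(a, 1 - a)` means down-regulated genes are treated the same way as up-regulated ones, which matches filtering on both > 1 and < -1.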

After these filters I still have a list of 54 genes. Now I'd like to reduce that number so that I can test the genes on a larger sample set in a qRT-PCR experiment and build a proper classifier.

What I have done so far is a "leave one out" DE analysis. Basically, I ran 56 (17+39) DE analyses, leaving out one sample each time. That gave 56 lists of DE genes, and 39 genes were in all of them (after the filtering described above). It seems to me that these 39 genes should be the most robust DE genes. Is that true, or are there any inherent problems with this approach? For example, I know there is a "minReplicatesForReplace" option in DESeq2 which seems to do something similar.
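To make the procedure concrete, here is a rough Python sketch of the leave-one-out bookkeeping. It does not run DESeq2: `select_genes` is a hypothetical callable standing in for "run the DE analysis plus filters on the kept samples and return the gene set", and the toy data and the mean-difference rule are invented purely for illustration.

```python
from collections import Counter


def leave_one_out_counts(sample_ids, select_genes):
    """Count in how many leave-one-out runs each gene is selected."""
    counts = Counter()
    for left_out in sample_ids:
        kept = [s for s in sample_ids if s != left_out]
        counts.update(set(select_genes(kept)))
    return counts


def stable_genes(sample_ids, select_genes):
    """Genes selected in every leave-one-out run."""
    counts = leave_one_out_counts(sample_ids, select_genes)
    return {gene for gene, n in counts.items() if n == len(sample_ids)}


# Toy data (invented): geneY looks different only because of sample b3.
group = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
expr = {
    "geneX": {"a1": 1, "a2": 1, "a3": 1, "b1": 9, "b2": 9, "b3": 9},
    "geneY": {"a1": 1, "a2": 1, "a3": 1, "b1": 1, "b2": 1, "b3": 13},
}


def toy_select(kept):
    """Stand-in for the DE step: keep genes whose group means differ by > 3."""
    selected = set()
    for gene, values in expr.items():
        a = [values[s] for s in kept if group[s] == "A"]
        b = [values[s] for s in kept if group[s] == "B"]
        if abs(sum(b) / len(b) - sum(a) / len(a)) > 3:
            selected.add(gene)
    return selected


sample_ids = list(group)
counts = leave_one_out_counts(sample_ids, toy_select)
robust = stable_genes(sample_ids, toy_select)
```

In this toy case geneY passes on the full data but drops out when b3 is removed, so only geneX survives the intersection. Keeping the per-gene counts rather than only the intersection also shows which genes fail in just one or two runs.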

And the second question: could I further apply feature selection methods to that set of 39 genes? For example, RFE-SVM?

Best, Eugene

RNA-Seq deseq2 • 1.3k views
4.9 years ago

The biggest issue is that, by default, DESeq2 replaces outlier counts (flagged by Cook's distance) when a group has enough replicates (7 or more by default, controlled by minReplicatesForReplace). That's not the same as leave one out, but it will interfere with your results. You may need to disable it altogether to get a better idea of how robust the genes are. That will likely also further decrease the number of candidate genes, though even 39 is a reasonably small list to work with.


Thanks! I'll keep that in mind next time. Although I'd say 39 is not that small a number when the genes have to be validated in the lab; my task was to reduce the list as far as possible.


Inevitably a quarter of them will be uncharacterized, so you'll probably ignore those. Others will be more interesting given whatever you're working on, so the final list after reading through the literature will probably be closer to 12, which is pretty doable with qPCR.
