randomForestSRC multivariate regression analysis of differential gene expression (RNA-seq) and physiological measurements
4.1 years ago
gmchaput ▴ 10

I have a data set of significant differentially expressed genes (1028) from my DESeq2 analysis. I also have 5 measurements of physiology for my organism of interest. I have a total of 35 samples.

I ran a random forest analysis using rfsrc() from the randomForestSRC package. My response (y) variables are the physiological measurements (3 numeric, 2 categorical), and my predictor (x) variables are the genes (1028 numeric). I have output, but I am struggling to interpret the results for my training and test datasets, and to visualize a tree from the forest.
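For reference, a setup like the one described can be sketched as follows. This is a minimal sketch on simulated data, not my actual code: the names `genes`, `phys`, `num1` to `num3`, `cat1`, and `cat2` are hypothetical, only 50 simulated genes are used to keep it fast, and `ntree = 200` is an arbitrary choice.

```r
library(randomForestSRC)

set.seed(1)
n <- 35   # number of samples, as in the post
p <- 50   # simulated genes (the real data has 1028)

# Simulated expression matrix with hypothetical gene names
genes <- as.data.frame(matrix(rnorm(n * p), n, p))
names(genes) <- paste0("gene", seq_len(p))

# Hypothetical responses: 3 numeric, 2 categorical
phys <- data.frame(
  num1 = genes$gene1 + rnorm(n),
  num2 = genes$gene2 + rnorm(n),
  num3 = rnorm(n),
  cat1 = factor(ifelse(genes$gene3 > 0, "high", "low")),
  cat2 = factor(sample(c("a", "b"), n, replace = TRUE))
)
dat <- cbind(phys, genes)

# 80/20 split: 28 training samples, 7 test samples
train_idx <- sample(n, 28)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Multivariate forest: all five responses are modelled jointly
fit <- rfsrc(Multivar(num1, num2, num3, cat1, cat2) ~ .,
             data = train, ntree = 200)

# Predict on the held-out samples; test error is computed because
# the responses are present in `test`
pred <- predict(fit, newdata = test)

print(fit)
# Out-of-bag error for each response, standardized for the numeric ones
get.mv.error(fit, standardize = TRUE)
```

`print()` reports only one response at a time, so `get.mv.error()` is one way to see all five responses side by side.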

I tried ggRandomForests, but it does not appear to be set up for the multivariate families (regr+) of randomForestSRC.

Basically, I want to know:

1) How do I know if my model is correct?

2) How do I determine which genes were the best predictors of the y-variables (the physiological measurements)?

3) How do I visualize a decision tree from the forest in order to see how the terminal nodes were decided?
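To make questions 2 and 3 concrete, here is a minimal sketch of the kind of output I am after, again on simulated data with hypothetical names (`num1`, `cat1`, `gene1`, ...). It assumes that `get.mv.vimp()` returns a variables-by-responses matrix, and that the optional packages data.tree and DiagrammeR are installed for plotting, which is why the plot is guarded.

```r
library(randomForestSRC)

set.seed(2)
n <- 35
p <- 20   # small simulated gene set, for illustration only

genes <- as.data.frame(matrix(rnorm(n * p), n, p))
names(genes) <- paste0("gene", seq_len(p))

# Two hypothetical responses: one numeric, one categorical
dat <- data.frame(
  num1 = genes$gene1 + rnorm(n, sd = 0.5),
  cat1 = factor(ifelse(genes$gene2 > 0, "high", "low")),
  genes
)

# importance = TRUE is needed so that variable importance is stored
fit <- rfsrc(Multivar(num1, cat1) ~ ., data = dat,
             ntree = 200, importance = TRUE)

# Question 2: variable importance per response
vi <- as.matrix(get.mv.vimp(fit, standardize = TRUE))
# Genes ranked by importance for the first response
head(vi[order(vi[, 1], decreasing = TRUE), , drop = FALSE])

# Question 3: extract and plot a single tree for one response
if (requireNamespace("data.tree", quietly = TRUE) &&
    requireNamespace("DiagrammeR", quietly = TRUE)) {
  plot(get.tree(fit, tree.id = 1, m.target = "num1"))
}
```

A single tree is only one of the many trees in the forest, so a plot like this shows how one tree split the data rather than how the forest as a whole predicts.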

I've reviewed Udaya Kogalur & Hemant Ishwaran's webpage (https://kogalur.github.io/randomForestSRC/theory.html) as well as other websites/forums but am still having trouble understanding how to proceed.

My summaries for the training set (80% of dataset) and test set (20% of dataset) are below:

> print(RFmodel)
                         Sample size: 28
                     Number of trees: 1000
           Forest terminal node size: 3
       Average no. of terminal nodes: 5.68
No. of variables tried at each split: 33
              Total no. of variables: 1028
              Total no. of responses: 5
         User has requested response: Biomass.z
       Resampling used to grow trees: swor
    Resample size used to grow trees: 18
                            Analysis: mRF-RC
                              Family: mix+
                      Splitting rule: mv.mix *random*
       Number of random split points: 10
                % variance explained: -0.23
                          Error rate: 0.71

> print(RFpred)
  Sample size of test (predict) data: 7
                Number of grow trees: 1000
  Average no. of grow terminal nodes: 5.68
         Total no. of grow variables: 1028
         Total no. of grow responses: 5
         User has requested response: Biomass.z
       Resampling used to grow trees: swor
    Resample size used to grow trees: 4
                            Analysis: mRF-RC
                              Family: mix+
                % variance explained: 15.77
                 Test set error rate: 2.84
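For context on the "% variance explained" lines above: a common definition for a numeric response is 100 * (1 - MSE / var(y)), and the sketch below assumes that is what is being reported here (I have not confirmed this against the package source). `pct_var_explained` is a hypothetical helper, not a randomForestSRC function, and the numbers are made up.

```r
# Hypothetical helper: percent variance explained from observed and
# predicted values, assuming the definition 100 * (1 - MSE / var(y))
pct_var_explained <- function(obs, pred) {
  mse <- mean((obs - pred)^2)
  100 * (1 - mse / var(obs))
}

# Made-up observed values for five samples
obs <- c(-1.2, -0.4, 0.1, 0.6, 1.3)

# Perfect predictions give 100
pct_var_explained(obs, obs)

# Predictions worse than the spread of the data give a negative value
pct_var_explained(obs, -obs)
```

Under this definition, a negative value means the mean squared error exceeds the variance of the response, and an estimate based on only 7 test samples can be expected to be noisy.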