Asked 2 hours ago by gmchaput (USA)

I have a set of 1028 significantly differentially expressed genes from my DESeq2 analysis. I also have 5 physiological measurements for my organism of interest, across a total of 35 samples.

I ran a random forest analysis using rfsrc() from the randomForestSRC package. My y (response) variables are the physiological measurements (3 numeric, 2 categorical), and my x-variables are the genes (1028 numeric). I have output, but I am struggling to interpret the results for my training and test datasets, and to visualize a tree from the forest.
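For context, a setup of this kind can be sketched roughly as follows. This is a minimal sketch on simulated stand-in data, not the actual analysis: the data frame `dat`, the gene names, and every response name other than `Biomass.z` are hypothetical, and only 50 stand-in genes are simulated to keep it quick.

```r
library(randomForestSRC)

set.seed(1)
n <- 35   # number of samples, as in the post
p <- 50   # stand-in for the 1028 genes (assumption, to keep the sketch fast)

# Simulated expression matrix: one numeric column per gene
genes <- as.data.frame(matrix(rnorm(n * p), n, p))
names(genes) <- paste0("gene", seq_len(p))

# Hypothetical physiology responses: 3 numeric, 2 categorical
phys <- data.frame(
  Biomass.z = rnorm(n),
  resp2     = rnorm(n),
  resp3     = rnorm(n),
  cat1      = factor(sample(c("a", "b"), n, replace = TRUE)),
  cat2      = factor(sample(c("lo", "hi"), n, replace = TRUE))
)
dat <- cbind(phys, genes)

# 80/20 split: 28 training samples, 7 test samples
train_idx <- sample(n, 28)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Multivariate mixed forest: all five responses on the left-hand side
RFmodel <- rfsrc(Multivar(Biomass.z, resp2, resp3, cat1, cat2) ~ .,
                 data = train, ntree = 1000, nodesize = 3,
                 importance = TRUE)

# Predict on the held-out samples; the responses in `test` allow a test error
RFpred <- predict(RFmodel, newdata = test)

print(RFmodel)
print(RFpred)
```

Because the simulated responses are pure noise, the printed error rates here carry no meaning; the sketch only shows the shape of the calls.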

I tried ggRandomForests, but it does not appear to be set up for the multivariate (regr+) forests of randomForestSRC.

Basically, I want to know:

1) How do I know whether my model is correct?

2) How do I determine which genes were the best predictors of the response (y) variables?

3) How do I visualize a decision tree from the forest, in order to see how the terminal nodes were decided?
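To make these three questions concrete, here is a minimal sketch of the randomForestSRC helpers that seem relevant to each one: `get.mv.error()` for per-response out-of-bag error, `get.mv.vimp()` for per-response variable importance, and `get.tree()` for extracting a single tree. The data are simulated and the names `fit`, `y.num`, and `y.cat` are hypothetical; whether these helpers answer the questions for a real dataset is an assumption, not something confirmed here.

```r
library(randomForestSRC)

set.seed(2)
n <- 35
p <- 20   # small stand-in for the gene set (assumption)

x <- as.data.frame(matrix(rnorm(n * p), n, p))
names(x) <- paste0("gene", seq_len(p))

# One numeric and one categorical response, each driven by a known gene
dat <- data.frame(
  y.num = x$gene1 + rnorm(n, sd = 0.5),
  y.cat = factor(ifelse(x$gene2 > 0, "hi", "lo")),
  x
)

fit <- rfsrc(Multivar(y.num, y.cat) ~ ., data = dat,
             ntree = 500, importance = TRUE)

# 1) Out-of-bag error for each response
err <- get.mv.error(fit)
print(err)

# 2) Variable importance (VIMP) for each response;
#    sort by the first response to list its strongest predictors
imp <- as.matrix(get.mv.vimp(fit))
print(head(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]))

# 3) Extract one tree for one response; plotting is only attempted
#    in an interactive session
tr <- get.tree(fit, tree.id = 1, m.target = "y.num")
if (interactive()) plot(tr)
```

A single tree from a 500- or 1000-tree forest shows only one set of splits, so it illustrates how terminal nodes arise rather than summarizing the forest as a whole.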

I've reviewed Udaya Kogalur and Hemant Ishwaran's webpage (kogalur.github.io/randomForestSRC/theory.html) as well as other websites and forums, but I am still having trouble understanding how to proceed.

My summaries for the training set (80% of dataset) and test set (20% of dataset) are below:

> print(RFmodel)
                         Sample size: 28
                     Number of trees: 1000
           Forest terminal node size: 3
       Average no. of terminal nodes: 5.68
No. of variables tried at each split: 33
              Total no. of variables: 1028
              Total no. of responses: 5
         User has requested response: Biomass.z
       Resampling used to grow trees: swor
    Resample size used to grow trees: 18
                            Analysis: mRF-RC
                              Family: mix+
                      Splitting rule: mv.mix *random*
       Number of random split points: 10
                % variance explained: -0.23
                          Error rate: 0.71

> print(RFpred)
  Sample size of test (predict) data: 7
                Number of grow trees: 1000
  Average no. of grow terminal nodes: 5.68
         Total no. of grow variables: 1028
         Total no. of grow responses: 5
         User has requested response: Biomass.z
       Resampling used to grow trees: swor
    Resample size used to grow trees: 4
                            Analysis: mRF-RC
                              Family: mix+
                % variance explained: 15.77
                 Test set error rate: 2.84
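
One way to read the "% variance explained" lines is sketched below. It assumes (this is not confirmed against the package source) that the figure is computed as 100 * (1 - MSE / Var(y)) for the requested numeric response, with the population variance used here for simplicity. Under that assumption, a value near zero or below zero means the predictions are no better than always predicting the mean. The function name `pct_var_explained` and the toy vectors are hypothetical.

```r
# Percent variance explained under the assumed definition:
# 100 * (1 - mean squared error / population variance of y)
pct_var_explained <- function(y, yhat) {
  mse   <- mean((y - yhat)^2)
  var_y <- mean((y - mean(y))^2)
  100 * (1 - mse / var_y)
}

y <- c(1, 2, 3, 4, 5)

perfect   <- pct_var_explained(y, y)                      # exact predictions
mean_only <- pct_var_explained(y, rep(mean(y), length(y))) # always predict the mean
reversed  <- pct_var_explained(y, rev(y))                  # worse than the mean

print(c(perfect = perfect, mean_only = mean_only, reversed = reversed))
```

With this definition, exact predictions give 100, predicting the mean gives 0, and predictions worse than the mean give a negative value.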
