
****************************************************** REPRODUCIBLE FILES TO THE PAPER *******************************************************

"A computationally fast variable importance test for random forests for high-dimensional data" (2015)



by S. Janitza, E. Celik and A.-L. Boulesteix






**********************************************************************************************************************************************

All code was implemented by S. Janitza (except for the R package vita). If you have any questions regarding the code, please contact 
S. Janitza <janitza@ibe.med.uni-muenchen.de>. For questions regarding the R package vita, contact E. Celik <celik.p.ender@gmail.com>.






********* C O N T E N T S *********



The folder reproducible_files contains:
  
  * text file 'README':    contains detailed information and instructions for reproducing the analyses presented in the paper
  * folder 'codes':        contains all R code for reproducing the results presented in the paper and the appendix
  * folder 'results':      relevant R objects are stored in this folder after the code has been run
                           (each R file starting with 'plot' loads objects contained in this folder)
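
The file names and storage format of the objects in 'results' are not described in this README. As a rough sketch only, assuming the objects are written with save() and read back with load() (both base R), the hand-off between a 'compute' file and a 'plot' file could look as follows. The folder, file and object names below are placeholders, and the values are dummy data.

```r
# Placeholder standing in for the folder 'results'; a temporary directory is
# used so that this sketch does not touch the real folder.
results_dir <- file.path(tempdir(), "results")
dir.create(results_dir, showWarnings = FALSE, recursive = TRUE)

# 'compute' side: create an object and store it (hypothetical name, dummy values).
example_result <- data.frame(variable = c("X1", "X2"), value = c(1, 2))
result_file <- file.path(results_dir, "example_result.RData")
save(example_result, file = result_file)

# 'plot' side: load the stored object into a separate environment and use it.
plot_env <- new.env()
loaded_names <- load(result_file, envir = plot_env)
print(loaded_names)
print(plot_env$example_result)
```

Loading into a separate environment is optional; it only keeps the sketch from overwriting objects in the workspace.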





********* N O T E S *********



R version 3.1.1 was used for all analyses. Platform: x86_64-unknown-linux-gnu (64-bit).


Before you start, please make sure the following packages are installed:
   * randomForest (version 4.6-10)
   * vita (version 0.1)
   * snowfall (version 1.84-6) 
   * snow (version 0.3-13) 
   * ROCR (version 1.0-5) 
   * golubEsets (version 1.6-0; Bioconductor; contains the leukemia data)  
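
The installed versions can be compared with the list above before running anything. The following is a minimal sketch using base R only; check_packages is a hypothetical helper and not part of the reproducible files.

```r
# Packages and versions as listed above.
required <- c(
  randomForest = "4.6-10",
  vita         = "0.1",
  snowfall     = "1.84-6",
  snow         = "0.3-13",
  ROCR         = "1.0-5",
  golubEsets   = "1.6-0"
)

# Hypothetical helper: for each package, report whether it is installed,
# which version was found, and whether that version equals the required one.
check_packages <- function(required) {
  pkgs <- names(required)
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  found <- vapply(pkgs, function(p) {
    if (installed[[p]]) as.character(utils::packageVersion(p)) else NA_character_
  }, character(1))
  matches <- vapply(pkgs, function(p) {
    installed[[p]] &&
      utils::packageVersion(p) == package_version(required[[p]])
  }, logical(1))
  data.frame(package = pkgs,
             required = unname(required),
             found = unname(found),
             installed = unname(installed),
             version_matches = unname(matches),
             stringsAsFactors = FALSE)
}

print(check_packages(required))
```

Rows with installed = FALSE or version_matches = FALSE point to packages that differ from the setup used for the paper. Whether newer versions give identical results is not stated here.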


The following packages are loaded via a namespace (and not attached):
   * bitops_1.0-6       
   * caTools_1.17.1    
   * gdata_2.13.3   
   * gtools_3.4.1      
   * KernSmooth_2.23-12
   * Rcpp_0.11.6 


********* D A T A   D O W N L O A D *********

The datasets should be downloaded before executing R files starting with 'compute':

   * The Prostate Cancer, Breast Cancer and Colon Cancer data used to reproduce our studies can be downloaded from 
     http://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html ('Data files').
   * The Embryonal Tumor data can be downloaded from http://datam.i2r.a-star.edu.sg/datasets/krbd/NervousSystem/NervousSystem.html
   * The Leukemia Data is part of the R package golubEsets.


********* I N S T R U C T I O N S *********



1. Unzip the folder reproducible_files. Do not delete or move the files contained in it.
2. Set the working directory to the location where the folder codes is stored. Make sure that all relevant R packages are installed (see above).
3. To reproduce the results, download the data (see above) and run all R files starting with 'compute'. The created R objects are stored in 
   the folder results.
4. Run the R files listed in the following table. When the code is run, figures are stored as PDF files (Figs. 3 and B.2 as TIFF files) in the 
   current working directory.


 ----------------------------------------------------------------------------------
 FIGURE                                R FILE
 ----------------------------------------------------------------------------------
 2, B.1, B.9                           plot_null_distribution.R
 3, B.2                                plot_fold_specific_against_each_other.R
 4, B.3, B.12, B.13                    plot_discriminative_power.R
 5, B.10                               plot_typeIerror.R
 6, 7, 8, B.4-B.8, B.11, B.14-B.18     plot_power.R
 ----------------------------------------------------------------------------------
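
Steps 3 and 4 can also be run in one go instead of opening each file by hand. The sketch below rests on several assumptions: the working directory contains the folder 'codes', the files may be run in alphabetical order, and the scripts resolve their own paths relative to the working directory. None of this is stated in this README. find_scripts and run_scripts are hypothetical helpers.

```r
# Hypothetical helper: R files in 'dir' whose names start with 'prefix'.
find_scripts <- function(dir, prefix) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  keep <- substr(basename(files), 1, nchar(prefix)) == prefix
  sort(files[keep])
}

# Hypothetical helper: source each file in turn (evaluated in the global
# environment, as when a file is run by hand) and report progress.
run_scripts <- function(files) {
  for (f in files) {
    message("Running ", basename(f))
    source(f)
  }
  invisible(files)
}

codes_dir <- "codes"  # assumption: path to the folder 'codes'
if (file.exists(codes_dir)) {
  run_scripts(find_scripts(codes_dir, "compute"))  # step 3
  run_scripts(find_scripts(codes_dir, "plot"))     # step 4
} else {
  message("Folder '", codes_dir, "' not found; adjust codes_dir or the working directory.")
}
```

If a script fails, source() stops with an error and the remaining files are not run, so the failing file is easy to identify from the last "Running" message.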

