

Files for reproducing the analyses presented in the paper

"An AUC-based Permutation Variable Importance Measure for Random Forests"

by S. Janitza, C. Strobl and A.-L. Boulesteix



Silke Janitza

14.04.2019





------------------------------------------------------------------------------------------------------------------------



IMPORTANT:
----------


- Unzip the file reproducible_files.zip. Before running an R file, set the working directory to the location where the file AUC_VIM is stored.

- Make sure that the R packages party 1.0-6 and ROCR 1.0-4 are installed (and randomForest 4.6-7 to produce Figure 2).
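
The version requirements above can be checked from within R before running any script. The following is a minimal sketch using only base R; the helper check_versions() is hypothetical and not part of the archive, and only the package names and version numbers are taken from this README.

```r
# Versions named in this README.
required <- c(party = "1.0-6", ROCR = "1.0-4", randomForest = "4.6-7")

# check_versions() is a hypothetical helper, not part of the archive.
# It returns one status string per package.
check_versions <- function(required) {
  vapply(names(required), function(pkg) {
    # system.file() returns "" when the package is not installed.
    if (!nzchar(system.file(package = pkg))) {
      return("not installed")
    }
    installed <- packageVersion(pkg)
    if (installed == package_version(required[[pkg]])) {
      "ok"
    } else {
      paste("installed version is", as.character(installed))
    }
  }, character(1))
}

print(check_versions(required))
```

A status other than "ok" does not necessarily mean the scripts will fail, but results may then differ from those in the paper.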


To reproduce the "Simulated_Data" results exactly, match the technical setup listed below (output of sessionInfo()), including the R package versions.

  sessionInfo()
  R version 2.15.1 (2012-06-22)
  Platform: x86_64-pc-mingw32/x64 (64-bit)

  locale:
  [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
  [5] LC_TIME=German_Germany.1252

  attached base packages:
  [1] stats4 grid splines stats graphics grDevices utils datasets methods base

  other attached packages:
  [1] ROCR_1.0-4 gplots_2.11.0 KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0 gtools_2.7.0 party_1.0-6
  [8] vcd_1.2-13 colorspace_1.2-1 MASS_7.3-18 strucchange_1.4-7 sandwich_2.2-9 zoo_1.7-9 coin_1.0-21
  [15] mvtnorm_0.9-9994 modeltools_0.2-19 survival_2.36-14

  loaded via a namespace (and not attached):
  [1] bitops_1.0-5 lattice_0.20-6 tools_2.15.1
  



-------------------------------------------------------------------------------------------------------------------------





INSTRUCTIONS FOR REPRODUCING THE RESULTS OF THE COMPARISON STUDIES



REPRODUCING FIGURES:

- To produce a figure, run the file that is named after the figure you want to reproduce. The resulting figures are saved in the folder "Results". Note that the files which produce figures load the already computed variable importances stored in the folder "R_Objects". To also reproduce the variable importances, see 'REPRODUCING VARIABLE IMPORTANCES'.
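
Because the figure files read from "R_Objects" and write to "Results", it can help to confirm that both folders are visible from the working directory before running them. The sketch below assumes that the two folders sit directly below the working directory, which this README does not state explicitly; check_folders() is a hypothetical helper and not part of the archive.

```r
# Report whether the folders named in this README exist below `root`.
# Assumption: both folders are direct subfolders of the working directory.
check_folders <- function(root = getwd(),
                          folders = c("R_Objects", "Results")) {
  paths <- file.path(root, folders)
  # file.info()$isdir is NA for a missing path, so isTRUE() yields FALSE.
  present <- vapply(paths, function(p) isTRUE(file.info(p)$isdir),
                    logical(1))
  names(present) <- folders
  present
}

print(check_folders())
```

If a folder is reported as FALSE, the working directory is probably not set as described under IMPORTANT.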





REPRODUCING VARIABLE IMPORTANCES:

- To compute the variable importances, run the corresponding file named "VI_computation". Note that running these files can take several hours or even days. The resulting VIs are stored in the folder "R_Objects".





REPRODUCING SIGNIFICANCE TESTING RESULTS:

- For some analyses, R files are available that test for differences in performance (measured by the AUC) between the variable importance measures. These files are named "significance". The results are stored in the folder "Results" and are named "Performance_differences_p_values". These result files show the differences in AUC for all iterations, with the corresponding p-values for testing whether the difference is equal to zero. Note that the files which compute p-values load the already computed variable importances stored in the folder "R_Objects". To also reproduce the variable importances, see 'REPRODUCING VARIABLE IMPORTANCES'.
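
To illustrate what a test of "difference equal to zero" can look like, here is a minimal sketch. This README does not say which test the "significance" files use, so the one-sample t-test on per-iteration AUC differences is an assumption, as are the function name paired_auc_test() and the simulated AUC values, which are placeholders and not results from the paper.

```r
# Hypothetical helper: test whether the mean per-iteration difference in
# AUC between two importance measures is zero.
# Assumption: a one-sample t-test on the differences; the archive's own
# scripts may use a different test.
paired_auc_test <- function(auc_a, auc_b) {
  stopifnot(length(auc_a) == length(auc_b))
  diffs <- auc_a - auc_b
  test <- t.test(diffs, mu = 0)
  list(differences = diffs,
       mean_difference = mean(diffs),
       p_value = test$p.value)
}

# Placeholder AUC values for 20 iterations (simulated, not from the paper).
set.seed(1)
auc_a <- runif(20, min = 0.70, max = 0.80)
auc_b <- runif(20, min = 0.68, max = 0.78)

res <- paired_auc_test(auc_a, auc_b)
print(res$mean_difference)
print(res$p_value)
```

For the actual procedure and the reported p-values, refer to the "significance" files and their output in the folder "Results".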



