| compare {CMA} | R Documentation |
Single classifiers can be evaluated separately using the method
evaluation. Usually, however, several classifiers
are applied to the same dataset and their performance is
compared. This method carries out that comparison in one call.
For S4 method information, see compare-methods.
compare(clresultlist, measure = c("misclassification", "sensitivity",
"specificity", "average probability", "brier score", "auc"), aggfun =
meanrm, plot = FALSE, ...)
clresultlist |
A list of lists (!) of objects of class cloutput or clvarseloutput.
Each inner list is usually returned by classification.
Additionally, the different list elements of the outer list
should have been created by different classifiers; see
also the example below. |
measure |
A character vector containing one or more of the elements listed below.
By default, all measures are computed, using evaluation
with scheme = "iterationwise".
Note that "sensitivity", "specificity" and "auc" cannot be computed
in the multiclass case.
|
aggfun |
Function that determines how performance across the different iterations is aggregated.
Default is meanrm, which computes the mean using na.rm = TRUE.
Other possible choices are quantiles. |
plot |
Should the performance of the different classifiers be visualized by a joint boxplot?
Default is FALSE. |
... |
Further arguments passed to boxplot in the case that plot = TRUE. |
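As a rough illustration of the aggfun argument, the sketch below defines a few NA-tolerant aggregators in base R. The names mean_rm, median_rm and q75_rm are hypothetical helpers written for this example (mean_rm only mimics the documented behaviour of meanrm), and the score vector is made up. The sketch assumes that aggfun receives one numeric vector of iterationwise scores and returns a single number.

```r
# Hypothetical helpers, not part of CMA: each reduces a numeric vector of
# per-iteration scores to a single value, ignoring NA entries.
mean_rm   <- function(x) mean(x, na.rm = TRUE)    # mimics the documented default
median_rm <- function(x) median(x, na.rm = TRUE)  # 0.5 quantile
q75_rm    <- function(x) unname(quantile(x, probs = 0.75, na.rm = TRUE))

# Made-up iterationwise scores with one missing iteration.
scores <- c(0.10, 0.25, NA, 0.15)

agg <- c(mean = mean_rm(scores),
         median = median_rm(scores),
         q75 = q75_rm(scores))
print(agg)
```

Under the same assumption, such a function could be supplied as compare(clresultlist, aggfun = median_rm).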
A data.frame with rows corresponding to the compared classifiers
and columns to the performance measures, aggregated by aggfun (see above).
If more than one measure is computed and plot = TRUE, one separate
plot is created for each of them.
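The shape of the return value can be pictured with a small base-R sketch. It does not call CMA; toy_scores, mean_rm and aggregate_scores are hypothetical names, and all numbers are invented for illustration only. Each classifier contributes one row and each measure one column, with the per-iteration scores collapsed by the aggregation function.

```r
# Made-up per-iteration scores for two hypothetical classifiers;
# these are NOT results of any real analysis.
toy_scores <- list(
  clf_A = list("misclassification" = c(0.10, 0.20, NA),
               "brier score"       = c(0.30, 0.50, 0.40)),
  clf_B = list("misclassification" = c(0.05, 0.15, 0.10),
               "brier score"       = c(0.20, 0.20, 0.50))
)

# Stand-in for the documented default aggregator (mean with NA removal).
mean_rm <- function(x) mean(x, na.rm = TRUE)

# Collapse each score vector to one number, then stack the classifiers
# as rows and the measures as columns.
aggregate_scores <- function(scores, aggfun = mean_rm) {
  rows <- lapply(scores, function(per_measure) {
    vapply(per_measure, aggfun, numeric(1))
  })
  data.frame(do.call(rbind, rows), check.names = FALSE)
}

toy_table <- aggregate_scores(toy_scores)
print(toy_table)
```

A table of this shape can then be sorted or subset like any other data.frame, for example by ordering the rows on one measure.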
Martin Slawski ms@cs.uni-sb.de
Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de
Christoph Bernau bernau@ibe.med.uni-muenchen.de
Dudoit, S., Fridlyand, J., Speed, T. P. (2002)
Comparison of discrimination methods for the classification of tumors
using gene expression data.
Journal of the American Statistical Association 97, 77-87
Slawski, M., Daumer, M., Boulesteix, A.-L. (2008)
CMA - A comprehensive Bioconductor package for supervised classification
with high dimensional data.
BMC Bioinformatics 9: 439
## Not run:
### compare the performance of several discriminant analysis methods
### for the Khan dataset:
library(CMA)
data(khan)
khanX <- as.matrix(khan[,-1])
khanY <- khan[,1]
set.seed(27611)
fiveCV10iter <- GenerateLearningsets(y=khanY, method = "CV", fold = 5, niter = 2, strat = TRUE)
### candidate methods used below: DLDA, LDA, QDA
class_dlda <- classification(X = khanX, y=khanY, learningsets = fiveCV10iter, classifier = dldaCMA)
### perform GeneSelection for LDA and QDA (using F-tests):
genesel_da <- GeneSelection(X=khanX, y=khanY, learningsets = fiveCV10iter, method = "f.test")
### LDA and QDA on the selected genes only:
class_lda <- classification(X = khanX, y=khanY, learningsets = fiveCV10iter, classifier = ldaCMA, genesel= genesel_da, nbgene = 10)
class_qda <- classification(X = khanX, y=khanY, learningsets = fiveCV10iter, classifier = qdaCMA, genesel = genesel_da, nbgene = 2)
### We now compare the performance with respect to several measures:
### first, collect the results in a list:
dalike <- list(class_dlda, class_lda, class_qda)
### use pre-defined compare function:
comparison <- compare(dalike, plot = TRUE, measure = c("misclassification", "brier score", "average probability"))
print(comparison)
## End(Not run)