GenerateLearningsets {CMA}    R Documentation

Repeated Divisions into Learning and Test Sets

Description

With very small sample sizes, a single division into learning set and test set does not yield reliable information about classification performance. Several different divisions should therefore be used and the results aggregated. The implemented methods are discussed in Braga-Neto and Dougherty (2003) and Molinaro et al. (2005), whose terminology is adopted here.

This function is usually the basis for all deeper analyses.

Usage

GenerateLearningsets(n, y, method = c("LOOCV", "CV", "MCCV", "bootstrap"),
                     fold = NULL, niter = NULL, ntrain = NULL, strat = FALSE)

Arguments

n The total number of observations in the available data set. May be missing if y is provided instead.
y A vector of class labels, either numeric or a factor. Must be given if strat=TRUE or n is not specified.
method The scheme used to generate divisions into learning sets and test sets. Can be one of the following:
"LOOCV"
Leave-One-Out Cross-Validation.
"CV"
(Ordinary) Cross-Validation. Note that fold must also be specified.
"MCCV"
Monte-Carlo Cross-Validation, i.e. random divisions into learning sets of ntrain (see below) observations and test sets consisting of the remaining observations.
"bootstrap"
Learning sets are generated by drawing ntrain times with replacement from all observations. Observations not drawn at all form the test set.
fold Gives the number of CV groups. Used only when method="CV".
niter Number of iterations, i.e. the number of generated divisions. Not used when method="LOOCV".
ntrain Number of observations in the learning sets. Used only when method="MCCV" or method="bootstrap".
strat Logical. Should stratified sampling be performed, i.e. should the proportion of observations from each class in the learning sets be the same as in the whole data set?
Does not apply for method = "LOOCV".
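The "MCCV" and "bootstrap" schemes described above can be sketched in a few lines of base R. This is a simplified illustration of the sampling idea only, not CMA's internal implementation; the variable names are chosen for this example:

```r
# Illustrative sketch (not CMA internals) of one MCCV and one bootstrap
# division of n observations, assuming ntrain < n.
n <- 40
ntrain <- 30

# MCCV: draw ntrain observations without replacement for the learning set;
# the remaining observations form the test set.
learn_mccv <- sample(seq_len(n), ntrain)
test_mccv  <- setdiff(seq_len(n), learn_mccv)

# Bootstrap: draw ntrain observations with replacement; observations that
# were never drawn form the test set.
learn_boot <- sample(seq_len(n), ntrain, replace = TRUE)
test_boot  <- setdiff(seq_len(n), learn_boot)
```

Repeating such a draw niter times yields the collection of divisions that GenerateLearningsets returns in one object.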

Value

An object of class learningsets.

Author(s)

Martin Slawski ms@cs.uni-sb.de

Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de

Christoph Bernau bernau@ibe.med.uni-muenchen.de

References

Braga-Neto, U.M., Dougherty, E.R. (2003). Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3), 374-380.

Molinaro, A.M., Simon, R., Pfeiffer, R.M. (2005). Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15), 3301-3307.

Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - a comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics, 9: 439.

See Also

learningsets, GeneSelection, tune, classification

Examples

# LOOCV
loo <- GenerateLearningsets(n=40, method="LOOCV")
show(loo)
# five-fold-CV
CV5 <- GenerateLearningsets(n=40, method="CV", fold=5)
show(CV5)
# MCCV
mccv <- GenerateLearningsets(n=40, method = "MCCV", niter=3, ntrain=30)
show(mccv)
# Bootstrap
boot <- GenerateLearningsets(n=40, method="bootstrap", niter=3)
show(boot)
# stratified five-fold-CV
set.seed(113)
classlabels <- sample(1:3, size = 50, replace = TRUE, prob = c(0.3, 0.5, 0.2))
CV5strat <- GenerateLearningsets(y = classlabels, method="CV", fold=5, strat = TRUE)
show(CV5strat)
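The stratification performed with strat = TRUE can also be illustrated directly in base R, independently of CMA. The sketch below (with hypothetical variable names, and sampling a fixed fraction per class rather than CMA's exact scheme) draws within each class separately, so the class proportions of the full data carry over to the learning set:

```r
# Sketch of stratified sampling (not CMA internals): sample the same
# fraction within each class so the learning set preserves the class
# proportions of the whole data set.
set.seed(1)
y <- factor(rep(c("A", "B"), times = c(30, 10)))  # 75% / 25% split
frac <- 0.5
learn <- unlist(lapply(split(seq_along(y), y),
                       function(idx) sample(idx, floor(frac * length(idx)))))
# Class proportions in the learning set match the full data (75% / 25%):
table(y[learn])
```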

[Package CMA version 1.5.4 Index]