In this paper, we address the challenging task of learning accurate classifiers from micro-array datasets involving a large number of features but only a small number of samples. We present a greedy step-by-step procedure (SSFS) that can be used to reduce the dimensionality of the feature space. We apply the Minimum Description Length principle to the training data for weighting each feature and then select an “optimal” feature subset by a greedy approach tuned to a specific classifier. The Acute Lymphoblastic Leukemia dataset is used to evaluate the effectiveness of the SSFS procedure in conjunction with different state-of-the-art classification algorithms.
Learning Classifiers for High-Dimensional Micro-array Data
BOSIN, ANDREA;DESSI, NICOLETTA;PES, BARBARA
2006-01-01
Abstract
In this paper, we address the challenging task of learning accurate classifiers from micro-array datasets involving a large number of features but only a small number of samples. We present a greedy step-by-step procedure (SSFS) that can be used to reduce the dimensionality of the feature space. We apply the Minimum Description Length principle to the training data for weighting each feature and then select an “optimal” feature subset by a greedy approach tuned to a specific classifier. The Acute Lymphoblastic Leukemia dataset is used to evaluate the effectiveness of the SSFS procedure in conjunction with different state-of-the-art classification algorithms.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.