Knowledge Discovery in Gene Expression Data via Evolutionary Algorithms

CANNAS, LAURA MARIA; DESSI, NICOLETTA; PES, BARBARA
2011

Abstract

Methods currently used for micro-array data classification aim to select a minimal subset of features, namely a predictor, that is needed to build a classifier with the best accuracy. Although effective, they do not address the primary goal of domain experts, who are interested in detecting different groups of biologically relevant markers. In this paper, we present and test a framework that aims to provide different subsets of relevant genes. It applies initial gene filtering to define a set of feature spaces, each of which is further refined by a genetic algorithm. Experiments show that the overall process yields a number of predictors with high classification accuracy. Compared to state-of-the-art feature selection algorithms, the proposed framework consistently generates better feature subsets and keeps improving the quality of the selected subsets in terms of accuracy and size.
ISBN: 978-0-7695-4486-1
Keywords: Micro-array Data, Feature Selection, Genetic Algorithms, Support Vector Machines, K-Nearest Neighbor
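
The abstract describes a two-stage process: a filter defines several feature spaces, and a genetic algorithm refines each one into a predictor. The sketch below is only an illustration of that general idea on toy data, not the authors' implementation. The abstract does not specify the filter, the genetic operators, or the classifier settings, so everything here is an assumption: the class-separation score in `filter_rank`, the top-k feature spaces, the leave-one-out 1-nearest-neighbour fitness, the size penalty, and all function and parameter names are hypothetical.

```python
import random

def make_toy_data(n=20, d=30, seed=1):
    """Synthetic two-class data; only feature 0 is informative (assumption)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(5.0 * c, 0.5)] + [rng.gauss(0, 1) for _ in range(d - 1)]
         for c in y]
    return X, y

def filter_rank(X, y):
    """Rank features by a simple class-separation score (hypothetical filter)."""
    def score(j):
        a = [row[j] for row, c in zip(X, y) if c == 0]
        b = [row[j] for row, c in zip(X, y) if c == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / len(a)
        vb = sum((v - mb) ** 2 for v in b) / len(b)
        return abs(ma - mb) / ((va + vb) ** 0.5 + 1e-9)
    return sorted(range(len(X[0])), key=score, reverse=True)

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    if not feats:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[f] - X[k][f]) ** 2 for f in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)

def ga_refine(X, y, space, rng, pop_size=20, generations=15, penalty=0.01):
    """Search subsets of one feature space with a basic genetic algorithm."""
    def fitness(mask):
        feats = [f for f, bit in zip(space, mask) if bit]
        # Reward accuracy, penalise subset size (assumed trade-off).
        return loo_accuracy(X, y, feats) - penalty * len(feats)
    pop = [[rng.random() < 0.5 for _ in space] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]               # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, len(space)) if len(space) > 1 else 0
            child = p[:cut] + q[cut:]                # one-point crossover
            child = [bit != (rng.random() < 0.05) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [f for f, bit in zip(space, best) if bit]

def find_predictors(X, y, sizes=(5, 10), seed=0):
    """Return one (feature subset, accuracy) pair per filtered feature space."""
    rng = random.Random(seed)
    ranking = filter_rank(X, y)
    predictors = []
    for k in sizes:                                  # one feature space per top-k cut
        subset = ga_refine(X, y, ranking[:k], rng)
        predictors.append((subset, loo_accuracy(X, y, subset)))
    return predictors

if __name__ == "__main__":
    X, y = make_toy_data()
    for subset, acc in find_predictors(X, y):
        print(sorted(subset), round(acc, 2))
```

Running the search once per feature space is what produces several candidate predictors rather than a single one. Leave-one-out evaluation is used here only because the toy data set is small; a real study would need a separate test set to avoid an optimistic selection bias.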
Use this identifier to cite or link to this document: http://hdl.handle.net/11584/110008