UNICA IRIS Institutional Research Information System

Network-based semisupervised clustering

Frigau, L. (Methodology); Contu, G.; Mola, F. (Supervision); Conversano, C. (Methodology)

2021-01-01

Abstract

Semisupervised clustering extends standard clustering methods to the semisupervised setting, in some cases considering situations in which clusters are associated with a given outcome variable that acts as a “noisy surrogate,” that is, a good proxy of the unknown clustering structure. In this article, a novel approach to semisupervised clustering associated with an outcome variable, named network-based semisupervised clustering (NeSSC), is introduced. It combines an initialization, a training, and an agglomeration phase. In the initialization and training phases, a matrix of pairwise affinity of the instances is estimated by a classifier. In the agglomeration phase, the matrix of pairwise affinity is transformed into a complex network, in which a community detection algorithm searches for the underlying community structure. Thus, a partition of the instances into clusters that are highly homogeneous in terms of the outcome is obtained. We consider a particular specification of NeSSC that uses classification or regression trees as classifiers and Louvain, label propagation, and Walktrap as possible community detection algorithms. NeSSC’s stopping criterion and the choice of the optimal partition of the original data are also discussed. Several applications on both real and simulated data are presented to demonstrate the effectiveness of the proposed semisupervised clustering method and the benefits it provides in terms of improved interpretability of results with respect to three alternative semisupervised clustering methods.
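The pipeline described in the abstract (a classifier estimates pairwise affinities, and the affinity matrix is then treated as a network and partitioned by community detection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the affinities are computed, so the sketch assumes that affinity is the share of bootstrap rounds in which two instances fall in the same leaf of a depth-1 classification tree, and it uses a simple weighted label propagation as the community detection step. All function names, the `cutoff` parameter, and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def fit_stump(X, y, rows, feature):
    """Depth-1 tree on one feature: threshold with the lowest weighted Gini."""
    values = sorted({X[i][feature] for i in rows})
    best_thr, best_score = values[0], float("inf")
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [y[i] for i in rows if X[i][feature] <= thr]
        right = [y[i] for i in rows if X[i][feature] > thr]
        score = len(left) * gini(left) + len(right) * gini(right)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr

def pairwise_affinity(X, y, rounds=50, seed=0):
    """Assumed affinity: share of rounds in which two instances share a leaf."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    counts = [[0] * n for _ in range(n)]
    for _ in range(rounds):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(n_features)          # random feature
        thr = fit_stump(X, y, rows, feature)
        leaf = [X[i][feature] <= thr for i in range(n)]
        for i in range(n):
            for j in range(n):
                if leaf[i] == leaf[j]:
                    counts[i][j] += 1
    return [[c / rounds for c in row] for row in counts]

def label_propagation(A, cutoff=0.5, sweeps=20, seed=0):
    """Weighted label propagation on the graph with edges A[i][j] > cutoff."""
    rng = random.Random(seed)
    n = len(A)
    labels = list(range(n))  # every instance starts in its own community
    order = list(range(n))
    for _ in range(sweeps):
        rng.shuffle(order)
        changed = False
        for i in order:
            votes = defaultdict(float)
            for j in range(n):
                if j != i and A[i][j] > cutoff:
                    votes[labels[j]] += A[i][j]
            if votes:
                # Heaviest neighbouring label; ties go to the smallest label.
                best = max(sorted(votes), key=votes.get)
                if best != labels[i]:
                    labels[i], changed = best, True
        if not changed:
            break
    return labels

# Toy data (hypothetical): one feature, binary outcome, two separated groups.
X = [[0.0], [0.2], [0.4], [9.6], [9.8], [10.0]]
y = [0, 0, 0, 1, 1, 1]
A = pairwise_affinity(X, y)
clusters = label_propagation(A)
print(clusters)
```

In a fuller version, the stump would presumably be replaced by a proper classification or regression tree and the propagation step by Louvain or Walktrap, the other algorithms the abstract names; the stopping criterion and the choice of the optimal partition discussed in the article are not modelled here.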


Publication year: 2021
Keywords: Community detection; complex networks; label propagation; Louvain; tree-based classifiers; Walktrap
Type: 1.1 Journal article

Files in this item:

File	Size	Format
asmb.2618 (1).pdf (archive managers only; type: post-print version (AAM))	1.18 MB	Adobe PDF

The metadata in IRIS UNICA are released under the Creative Commons CC0 1.0 Universal license, while the publication files are protected by copyright unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/311843
