UNICA IRIS Institutional Research Information System

C-SHAP: A Hybrid Method for Fast and Efficient Interpretability

Golshid Ranjbaran; Diego Reforgiato Recupero; Chanchal K. Roy; Kevin A. Schneider

2025-01-01

Abstract

Model interpretability is essential in machine learning, particularly in critical fields such as healthcare, where understanding model decisions is paramount. SHAP (SHapley Additive exPlanations) has proven to be a robust tool for explaining machine learning predictions, but its high computational cost limits its practicality for real-time use. To address this, we introduce C-SHAP (Clustering-Boosted SHAP), a hybrid method that combines SHAP with K-means clustering to reduce execution time significantly while preserving interpretability. Across various datasets and machine learning methods, C-SHAP matches SHAP’s accuracy on the selected features, maintaining an accuracy of 0.73 for Random Forest, while running substantially faster. Notably, on the Diabetes dataset collected by the National Institute of Diabetes and Digestive and Kidney Diseases, C-SHAP reduces the execution time from nearly 2000 s to just 0.21 s, underscoring its potential for scalable, efficient interpretability in time-sensitive applications. Such advances in interpretability and efficiency may also support decision-making within software-intensive systems, in line with evolving engineering approaches.
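
The abstract does not spell out how clustering and SHAP are combined, so the following is only an illustrative sketch of one plausible reading, not the authors' C-SHAP implementation. It assumes that K-means groups the data, that Shapley values are computed once per cluster centroid (against the dataset mean as a baseline), and that each point then reuses the explanation of its cluster. All names here (`kmeans`, `shapley_values`, `cluster_explanations`, `linear_model`) are hypothetical, and the exact Shapley computation stands in for the SHAP library so the example stays dependency-free.

```python
import itertools
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        for idx, p in enumerate(points):
            dists = [squared_distance(p, c) for c in centroids]
            labels[idx] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline point.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n in the number of features, so this is for tiny n only.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def cluster_explanations(model, points, k, seed=0):
    """Assumed hybrid: explain k centroids instead of every data point."""
    centroids, labels = kmeans(points, k, seed=seed)
    baseline = [sum(col) / len(points) for col in zip(*points)]
    per_cluster = [shapley_values(model, c, baseline) for c in centroids]
    return centroids, labels, baseline, per_cluster


# Toy model and synthetic data (both invented for illustration).
WEIGHTS = [2.0, -1.0, 0.5]


def linear_model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))


rng = random.Random(42)
points = [[rng.gauss(mu, 1.0) for _ in range(3)]
          for mu in (0.0, 5.0, 10.0) for _ in range(30)]

centroids, labels, baseline, per_cluster = cluster_explanations(
    linear_model, points, k=3)

# Each point reuses the explanation computed for its cluster's centroid.
per_point = [per_cluster[lab] for lab in labels]
```

Under these assumptions the expensive step runs once per cluster rather than once per data point (3 explanations instead of 90 here), which is the kind of saving the abstract attributes to clustering. The price is that points in the same cluster share one approximate explanation.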


Year of publication: 2025

Keywords: interpretability in machine learning; interpretability; SHAP; LIME; C-SHAP; computational efficiency; K-means clustering; software engineering

Type: 1.1 Journal article

Files in this item:

File	Size	Format
applsci-15-00672-v2.pdf (open access; description: online article; type: publisher's version (VoR))	706.78 kB	Adobe PDF

The metadata in IRIS UNICA are released under the Creative Commons CC0 1.0 Universal licence, while the publication files are protected by copyright unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/480166
