
C-SHAP: A Hybrid Method for Fast and Efficient Interpretability

Reforgiato Recupero, Diego
2025-01-01

Abstract

Model interpretability is essential in machine learning, particularly in critical fields such as healthcare, where understanding model decisions is paramount. SHAP (SHapley Additive exPlanations) is a robust tool for explaining machine learning predictions, but its high computational cost limits its practicality for real-time use. To address this, we introduce C-SHAP (Clustering-Boosted SHAP), a hybrid method that combines SHAP with K-means clustering to significantly reduce execution time while preserving interpretability. Across various datasets and machine learning methods, C-SHAP matches SHAP’s accuracy on the selected features, maintaining an accuracy of 0.73 for Random Forest, while running substantially faster. Notably, on the Diabetes dataset collected by the National Institute of Diabetes and Digestive and Kidney Diseases, C-SHAP reduces the execution time from nearly 2000 s to just 0.21 s, underscoring its potential for scalable, efficient interpretability in time-sensitive applications. Such gains in interpretability and efficiency may also support decision-making in software-intensive systems, in line with evolving engineering approaches.
2025
C-SHAP
computational efficiency
interpretability
interpretability in machine learning
K-means clustering
LIME
SHAP
software engineering

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/480166