Balancing performance and environmental efficiency: a multiclass classification study of textual data

Priola, Maria Paola;Romano, Maurizio
2025-01-01

Abstract

This study evaluates four multiclass classification (MCC) strategies: One-vs-Rest, also called One-vs-All (OVA), One-vs-One (OVO), Best-of-Best (BOB), and Error-Correcting Output Codes (ECOC). The strategies are paired with classifiers such as Naïve Bayes, Random Forest, Linear Discriminant Analysis, Logistic Regression, Artificial Neural Network, Support Vector Machine, and Threshold-based Naïve Bayes on the 20NewsGroup text dataset, which is well known in the literature for its complexity. The findings show that the choice of classifier significantly affects both accuracy and computational effort. Threshold-based Naïve Bayes excels with OVO, OVA, and BOB, but its performance declines with ECOC. Artificial Neural Network and Random Forest, the slowest classifiers, pair well with BOB and OVA, respectively. In contrast, Naïve Bayes and Logistic Regression stand out for speed, particularly with OVA. Together with the Support Vector Machine, these two classifiers are versatile across all strategies, balancing accuracy and training time. Additionally, OVO and BOB prove advantageous for handling imbalanced data because they focus on individual class pairings. OVA emerges as the fastest strategy, while the performance of ECOC depends on the classifier. Our analysis underscores the importance of selecting an appropriate classifier and strategy pairing in MCC tasks, particularly for imbalanced datasets. Finally, this study highlights the environmental impact of computational choices, advocating efficient yet accurate predictions to minimize energy consumption and reduce the ecological footprint of machine learning applications.
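As a rough illustration of why the strategies differ in computational effort, the sketch below enumerates the binary subproblems that the standard OVA and OVO decompositions create for a 20-class task such as 20NewsGroup. It is a minimal sketch, not the authors' code: the helper names `ova_tasks` and `ovo_tasks` are hypothetical, and BOB and ECOC are left out because the abstract does not specify how they are constructed.

```python
from itertools import combinations


def ova_tasks(classes):
    """One-vs-All: one binary problem per class (that class vs. all the others)."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


def ovo_tasks(classes):
    """One-vs-One: one binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


# Assumption: integer labels 0..19 stand in for the 20 newsgroup categories.
classes = list(range(20))

n_ova = len(ova_tasks(classes))  # K binary classifiers
n_ovo = len(ovo_tasks(classes))  # K * (K - 1) / 2 binary classifiers

print(f"OVA trains {n_ova} binary classifiers, OVO trains {n_ovo}")
```

Each OVO subproblem sees only the documents of its two classes, whereas each OVA subproblem sees the whole training set with one class set against all the rest. This is one plausible reading of why pairwise strategies can cope better with class imbalance while OVA needs far fewer models.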
Keywords: Statistical Learning; Multiclass Classification; One-vs-One; One-vs-All; Supervised Learning; Green AI

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/448369