Balancing performance and environmental efficiency: a multiclass classification study of textual data
Priola, Maria Paola;Romano, Maurizio
2025-01-01
Abstract
This study evaluates multiclass classification (MCC) strategies -- One-Vs-All (OVA), One-Vs-One (OVO), Best-of-Best (BOB), and Error-Correcting Output Codes (ECOC) -- using classifiers such as Naïve Bayes, Random Forest, Linear Discriminant Analysis, Logistic Regression, Neural Networks, Support Vector Machine, and Threshold-based Naïve Bayes on the 20NewsGroup text dataset, well known in the literature for its complexity. Findings show that the choice of classifier significantly affects both accuracy and computational effort. Threshold-based Naïve Bayes excels with OVO, OVA, and BOB but declines with ECOC. Artificial Neural Network and Random Forest, the slowest classifiers, align well with BOB and OVA, respectively. In contrast, Naïve Bayes and Logistic Regression stand out for speed, particularly with OVA. Together with the Support Vector Machine, these classifiers demonstrate versatility across all strategies, balancing accuracy and training time. Additionally, OVO and BOB prove advantageous for handling unbalanced data by focusing on individual class pairings. OVA emerges as the fastest strategy, while ECOC's performance is classifier-dependent. Our analysis underscores the importance of selecting the appropriate classifier-and-strategy pairing in MCC tasks, particularly for imbalanced datasets. Importantly, this study highlights the environmental impact of computational choices, advocating for efficient, accurate predictions to minimize energy consumption and the ecological footprint of machine learning applications.
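For orientation, the sketch below shows how a comparison along the lines described in the abstract can be set up with scikit-learn on the 20 Newsgroups corpus. It is not the authors' code: the base classifiers, vectorizer settings, and ECOC code size are assumptions, and BOB is omitted because scikit-learn provides no built-in wrapper for it. Only the OVA, OVO, and ECOC wrappers and the accuracy/training-time bookkeeping are illustrated.

```python
# Minimal sketch (assumed setup, not the paper's implementation): compare
# OVA, OVO, and ECOC wrappers on 20 Newsgroups, recording test accuracy and
# wall-clock training time -- the two axes the study trades off.
import time

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.multiclass import (OneVsOneClassifier, OneVsRestClassifier,
                                OutputCodeClassifier)
from sklearn.naive_bayes import MultinomialNB

# Load and vectorise the corpus (TF-IDF is one common, assumed choice here).
train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))
vec = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words="english")
X_train, X_test = vec.fit_transform(train.data), vec.transform(test.data)
y_train, y_test = train.target, test.target

# Two of the fast classifiers mentioned in the abstract (hyperparameters assumed).
base_learners = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
strategies = {
    "OVA": OneVsRestClassifier,   # one binary task per class
    "OVO": OneVsOneClassifier,    # one binary task per class pair
    "ECOC": lambda est: OutputCodeClassifier(est, code_size=2.0, random_state=0),
}

for clf_name, clf in base_learners.items():
    for strat_name, wrap in strategies.items():
        model = wrap(clf)
        start = time.perf_counter()
        model.fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"{clf_name:20s} {strat_name:5s} acc={acc:.3f} train_time={elapsed:.1f}s")
```

Keeping the timing loop alongside the accuracy score mirrors the study's framing: the "best" pairing is the one that reaches acceptable accuracy with the least computation, which is also the pairing with the smallest energy footprint.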


