Multi-class text classification of news data
Maurizio Romano; Maria Paola Priola
2024-01-01
Abstract
Several multi-class text classification (MCC) strategies, namely One-Vs-All (OVA), One-Vs-One (OVO), Best-of-Best (BOB), and Error-Correcting Output Codes (ECOC), are compared in terms of accuracy and computational efficiency. Each strategy is implemented using several classifiers: Naïve Bayes, Random Forest, Logistic Regression, Neural Networks, Linear Discriminant Analysis, Support Vector Machine, and the recently introduced Threshold-based Naïve Bayes (Tb-NB). We run a horse race on the 20 Newsgroups dataset, well known in the literature for its complexity. Our results highlight the importance of choosing the right classifier and pairing it with an optimal strategy, providing valuable insights for optimizing classifier performance in MCC tasks in view of both environmental implications and the need for accurate predictions.
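The strategies compared in the abstract can be illustrated with scikit-learn's multi-class wrappers, which accept any binary base classifier. This is a minimal sketch, not the paper's implementation: it uses synthetic features in place of the TF-IDF representation of 20 Newsgroups, Logistic Regression as the base learner, and omits the BOB strategy, which has no off-the-shelf scikit-learn counterpart.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import (OneVsRestClassifier, OneVsOneClassifier,
                                OutputCodeClassifier)

# Synthetic stand-in for TF-IDF features of a multi-class text corpus
X, y = make_classification(n_samples=600, n_features=50, n_informative=20,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = LogisticRegression(max_iter=1000)  # any binary classifier works here
strategies = {
    "OVA": OneVsRestClassifier(base),     # one binary problem per class
    "OVO": OneVsOneClassifier(base),      # one binary problem per class pair
    "ECOC": OutputCodeClassifier(base, code_size=2, random_state=0),
}

# Fit each strategy and compare held-out accuracy
scores = {name: clf.fit(X_train, y_train).score(X_test, y_test)
          for name, clf in strategies.items()}
```

Note the efficiency trade-off the paper probes: OVA trains K binary models, OVO trains K(K-1)/2 on smaller subsets, and ECOC trains a number of models set by its code length.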