Reducing Data Volume in News Topic Classification: Deep Learning Framework and Dataset

Serreli, Luigi; Marche, Claudio; Nitti, Michele
2024-01-01

Abstract

With the rise of smart devices and technological advancements, accessing vast amounts of information has become easier than ever before. However, sorting and categorising such an overwhelming volume of content has become increasingly challenging. This paper introduces a new framework for classifying news articles based on a Bidirectional LSTM (BiLSTM) network and an attention mechanism. The paper also presents a new dataset of 60,000 news articles from various global sources. Furthermore, it proposes a methodology for reducing data volume by extracting key sentences using an algorithm, resulting in inference times that are, on average, 50% shorter than those for the original documents, without compromising the system's accuracy. Experimental evaluations demonstrate that our framework outperforms existing methodologies in terms of accuracy. The system was compared with various works on two popular datasets, AG News and BBC News, achieving accuracies of 99.7% and 94.55%, respectively.
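The abstract does not say which algorithm is used to extract key sentences, so the following is only a minimal sketch of the general idea, assuming a simple word-frequency ranking rather than the authors' method. The function names, the `ratio` parameter, and the tiny stopword list are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is",
             "on", "for", "that", "it", "was", "by"}


def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def extract_key_sentences(text, ratio=0.5):
    """Keep roughly `ratio` of the sentences, ranked by average word frequency.

    This is an assumed scoring rule, not the one used in the paper.
    Selected sentences are returned in their original order.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    # Word frequencies over the whole document.
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


article = ("The market fell sharply today. "
           "Investors sold market shares as the market fell. "
           "The weather was pleasant. "
           "Analysts expect the market to recover.")
reduced = extract_key_sentences(article, ratio=0.5)
```

In a pipeline of this kind, the reduced text would be passed to the classifier in place of the full article, so the classifier processes fewer tokens per document. Whether accuracy is preserved depends on the scoring rule and the ratio, which this sketch does not evaluate.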
Keywords: Data volume; Deep learning; Natural language processing; Topic classification
Files in this item:
File: Reducing_Data_Volume_in_News_Topic_Classification_Deep_Learning_Framework_and_Dataset.pdf
Access: open access
Type: publisher's version (VoR)
Size: 2.27 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/434285