Exploring the Dataset Landscape for Automated Propaganda Detection: A Data-Centric Insight

Usai M.; Mura D. A.; Loddo A.; Sanguinetti M.; Zedda L.; Di Ruberto C.; Atzori M.
2025-01-01

Abstract

The increasing spread of propaganda in digital media has intensified research efforts toward the development of automated detection systems. Central to this task is the availability and quality of annotated datasets, which directly impact model performance, generalizability, and real-world applicability. In this paper, we present a data-centric insight into the current landscape of datasets used for automated propaganda detection. We analyze a representative set of publicly available corpora with respect to key factors such as annotation schemes, label granularity, domain coverage, linguistic diversity, and class balance. This work aims to guide researchers toward more robust, inclusive, and scalable approaches to propaganda detection by emphasizing the foundational role of data quality and structure.
Year: 2025
Keywords: Dataset Benchmarking; Propaganda Detection; Span Identification
Files in this item:

2025_Exploring the Dataset Landscape for Automated Propaganda Detection_A Data-Centric Insight.pdf

Access: open access
Description: Full article
Type: publisher's version (VoR)
Size: 204.86 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/482205