CARgram: CNN-based accident recognition from road sounds through intensity-projected spectrogram analysis

Podda, Alessandro Sebastian;Balia, Riccardo;Pompianu, Livio;Carta, Salvatore;Fenu, Gianni;Saia, Roberto
2024-01-01

Abstract

Road surveillance systems play an important role in monitoring traffic and detecting hazardous events. In recent years, several artificial intelligence-based approaches have been proposed for this purpose, typically based on the analysis of acquired video streams. However, occlusions, poor lighting conditions, and the heterogeneity of events often reduce their effectiveness and reliability. To overcome these limitations, scientific and industrial research has focused on integrating such solutions with audio recognition methods. By automatically identifying anomalous traffic sounds, e.g., car crashes and skids, these methods help reduce false positives and missed alarms. Following this trend, we propose an innovative pipeline for the analysis of intensity-projected audio spectrograms from streams of traffic sounds, which exploits both (i) a visual approach based on a custom, special-purpose Convolutional Neural Network for identifying anomalous events in the sound signal, and (ii) a novel multi-representational encoding of the input, which proved to significantly improve the recognition accuracy of the neural models. Validated on the public MIVIA dataset, the proposed pipeline achieved a false positive rate of 0.96% and the best performance among state-of-the-art competitors. Following these findings, a prototype implementation has been deployed on a real-world video surveillance infrastructure.
Keywords: Convolutional neural networks; Deep learning; Audio analysis; Traffic surveillance; Artificial intelligence
Use this identifier to cite or link to this document: https://hdl.handle.net/11584/425794