RoadSense3D: A Framework for Roadside Monocular 3D Object Detection

Carta S., Marras M., Mohamed S., Podda A. S., Saia R., Sau M.
2024-01-01

Abstract

Using monocular cameras for 3D object understanding is widely recognized as a cost-effective approach, with applications spanning autonomous driving, augmented/virtual reality, and roadside monitoring. Despite recent progress, building models that generalize to unforeseen scenarios and diverse camera configurations remains challenging. In this work, we focus on monocular 3D object detection in roadside environments. First, we introduce a versatile methodology for generating and labeling datasets tailored to roadside scenarios, addressing limitations encountered in real-world settings. Next, we develop an array of deep learning models for this task and refine them to address practical challenges that emerge in real-world deployment. Finally, leveraging our framework, we curate a synthetic benchmark dataset comprising 1,415,680 frames and 8,902,636 labeled 3D objects, and we assess the performance of existing models across all datasets.
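The abstract does not spell out how a labeled 3D object relates to what a single roadside camera sees. As a minimal sketch, assuming a pinhole camera with known intrinsics (fx, fy, cx, cy) mounted on a pole and pitched downward, the code below projects a 3D bounding box (centroid, dimensions, yaw) into the image and returns its enclosing 2D box. This is a generic textbook projection, not the RoadSense3D labeling pipeline; the function names, the axis conventions (world z up, camera x right / y down / z forward) and the example camera height, pitch and intrinsics are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def box_corners(center: Vec3, dims: Vec3, yaw: float) -> List[Vec3]:
    """Return the 8 corners of a 3D box in a world frame whose z axis points up.

    center is the box centroid, dims is (length, width, height) in metres and
    yaw is the heading angle (radians) about the vertical axis.
    """
    bx, by, bz = center
    length, width, height = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-width / 2, width / 2):
            for dz in (-height / 2, height / 2):
                # Rotate the footprint offset about z, then translate to the centroid.
                corners.append((bx + c * dx - s * dy, by + s * dx + c * dy, bz + dz))
    return corners


def pitched_camera_rotation(pitch: float) -> List[List[float]]:
    """Hypothetical world-to-camera rotation: camera faces world +y, tilted down by pitch.

    Rows are the camera axes (right, down, forward) expressed in world coordinates.
    """
    c, s = math.cos(pitch), math.sin(pitch)
    return [[1.0, 0.0, 0.0],  # right
            [0.0, -s, -c],    # down
            [0.0, c, -s]]     # forward (optical axis)


def world_to_camera(p: Vec3, R: Mat3, cam_pos: Vec3) -> Vec3:
    """Apply p_cam = R (p - C), where C is the camera centre in world coordinates."""
    d = [p[i] - cam_pos[i] for i in range(3)]
    x, y, z = (sum(R[i][j] * d[j] for j in range(3)) for i in range(3))
    return (x, y, z)


def project(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def box_to_2d(center: Vec3, dims: Vec3, yaw: float, R: Mat3, cam_pos: Vec3,
              fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float, float]:
    """Project all 8 corners and return the enclosing 2D box (u_min, v_min, u_max, v_max)."""
    pts = [project(world_to_camera(p, R, cam_pos), fx, fy, cx, cy)
           for p in box_corners(center, dims, yaw)]
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    return (min(us), min(vs), max(us), max(vs))


if __name__ == "__main__":
    # Hypothetical roadside setup: camera on a 6 m pole, pitched 45 degrees down.
    R = pitched_camera_rotation(math.radians(45))
    cam_pos = (0.0, 0.0, 6.0)
    # A 4 m x 2 m x 1.5 m vehicle resting on the ground 6 m in front of the pole.
    car = box_to_2d((0.0, 6.0, 0.75), (4.0, 2.0, 1.5), 0.0, R, cam_pos,
                    fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
    print(car)
```

Because camera height and pitch enter only through `R` and `cam_pos`, the same routine can be reused unchanged across different mounting configurations, which is one simple way to see why varying camera setups matter when generating and labeling roadside data.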
Keywords: 3D Object Detection; Monocular 3D Perception; Roadside Dataset


Use this identifier to cite or link to this document: https://hdl.handle.net/11584/432645
