Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection

Mohamed, S.; Zimmer, W.; Greer, R.; Ghita, A. A.; Castrillon-Santana, M.; Trivedi, M.; Knoll, A.; Carta, S. M.; Marras, M.

doi:10.1007/978-3-031-91813-1_20

Accurately detecting 3D objects from monocular images in dynamic roadside scenarios remains a challenging problem due to varying camera perspectives and unpredictable scene conditions. This paper introduces a two-stage training strategy to address these challenges. Our approach initially trains a model on the large-scale synthetic dataset, RoadSense3D, which offers a diverse range of scenarios for robust feature learning. Subsequently, we fine-tune the model on a combination of real-world datasets to enhance its adaptability to practical conditions. Experimental results of the Cube R-CNN model on challenging public benchmarks show a remarkable improvement in detection performance, with a mean average precision rising from 0.26 to 12.76 on the TUM Traffic A9 Highway dataset and from 2.09 to 6.60 on the DAIR-V2X-I dataset, when performing transfer learning. Code, data, and qualitative video results are available at https://roadsense3d.github.io.

2025-01-01

Year: 2025
ISBN: 9783031918124; 9783031918131
Keywords: Intelligent Transportation Systems; Intelligent Vehicles; Monocular 3D Object Detection; Synthetic Data; Transfer Learning
Type: 4.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
2408.15637v1 AAM.pdf (embargoed until 12/05/2026; type: post-print version, AAM)	4.64 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/459098


UNICA IRIS Institutional Research Information System