Investigating Synthetic Data Sets for Crowd Counting in Cross-scene Scenarios

Delussu, Rita; Putzu, Lorenzo; Fumera, Giorgio
2020-01-01

Abstract

Crowd counting and density estimation are crucial functionalities in intelligent video surveillance systems, but they are also very challenging computer vision tasks in scenarios characterised by dense crowds, due to scale and perspective variations, overlaps and occlusions. Regression-based crowd counting models are used for dense crowd scenes, where pedestrian detection is infeasible. We focus on real-world, cross-scene application scenarios where no manually annotated images of the target scene are available for training regression models, and only images with different backgrounds and camera views can be used (e.g., from publicly available data sets), which can lead to low accuracy. To overcome this issue, we propose to build the training set from synthetic images of the target scene, which can be annotated automatically with no manual effort. This work provides a preliminary empirical evaluation of the effectiveness of this solution. To this aim, we carry out experiments using real data sets as the target scenes (testing set) and different kinds of synthetically generated crowd images of the target scenes as training data. Our results show that synthetic training images can be effective, provided that their background, besides their perspective, closely reproduces that of the target scene.
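The approach summarised in the abstract (train a regressor on synthetic, automatically annotated images of the target scene, then count people in frames of that scene) can be illustrated with a toy sketch. It is not the authors' implementation: the bright-rectangle "pedestrians", the row-dependent blob size standing in for perspective, the GLCM-style texture features, the ridge regressor and every name used (synthesize, texture_features, fit_ridge, make_set) are hypothetical illustrative assumptions, and held-out synthetic frames merely stand in for real annotated test images.

```python
import numpy as np


def synthesize(background, n_people, rng):
    """Paste n_people bright rectangles (hypothetical "pedestrians") onto a copy of background.

    The count label is known by construction, so annotation is free. Blob size
    grows with the row index as a crude perspective effect (top rows = farther).
    """
    img = background.copy()
    h, w = img.shape
    for _ in range(n_people):
        r = int(rng.integers(0, h - 8))
        size = 2 + (6 * r) // h                  # 2..7 px tall depending on the row
        c = int(rng.integers(0, w - size))
        img[r:r + size, c:c + max(1, size // 2)] = 1.0
    return img, n_people


def texture_features(img, levels=8):
    """GLCM-style texture descriptors (horizontal offset 1) plus two simple cues."""
    q = np.clip((img * levels).astype(int), 0, levels - 1)  # quantise grey levels
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = np.sum(glcm * (i - j) ** 2)
    energy = np.sum(glcm ** 2)
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    edge_density = np.mean(np.abs(np.diff(img, axis=1)) > 0.25)
    return np.array([contrast, energy, homogeneity, edge_density, img.mean()])


def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression on standardised features; returns a predictor."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd
    y_mean = y.mean()
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return lambda X_new: ((X_new - mu) / sd) @ w + y_mean


def make_set(bg, n_images, rng):
    """Generate n_images synthetic frames over bg with random crowd sizes."""
    pairs = [synthesize(bg, int(rng.integers(5, 60)), rng) for _ in range(n_images)]
    X = np.stack([texture_features(im) for im, _ in pairs])
    y = np.array([n for _, n in pairs], dtype=float)
    return X, y


rng = np.random.default_rng(0)
# Hypothetical "target scene" background: vertical brightness ramp plus noise.
background = np.clip(np.linspace(0.2, 0.5, 64)[:, None] + rng.normal(0, 0.02, (64, 64)), 0, 1)

X_train, y_train = make_set(background, 200, rng)  # synthetic, auto-annotated
X_test, y_test = make_set(background, 50, rng)     # stand-in for real target frames
predict = fit_ridge(X_train, y_train)
mae = np.mean(np.abs(predict(X_test) - y_test))
print(f"MAE on held-out images: {mae:.2f}")
```

Because make_set takes the background as an argument, passing a different array when building the test set is a quick way to probe the background mismatch that the abstract identifies as critical; this toy makes no claim about how large that effect is on real data.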
ISBN: 978-989-758-402-2
Keywords: Cross-scene, Crowd Analysis, Crowd Density Estimation, Synthetic Data Sets, Texture Features, Regression
Files in this item:
Paper.pdf (open access). Description: main article; type: post-print version; format: Adobe PDF; size: 4.4 MB

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/298186
Citations
  • PubMed Central: ND
  • Scopus: 4
  • Web of Science: 3