Dissimilarity-based people re-identification and search for intelligent video surveillance

2013-04-23

Abstract

Intelligent video surveillance is currently one of the most active research fields in computer science. It brings together a wide variety of computer vision and machine learning techniques to provide useful tools for surveillance operators and forensic video analytics. Person re-identification is one of these tools: it consists of recognising whether an individual has already been observed over a network of cameras. Its possible applications include off-line retrieval of all the video sequences showing an individual of interest whose image is given as a query, and on-line pedestrian tracking over multiple cameras. The task is typically carried out by exploiting clothing appearance, since classical biometric traits like the face are impractical in real-world video surveillance scenarios. Clothing appearance is represented by means of low-level local and global image features, usually extracted according to a part-based body model so that different body parts (e.g. torso and legs) are treated independently. Novel sensor technologies, e.g. RGB-D cameras like the MS Kinect, could also allow anthropometric measures to be extracted from a reconstructed 3D model of the body and used in combination with clothing appearance to increase recognition accuracy. This thesis presents a novel framework, named Multiple Component Dissimilarity (MCD), for constructing descriptors of images of persons using dissimilarity representations, a recent paradigm in machine learning in which the objects of interest are described as vectors of dissimilarities to a set of predefined prototypes. MCD extends the original dissimilarity paradigm to objects that can be decomposed into multiple parts and that have localised characteristics, to better deal with the peculiarities of the human body.
MCD has at least three important advantages: (i) a drastic reduction of computational requirements, mostly due to the compactness of dissimilarity representations (essentially small vectors of real numbers, easy to store and very fast to match); (ii) a fully generic formulation of the underlying low-level representation, which allows different descriptors to be combined into a single dissimilarity vector even if they are heterogeneous in terms of the model and features used; (iii) a natural way to learn high-level concepts from low-level representations. Building on these features, MCD is used in this thesis to achieve several objectives: (i) develop an approach to speed up existing person re-identification methods; (ii) implement a novel person re-identification method based on the combination of different local and global features into a single dissimilarity vector, able to attain state-of-the-art performance; (iii) develop a multi-modal approach to person re-identification (a novelty in the literature) by combining clothing appearance with anthropometric measures, extracted by means of novel RGB-D sensors, into a single dissimilarity vector; (iv) develop a method to perform a novel task, proposed for the first time in this thesis, which consists of finding, among a set of images of individuals, those relevant to a textual, semantic query describing the clothing appearance of an individual of interest. This task has been named appearance-based people search and can be useful in applications like forensic video analysis, where a textual description of the individual of interest given by a witness may be available instead of an image.
Person re-identification and appearance-based people search are different tasks, aimed at addressing different problems. Still, they can be seen as instances of the more general problem of searching and matching people in multimedia data, e.g. video footage, range/depth data, speech audio data. Building on the commonalities with Information Retrieval, the final part of the thesis proposes a possible formulation of the task of people search on multimedia data, with some suggestions and guidelines on how to exploit the MCD framework to address this novel class of problems.
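The idea of describing a person as a vector of dissimilarities to part-wise prototypes can be illustrated with a minimal sketch. It is not the thesis's MCD implementation: the part names, the toy 2-D feature vectors, the Euclidean distance, the averaged nearest-neighbour set dissimilarity and all function names below are assumptions chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def set_dissimilarity(components: List[Vector], prototype: List[Vector]) -> float:
    """Dissimilarity between a set of local components and a prototype set.

    Assumption: mean distance from each component to its nearest
    prototype element. Other set-to-set measures could be used instead.
    """
    nearest = [min(euclidean(c, p) for p in prototype) for c in components]
    return sum(nearest) / len(nearest)


def dissimilarity_vector(
    person: Dict[str, List[Vector]],
    prototypes: Dict[str, List[List[Vector]]],
) -> List[float]:
    """Concatenate, body part by body part, the dissimilarities to each prototype."""
    return [
        set_dissimilarity(person[part], proto)
        for part in sorted(prototypes)  # fixed part order keeps vectors comparable
        for proto in prototypes[part]
    ]


def rank_gallery(probe, gallery, prototypes) -> List[str]:
    """Rank gallery identities by distance between compact dissimilarity vectors."""
    probe_vec = dissimilarity_vector(probe, prototypes)
    return sorted(
        gallery,
        key=lambda name: euclidean(
            probe_vec, dissimilarity_vector(gallery[name], prototypes)
        ),
    )


# Hypothetical prototypes: two per body part, one toy 2-D component each.
prototypes = {
    "torso": [[(0.0, 0.0)], [(1.0, 1.0)]],
    "legs": [[(0.0, 1.0)], [(1.0, 0.0)]],
}
# Hypothetical gallery of two individuals and one probe image.
gallery = {
    "A": {"torso": [(0.1, 0.0), (0.0, 0.1)], "legs": [(0.9, 0.1)]},
    "B": {"torso": [(0.9, 1.0)], "legs": [(0.1, 0.9)]},
}
probe = {"torso": [(0.0, 0.2)], "legs": [(1.0, 0.2)]}

ranking = rank_gallery(probe, gallery, prototypes)
print(ranking)
```

In this sketch each person is reduced to four real numbers (two prototypes for each of two parts), so matching a probe against the gallery only compares short vectors. The gallery vectors could also be computed once and stored, which the sketch does not do.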
23-apr-2013
Person re-identification
RGB-D
appearance
biometrics
computer vision
dissimilarity representations
intelligent video surveillance
machine learning
multimodal
pattern recognition
people search
automatic recognition
Satta, Riccardo
Files in this item:
Satta_PhD_Thesis.pdf (open access; type: doctoral thesis; size: 7.25 MB; format: Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/266248