Head pose estimation is a sensitive topic in video surveillance/smart ambient scenarios since head rotations can hide/distort discriminative features of the face. Face recognition would often tackle the problem of video frames where subjects appear in poses making it quite impossible. In this respect, the selection of the frames with the best face orientation can allow triggering recognition only on these, therefore decreasing the possibility of errors. This paper proposes a novel approach to head pose estimation for smart cities and video surveillance scenarios, aiming at this goal. The method relies on a cascade of two models: the first one predicts the positions of 68 well-known face landmarks; the second one applies a web-shaped model over the detected landmarks, to associate each of them to a specific face sector. The method can work on detected faces at a reasonable distance and with a resolution that is supported by several present devices. Results of experiments executed over some classical pose estimation benchmarks, namely Point '04, Biwi, and AFLW datasets show good performance in terms of both pose estimation and computing time. Further results refer to noisy images that are typical of the addressed settings. Finally, examples demonstrate the selection of the best frames from videos captured in video surveillance conditions.

Web-Shaped Model for Head Pose Estimation: An Approach for Best Exemplar Selection

Barra S.;
2020-01-01

Abstract

Head pose estimation is a sensitive topic in video surveillance/smart ambient scenarios since head rotations can hide/distort discriminative features of the face. Face recognition would often tackle the problem of video frames where subjects appear in poses making it quite impossible. In this respect, the selection of the frames with the best face orientation can allow triggering recognition only on these, therefore decreasing the possibility of errors. This paper proposes a novel approach to head pose estimation for smart cities and video surveillance scenarios, aiming at this goal. The method relies on a cascade of two models: the first one predicts the positions of 68 well-known face landmarks; the second one applies a web-shaped model over the detected landmarks, to associate each of them to a specific face sector. The method can work on detected faces at a reasonable distance and with a resolution that is supported by several present devices. Results of experiments executed over some classical pose estimation benchmarks, namely Point '04, Biwi, and AFLW datasets show good performance in terms of both pose estimation and computing time. Further results refer to noisy images that are typical of the addressed settings. Finally, examples demonstrate the selection of the best frames from videos captured in video surveillance conditions.
2020
Head pose estimation; head pose exemplar selection; smart cities applications; web-shaped model
File in questo prodotto:
File Dimensione Formato  
09057523.pdf

Solo gestori archivio

Descrizione: articolo principale
Tipologia: versione editoriale (VoR)
Dimensione 5.05 MB
Formato Adobe PDF
5.05 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/288729
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 46
  • ???jsp.display-item.citation.isi??? 36
social impact