In this paper we investigate on the potentials to implicitly estimate the Quality of Experience (QoE) of a user of video streaming services by acquiring a video of her face and monitoring her facial expression and gaze direction. To this, we conducted a crowdsourcing test in which participants were asked to watch and rate the quality when watching 20 videos subject to different impairments, while their face was recorded with their PC's webcam. The following features were then considered: the Action Units (AU) that represent the facial expression, and the position of the eyes' pupil. These features were then used, together with the respective QoE values provided by the participants, to train three machine learning classifiers, namely, Support Vector Machine with quadratic kernel, RUSBoost trees and bagged trees. We considered two prediction models: only the AU features are considered or together with the position of the eyes' pupils. The RUSBoost trees achieved the best results in terms of accuracy, sensitivity and area under the curve scores. In particular, when all the features were considered, the achieved accuracy is of 44.7%, 59.4% and 75.3% when using the 5-level, 3level and 2-level quality scales, respectively. Whereas these results are not satisfactory yet, these represent a promising basis.

Towards the prediction of the quality of experience from facial expression and gaze direction

Porcu S.
;
Floris A.;Atzori L.
2019-01-01

Abstract

In this paper we investigate on the potentials to implicitly estimate the Quality of Experience (QoE) of a user of video streaming services by acquiring a video of her face and monitoring her facial expression and gaze direction. To this, we conducted a crowdsourcing test in which participants were asked to watch and rate the quality when watching 20 videos subject to different impairments, while their face was recorded with their PC's webcam. The following features were then considered: the Action Units (AU) that represent the facial expression, and the position of the eyes' pupil. These features were then used, together with the respective QoE values provided by the participants, to train three machine learning classifiers, namely, Support Vector Machine with quadratic kernel, RUSBoost trees and bagged trees. We considered two prediction models: only the AU features are considered or together with the position of the eyes' pupils. The RUSBoost trees achieved the best results in terms of accuracy, sensitivity and area under the curve scores. In particular, when all the features were considered, the achieved accuracy is of 44.7%, 59.4% and 75.3% when using the 5-level, 3level and 2-level quality scales, respectively. Whereas these results are not satisfactory yet, these represent a promising basis.
2019
978-1-5386-8336-1
Crowdsourcing; Facial expression; Gaze direction; Machine learning; Quality of Experience
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/292587
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 10
  • ???jsp.display-item.citation.isi??? 6
social impact