Facial and Speech-based Signal Processing Systems for Quality of Experience and Emotion Estimation of Multimedia Applications

Porcu, Simone; Floris, Alessandro
2025-01-01

Abstract

Assessing the Quality of Experience (QoE) of multimedia services is crucial to ensure end users are satisfied. However, traditional feedback collection may suffer from bias due to the rating scale and may neglect the dynamic subjective nature of human perception, influenced by the user’s emotional state. This paper proposes an alternative solution to unobtrusively estimate QoE relying on features related to the user’s facial expressions and speech characteristics, which naturally reflect the user’s emotional state and enable QoE estimation without explicit feedback. The presented solution includes several steps, from the data collection process to the extraction and selection of the facial and speech features used to train the resulting QoE estimation models. Both single-modal and multi-modal learning approaches based on data fusion have been considered. These models can support the management of network and application resources of traditional and immersive multimedia services.
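The abstract mentions both single-modal and multi-modal learning based on data fusion. As a minimal illustrative sketch (not the authors' actual pipeline), feature-level fusion can be expressed as concatenating per-sample facial and speech feature vectors before training a model; the feature dimensions, synthetic data, and the nearest-centroid classifier below are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-modal features (dimensions are assumptions):
# facial features (e.g., action-unit intensities) and speech features
# (e.g., pitch/energy statistics) for N rated multimedia sessions.
n_samples, n_face, n_speech = 60, 8, 6
X_face = rng.normal(size=(n_samples, n_face))
X_speech = rng.normal(size=(n_samples, n_speech))
# QoE labels, reduced here to two classes (e.g., acceptable / not) for brevity.
y = rng.integers(0, 2, size=n_samples)

def early_fusion(face, speech):
    """Feature-level (early) fusion: concatenate modality vectors per sample."""
    return np.concatenate([face, speech], axis=1)

def nearest_centroid_fit(X, y):
    """Fit a trivial nearest-centroid model, a stand-in for the QoE estimator."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(X, classes, centroids):
    """Assign each sample the class of its closest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

X_fused = early_fusion(X_face, X_speech)
classes, centroids = nearest_centroid_fit(X_fused, y)
pred = nearest_centroid_predict(X_fused, classes, centroids)
print("fused feature dim:", X_fused.shape[1])
print("training accuracy: %.2f" % (pred == y).mean())
```

The same skeleton covers the single-modal baselines by training on `X_face` or `X_speech` alone; a decision-level (late) fusion variant would instead combine the per-modality predictions.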
Keywords: Quality of Experience; Facial Emotion Recognition; Speech Emotion Recognition; Multimedia

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/478366
