The integration of voice control into connected devices is expected to improve the efficiency and comfort of our daily lives. However, the underlying biometric systems often impose constraints on the individual or the environment during interaction (e.g., quiet surroundings). Such constraints have to be surmounted in order to seamlessly recognize individuals. In this paper, we propose an evaluation framework for speaker recognition in noisy smart living environments. To this end, we designed a taxonomy of sounds (e.g., home-related, mechanical) that characterize representative indoor and outdoor environments where speaker recognition is adopted. Then, we devised an approach for off-line simulation of challenging noisy conditions in vocal audios originally collected under controlled environments, by leveraging our taxonomy. Our approach adds a (combination of) sound(s) belonging to the target environment into the current vocal example. Experiments on a large-scale public dataset and two state-of-the-art speaker recognition models show that adding certain background sounds to clean vocal audio leads to a substantial deterioration of recognition performance. In several noisy settings, our findings reveal that a speaker recognition model might end up to make unreliable decisions. Our framework is intended to help system designers evaluate performance deterioration and develop speaker recognition models more robust to smart living environments.

Evaluation framework for context-aware speaker recognition in noisy smart living environments

Fenu Gianni;Galici R.;Marras Mirko
2020

Abstract

The integration of voice control into connected devices is expected to improve the efficiency and comfort of our daily lives. However, the underlying biometric systems often impose constraints on the individual or the environment during interaction (e.g., quiet surroundings). Such constraints have to be surmounted in order to seamlessly recognize individuals. In this paper, we propose an evaluation framework for speaker recognition in noisy smart living environments. To this end, we designed a taxonomy of sounds (e.g., home-related, mechanical) that characterize representative indoor and outdoor environments where speaker recognition is adopted. Then, we devised an approach for off-line simulation of challenging noisy conditions in vocal audios originally collected under controlled environments, by leveraging our taxonomy. Our approach adds a (combination of) sound(s) belonging to the target environment into the current vocal example. Experiments on a large-scale public dataset and two state-of-the-art speaker recognition models show that adding certain background sounds to clean vocal audio leads to a substantial deterioration of recognition performance. In several noisy settings, our findings reveal that a speaker recognition model might end up to make unreliable decisions. Our framework is intended to help system designers evaluate performance deterioration and develop speaker recognition models more robust to smart living environments.
Deep Learning; Noisy Environments; Security; Sound Taxonomy; Speaker Recognition; Speaker Verification
File in questo prodotto:
File Dimensione Formato  
paper27.pdf

accesso aperto

Tipologia: versione editoriale
Dimensione 898.98 kB
Formato Adobe PDF
898.98 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11584/321859
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact