UNICA IRIS Institutional Research Information System

A review of the state of the child speech synthesis landscape

Lazareva, Vanesa; Dimitrovska, Marija Markovska; Pagliara, Silvio; Mavrou, Katerina; Theodorou, Eleni; Taskovski, Dimitar; Todorovska, Danche; Zanfardino, Francesco; Spera, Antonio; Mura, Antonello; Rybińska, Anna; Agius, May; Charalambous-Darden, Nefi; Łuszczak, Katarzyna; Gerazov, Branislav

2025-01-01

Abstract

Child voices in Text-To-Speech (TTS) are essential for enabling children with speech or communication difficulties to express themselves authentically, supporting their social inclusion and sense of identity. However, the development and availability of high-quality child voices remain limited compared to adult voices, mostly because child speech data is scarce and because of ethical concerns, given that the voice belongs to a minor. In this paper we review the current state of the TTS technology landscape, with a special focus on systems that offer child voices and on technology that targets offline use on low-resource mobile devices such as smartphones and tablets. We also explore research efforts in creating child TTS by reviewing papers on neural child TTS models, in terms of both the technologies used and the quality of the resulting voices. Additionally, we briefly summarize the available TTS engines and their voices to check the availability of child voices. Finally, we examine the use of child voices in available tools and applications.

Year: 2025
ISBN: 9783032016317; 9783032016324
Keywords: Text To Speech (TTS); Child TTS voices; Multi-speaker TTS
Type: 4.1 Contribution in conference proceedings

Files in this item:

File	Access	Version	Size	Format
A Review of the State of the Child Speech Synthesis Landscape.pdf	Archive managers only	Publisher's version (VoR)	151.48 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/455405
