A Review of the State of the Child Speech Synthesis Landscape
Pagliara, Silvio; Mura, Antonello; Gerazov, Branislav
2025-01-01
Abstract
Child voices in text-to-speech (TTS) are essential for enabling children with speech or communication difficulties to express themselves authentically, supporting their social inclusion and sense of identity. However, the development and availability of high-quality child voices remain limited compared to adult voices, mostly because child speech data is scarce and because of ethical concerns, given that the voice belongs to a minor. In this paper we review the current state of the TTS technological landscape, with a special focus on systems that offer child voices and on technology that targets offline use on low-resource mobile devices such as smartphones and tablets. We also explore research efforts in creating child TTS by reviewing papers on neural child TTS models in terms of the technologies used and the quality of the resulting voices. Additionally, we briefly summarize the available TTS engines and their voices to check the availability of child voices. Finally, we examine the use of child voices in available tools and applications.


