A Situational Analysis of Current Speech-Synthesis Systems for Child Voices: A Scoping Review of Qualitative and Quantitative Evidence

(1) Background: Speech synthesis has customarily focused on adult speech, but with the rapid development of speech-synthesis technology, it is now possible to create child voices with a limited amount of child-speech data. This scoping review summarises the evidence base related t...


Bibliographic Details
Main Authors: Camryn Terblanche, Michal Harty, Michelle Pascoe, Benjamin V. Tucker
Format: Article
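Language: English
Published: MDPI AG 2022-06-01
Series: Applied Sciences
Subjects: augmentative and alternative communication (AAC); children; complex communication needs; neural networks; speech synthesis
Online Access: https://www.mdpi.com/2076-3417/12/11/5623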
author Camryn Terblanche
Michal Harty
Michelle Pascoe
Benjamin V. Tucker
collection DOAJ
description (1) Background: Speech synthesis has customarily focused on adult speech, but with the rapid development of speech-synthesis technology, it is now possible to create child voices with a limited amount of child-speech data. This scoping review summarises the evidence base related to developing synthesised speech for children. (2) Method: The included studies were those that were (1) published between 2006 and 2021 and (2) included child participants or voices of children aged between 2–16 years old. (3) Results: 58 studies were identified. They were discussed based on the languages used, the speech-synthesis systems and/or methods used, the speech data used, the intelligibility of the speech and the ages of the voices. Based on the reviewed studies, relative to adult-speech synthesis, developing child-speech synthesis is notably more challenging. Child speech often presents with acoustic variability and articulatory errors. To account for this, researchers have most often attempted to adapt adult-speech models, using a variety of different adaptation techniques. (4) Conclusions: Adapting adult speech has proven successful in child-speech synthesis. It appears that the resulting quality can be improved by training a large amount of pre-selected speech data, aided by a neural-network classifier, to better match the children’s speech. We encourage future research surrounding individualised synthetic speech for children with CCN, with special attention to children who make use of low-resource languages.
first_indexed 2024-03-10T01:29:59Z
format Article
id doaj.art-af0fb8703a6848d7ac0b80abfad22424
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T01:29:59Z
publishDate 2022-06-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-af0fb8703a6848d7ac0b80abfad22424
2023-11-23T13:45:00Z
eng
MDPI AG
Applied Sciences
2076-3417
2022-06-01
Volume 12, Issue 11, Article 5623
10.3390/app12115623
A Situational Analysis of Current Speech-Synthesis Systems for Child Voices: A Scoping Review of Qualitative and Quantitative Evidence
Camryn Terblanche (Department of Speech and Language Pathology, University of Cape Town, Cape Town 7700, South Africa)
Michal Harty (Department of Speech and Language Pathology, University of Cape Town, Cape Town 7700, South Africa)
Michelle Pascoe (Department of Speech and Language Pathology, University of Cape Town, Cape Town 7700, South Africa)
Benjamin V. Tucker (Department of Linguistics, University of Alberta, Edmonton, AB T6G 2R3, Canada)
https://www.mdpi.com/2076-3417/12/11/5623
augmentative and alternative communication (AAC); children; complex communication needs; neural networks; speech synthesis
title A Situational Analysis of Current Speech-Synthesis Systems for Child Voices: A Scoping Review of Qualitative and Quantitative Evidence
topic augmentative and alternative communication (AAC)
children
complex communication needs
neural networks
speech synthesis
url https://www.mdpi.com/2076-3417/12/11/5623