Comparing alignment toward American, British, and Indian English text-to-speech (TTS) voices: influence of social attitudes and talker guise

Text-to-speech (TTS) voices, which vary in their apparent native language and dialect, are increasingly widespread. In this paper, we test how speakers perceive and align toward TTS voices that represent American, British, and Indian dialects of English and the extent to which social attitudes shape pat...


Bibliographic Details
Main Authors: Nicole Dodd, Michelle Cohn, Georgia Zellou
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-07-01
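Series: Frontiers in Computer Science
Subjects: voice-activated artificially intelligent (voice-AI) assistant; human-computer interaction; phonetic accommodation; dialect imitation; apparent guise
Online Access: https://www.frontiersin.org/articles/10.3389/fcomp.2023.1204211/full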
collection DOAJ
description Text-to-speech (TTS) voices, which vary in their apparent native language and dialect, are increasingly widespread. In this paper, we test how speakers perceive and align toward TTS voices that represent American, British, and Indian dialects of English and the extent to which social attitudes shape patterns of convergence and divergence. We also test whether top-down knowledge of the talker, manipulated as a “human” or “device” guise, mediates these attitudes and accommodation. Forty-six American English-speaking participants completed identical interactions with 6 talkers (2 from each dialect) and rated each talker on a variety of social factors. Accommodation was assessed with AXB perceptual similarity judgments made by a separate group of raters. Results show that speakers had the strongest positive social attitudes toward the Indian English voices and converged toward them more. Conversely, speakers rated the American English voices as less human-like and diverged from them. Finally, speakers overall showed more accommodation toward TTS voices that were presented in a “human” guise. We discuss these results through the lens of Communication Accommodation Theory (CAT).
first_indexed 2024-03-13T01:47:41Z
id doaj.art-88958b2cf4e14547b6473b3b39c9d68e
institution Directory of Open Access Journals
issn 2624-9898
last_indexed 2024-03-13T01:47:41Z
spelling doaj.art-88958b2cf4e14547b6473b3b39c9d68e; 2023-07-03T05:51:01Z; eng; Frontiers Media S.A.; Frontiers in Computer Science; 2624-9898; 2023-07-01; 5; 10.3389/fcomp.2023.1204211; 1204211
topic voice-activated artificially intelligent (voice-AI) assistant
human-computer interaction
phonetic accommodation
dialect imitation
apparent guise