Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning With Neural Networks

Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations. This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial...

Bibliographic Details
Main Author:	Gašper Beguš
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2020-07-01
Series:	Frontiers in Artificial Intelligence
Subjects:	generative adversarial networks; deep neural network interpretability; language acquisition; speech; voice onset time; allophonic distribution
Online Access:	https://www.frontiersin.org/article/10.3389/frai.2020.00044/full

_version_	1828376419947773952
author	Gašper Beguš
author_facet	Gašper Beguš
author_sort	Gašper Beguš
collection	DOAJ
description	Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations. This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture and proposes a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is trained on unannotated raw acoustic data and learning is unsupervised without any language-specific assumptions or pre-assumed levels of abstraction. A Generative Adversarial Network was trained on an allophonic distribution in English, in which voiceless stops surface as aspirated word-initially before stressed vowels, except if preceded by a sibilant [s]. The network successfully learns the allophonic alternation: the network's generated speech signal contains the conditional distribution of aspiration duration. The paper proposes a technique for establishing the network's internal representations that identifies latent variables that correspond to, for example, presence of [s] and its spectral properties. By manipulating these variables, we actively control the presence of [s] and its frication amplitude in the generated outputs. This suggests that the network learns to use latent variables as an approximation of phonetic and phonological representations. Crucially, we observe that the dependencies learned in training extend beyond the training interval, which allows for additional exploration of learning representations. The paper also discusses how the network's architecture and innovative outputs resemble and differ from linguistic behavior in language acquisition, speech disorders, and speech errors, and how well-understood dependencies in speech data can help us interpret how neural networks learn their representations.
first_indexed	2024-04-14T08:00:41Z
format	Article
id	doaj.art-c31cf2d678e04024a957f71ce4faf452
institution	Directory of Open Access Journals
issn	2624-8212
language	English
last_indexed	2024-04-14T08:00:41Z
publishDate	2020-07-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Artificial Intelligence
spelling	doaj.art-c31cf2d678e04024a957f71ce4faf452 | 2022-12-22T02:04:55Z | eng | Frontiers Media S.A. | Frontiers in Artificial Intelligence | 2624-8212 | 2020-07-01 | 3 | 10.3389/frai.2020.00044 | 530080 | Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning With Neural Networks | Gašper Beguš [0]; Gašper Beguš [1] | Department of Linguistics, University of California, Berkeley, Berkeley, CA, United States | Department of Linguistics, University of Washington, Seattle, WA, United States | https://www.frontiersin.org/article/10.3389/frai.2020.00044/full | generative adversarial networks; deep neural network interpretability; language acquisition; speech; voice onset time; allophonic distribution
title	Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning With Neural Networks
topic	generative adversarial networks; deep neural network interpretability; language acquisition; speech; voice onset time; allophonic distribution
url	https://www.frontiersin.org/article/10.3389/frai.2020.00044/full

Similar Items