Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit.

Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without ph...

Bibliographic Details
Main Authors:	Denis Arnold, Fabian Tomaschek, Konstantin Sering, Florence Lopez, R Harald Baayen
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2017-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC5386243?pdf=render

_version_	1818305612151259136
author	Denis Arnold, Fabian Tomaschek, Konstantin Sering, Florence Lopez, R Harald Baayen
author_sort	Denis Arnold
collection	DOAJ
description	Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20-44%), without making use of phone or word form representations. Our model also successfully generates predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a 'wide' yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
first_indexed	2024-12-13T06:29:21Z
format	Article
id	doaj.art-4cea13a2e4e6477f82225359a1679ee7
institution	Directory Open Access Journal
issn	1932-6203
language	English
last_indexed	2024-12-13T06:29:21Z
publishDate	2017-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj.art-4cea13a2e4e6477f82225359a1679ee7; 2022-12-21T23:56:38Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2017-01-01; 12; 4; e0174623; 10.1371/journal.pone.0174623; http://europepmc.org/articles/PMC5386243?pdf=render
title	Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit.
url	http://europepmc.org/articles/PMC5386243?pdf=render

Similar Items