Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space


Bibliographic Details
Main Authors:	Kan Li, José C. Príncipe
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2018-04-01
Series:	Frontiers in Neuroscience
Subjects:	spike-based learning, noise-robust automatic speech recognition (ASR), keyword spotting, kernel adaptive filtering (KAF), reproducing kernel Hilbert space (RKHS), kernel method
Online Access:	http://journal.frontiersin.org/article/10.3389/fnins.2018.00194/full

_version_	1818253357228228608
author	Kan Li, José C. Príncipe
author_facet	Kan Li, José C. Príncipe
author_sort	Kan Li
collection	DOAJ
description	This paper presents a novel real-time dynamic framework for quantifying time-series structure in spoken words using spikes. Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF) spike generator. These spike trains are mapped into a function space of infinite dimension, i.e., a Reproducing Kernel Hilbert Space (RKHS), using point-process kernels, where a state-space model learns the dynamics of the multidimensional spike input using gradient descent learning. This kernelized recurrent system is very parsimonious and achieves the necessary memory depth via feedback of its internal states when trained discriminatively, utilizing the full context of the phoneme sequence. A main advantage of modeling nonlinear dynamics using state-space trajectories in the RKHS is that it imposes no restriction on the relationship between the exogenous input and its internal state. We are free to choose the input representation with an appropriate kernel, and changing the kernel affects neither the system nor the learning algorithm. Moreover, we show that this novel framework can outperform both traditional hidden Markov model (HMM) speech processing and neuromorphic implementations based on spiking neural networks (SNNs), yielding accurate and ultra-low power word spotters. As a proof of concept, we demonstrate its capabilities using the benchmark TI-46 digit corpus for isolated-word automatic speech recognition (ASR) or keyword spotting. Compared to an HMM using a Mel-frequency cepstral coefficient (MFCC) front-end without time derivatives, our MFCC-KAARMA offered improved performance. With a spike-train front-end, spike-KAARMA also outperformed state-of-the-art SNN solutions. Furthermore, compared to MFCCs, spike trains provided enhanced noise robustness in certain low signal-to-noise ratio (SNR) regimes.
first_indexed	2024-12-12T16:38:47Z
format	Article
id	doaj.art-41242c754a434be4ac8cfadfa87c0b2a
institution	Directory of Open Access Journals
issn	1662-453X
language	English
last_indexed	2024-12-12T16:38:47Z
publishDate	2018-04-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Neuroscience
title	Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space
topic	spike-based learning, noise-robust automatic speech recognition (ASR), keyword spotting, kernel adaptive filtering (KAF), reproducing kernel Hilbert space (RKHS), kernel method
url	http://journal.frontiersin.org/article/10.3389/fnins.2018.00194/full
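
The description mentions converting audio into spike trains with a leaky integrate-and-fire (LIF) spike generator. As a rough illustration of that idea only, the sketch below encodes a single channel with a textbook LIF neuron. The function name `lif_encode`, the forward-Euler update, and every parameter value (`tau`, `threshold`, `refractory`) are assumptions chosen for this example, not taken from the article; a multi-channel front end would presumably apply such a neuron to each output of a filter bank, which this record does not specify.

```python
import numpy as np

def lif_encode(signal, fs, tau=0.01, threshold=0.05, refractory=0.001):
    """Return spike times (seconds) of a leaky integrate-and-fire neuron.

    signal: 1-D array of non-negative input drive (hypothetical units).
    fs: sampling rate in Hz.
    tau, threshold, refractory: illustrative values, not from the article.
    """
    dt = 1.0 / fs
    v = 0.0                  # membrane potential, starts at the reset value
    last_spike = -np.inf     # time of the most recent spike
    spikes = []
    for n, x in enumerate(signal):
        t = n * dt
        if t - last_spike < refractory:
            continue         # input is ignored during the refractory period
        # Forward-Euler step of dv/dt = -v / tau + x
        v += dt * (-v / tau + x)
        if v >= threshold:
            spikes.append(t) # emit a spike, then reset the membrane
            v = 0.0
            last_spike = t
    return np.array(spikes)

# Example: encode 100 ms of a rectified 440 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(0, 0.1, 1 / fs)
tone = 20.0 * np.abs(np.sin(2 * np.pi * 440 * t))
spike_times = lif_encode(tone, fs)
```

With this kind of encoder, a stronger input drives the membrane to threshold sooner, so amplitude is carried by spike timing and rate rather than by sample values.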

Similar Items