Voice keyword recognition based on spiking convolutional neural network for human-machine interface

Bibliographic Details
Main Authors: Hu, Jinhai; Goh, Wang Ling; Zhang, Zhongyi; Gao, Yuan
Other Authors: School of Electrical and Electronic Engineering
Format: Conference Paper
Language: English
Published: 2024
Online Access:https://hdl.handle.net/10356/179088
https://ieeexplore.ieee.org/abstract/document/9081859
Description
Summary: In this paper, a spiking convolutional neural network (SCNN) model for voice keyword recognition is presented. The model consists of an input pre-processing layer, a spiking neural network (SNN) layer with a built-in filter bank, and convolutional neural network (CNN) layers. A 16-channel infinite impulse response (IIR) filter bank followed by energy detection extracts the signal power in each frequency band of the voice input, and the SNN layer converts this power into spikes. The spike rate within a defined time window is used as the input to the subsequent CNN layers for classification. The network is trained on a spoken-digit dataset, with the weights of the convolutional layers learned from the spike-integration results produced by the spiking layer. Applied to voice keyword recognition, the model achieves 96.0% accuracy. Combining the SNN and CNN reduces the total number of layers and neurons in the system without compromising classification accuracy, which makes the model suitable for low-power hardware implementation in edge devices for human-machine interface (HMI) applications.
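The front end described in the summary (an IIR band-pass filter bank, energy detection, spike generation, and spike counting per time window) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the filter order, center frequencies, energy-detector form, neuron model, threshold, or window length, so the following are all hypothetical choices:

- second-order (biquad) band-pass sections;
- log-spaced centers from 100 Hz to 3.4 kHz;
- a leaky average of the squared band output as the energy detector;
- an integrate-and-fire neuron with reset by subtraction;
- 160-sample windows (20 ms at 8 kHz).

The function names are likewise illustrative.

```python
import math


def bandpass_biquad(fc, q, fs):
    """2nd-order IIR band-pass coefficients (RBJ audio-EQ cookbook,
    constant 0 dB peak gain), normalized so that a[0] == 1.
    Assumption: the paper's actual filter design is not specified."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Filter a list of samples with one biquad section (Direct Form I)."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x1, x2 = xn, x1
        y1, y2 = yn, y1
        y.append(yn)
    return y


def spike_rate_features(x, fs, n_channels=16, f_lo=100.0, f_hi=3400.0,
                        q=4.0, beta=0.01, threshold=5.0, win=160):
    """Hypothetical SNN front end. Returns (centers, counts), where
    counts[k][t] is the number of spikes emitted by channel k in time
    window t. Each window holds `win` samples; trailing samples that do
    not fill a whole window are ignored."""
    ratio = f_hi / f_lo
    # Log-spaced center frequencies (assumed spacing).
    centers = [f_lo * ratio ** (k / (n_channels - 1)) for k in range(n_channels)]
    n_windows = len(x) // win
    counts = []
    for fc in centers:
        b, a = bandpass_biquad(fc, q, fs)
        y = biquad(x, b, a)
        energy, v = 0.0, 0.0
        row = [0] * n_windows
        for n in range(n_windows * win):
            # Energy detector: leaky average of the squared band output.
            energy += beta * (y[n] * y[n] - energy)
            # Integrate-and-fire neuron driven by the band energy;
            # at most one spike per sample, reset by subtraction.
            v += energy
            if v >= threshold:
                row[n // win] += 1
                v -= threshold
        counts.append(row)
    return centers, counts


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 2)]
    centers, counts = spike_rate_features(tone, fs)
    best = max(range(len(counts)), key=lambda k: sum(counts[k]))
    print(f"most active channel: {best} (center {centers[best]:.0f} Hz)")
```

The resulting 16 × T matrix of spike counts (channels by time windows) is the kind of rate-coded input the summary describes passing to the CNN layers. The CNN classifier itself is not sketched here.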