Real-Time Sound Source Localization for Low-Power IoT Devices Based on Multi-Stream CNN

Voice-activated artificial intelligence (AI) technology has advanced rapidly and is being adopted in various devices such as smart speakers and display products, which enable users to multitask without touching the devices. However, most devices equipped with cameras and displays lack mobility; ther...

Bibliographic Details
Main Authors:	Jungbeom Ko, Hyunchul Kim, Jungsuk Kim
Format:	Article
Language:	English
Published:	MDPI AG 2022-06-01
Series:	Sensors
Subjects:	sound source localization; deep learning; multi-stream CNN; IoT device
Online Access:	https://www.mdpi.com/1424-8220/22/12/4650

collection	DOAJ
description	Voice-activated artificial intelligence (AI) technology has advanced rapidly and is being adopted in various devices such as smart speakers and display products, which enable users to multitask without touching the devices. However, most devices equipped with cameras and displays lack mobility; therefore, users cannot avoid touching them for face-to-face interactions, which contradicts the voice-activated AI philosophy. In this paper, we propose a deep neural network-based real-time sound source localization (SSL) model for low-power Internet of Things (IoT) devices based on microphone arrays and present a prototype implemented on actual IoT devices. The proposed SSL model delivers multi-channel acoustic data to parallel convolutional neural network layers in the form of multiple streams to capture the unique delay patterns for the low-, mid-, and high-frequency ranges, and estimates the fine and coarse location of voices. The model adapted in this study achieved an accuracy of 91.41% on fine location estimation and a direction of arrival error of 7.43° on noisy data. It achieved a processing time of 7.811 ms per 40 ms sample on the Raspberry Pi 4B. The proposed model can be applied to a camera-based humanoid robot that mimics the manner in which humans react to trigger voices in crowded environments.
first_indexed	2024-03-09T22:30:54Z
format	Article
id	doaj.art-6b59f52f3b6e4a47b9ba97b70a70e7cc
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-03-09T22:30:54Z
publishDate	2022-06-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-6b59f52f3b6e4a47b9ba97b70a70e7cc; 2023-11-23T18:56:45Z; eng; MDPI AG; Sensors; 1424-8220; 2022-06-01; 22; 12; 4650; 10.3390/s22124650; Real-Time Sound Source Localization for Low-Power IoT Devices Based on Multi-Stream CNN; Jungbeom Ko (0); Hyunchul Kim (1); Jungsuk Kim (2); Department of Health Sciences & Technology, Gachon Advanced Institute for Health Sciences & Technology (GAIHST), Gachon University, Incheon 21936, Korea; School of Information, University of California, 102 South Hall 4600, Berkeley, CA 94720, USA; Department of Biomedical Engineering, Gachon University, 191 Hambakmoe-ro, Incheon 21936, Korea; https://www.mdpi.com/1424-8220/22/12/4650; sound source localization; deep learning; multi-stream CNN; IoT device
title	Real-Time Sound Source Localization for Low-Power IoT Devices Based on Multi-Stream CNN
topic	sound source localization; deep learning; multi-stream CNN; IoT device
url	https://www.mdpi.com/1424-8220/22/12/4650

Similar Items