Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning

Bibliographic Details
Main Authors:	Amira A. Mohamed, Amira Eltokhy, Abdelhalim A. Zekry
Format:	Article
Language:	English
Published:	MDPI AG 2023-03-01
Series:	Applied Sciences
Subjects:	speaker separation; speaker identification; deep learning; VoIP
Online Access:	https://www.mdpi.com/2076-3417/13/7/4261

collection	DOAJ
description	Since the pandemic began, institutions have been adopting work-from-home and study-from-home programs. They rely primarily on Voice over Internet Protocol (VoIP) software for online meetings. This research introduces a new deep-learning method to enhance the VoIP call experience. The paper integrates two existing techniques, speaker separation and speaker identification (SSI), using deep learning methods that state-of-the-art research has shown to be effective, and it applies the combined system to VoIP. The captured voice signal is first passed to the speaker separation and identification system, which separates it into individual voices. The “main speaker voice” is then identified and verified against any other human or non-human sounds around the main speaker. Only this main speaker voice is sent over IP to continue the call. Current online call systems rely on noise cancellation and call-quality enhancement, but these do not address multiple human voices in a call: the filters used in the call process only remove noise and interference from the speech signal (speech de-noising). The presented system is tested with mixtures of up to four human voices. It isolates only the main speaker voice and processes it before transmission over the VoIP call. The paper also describes how the algorithms are integrated using deep neural networks (DNNs), the advantages and challenges of voice signal processing, and the importance of computing power for real-time applications.
id	doaj.art-b3d1dde733c34c44a6de74e1dfcb7835
institution	Directory of Open Access Journals
issn	2076-3417
volume	13
issue	7
article_number	4261
doi	10.3390/app13074261
affiliation	Amira A. Mohamed: Department of Electronics Engineering and Communications, Faculty of Engineering, Badr University in Cairo (BUC), Cairo 11829, Egypt
affiliation	Amira Eltokhy: Rapid Bio-Labs, 10412 Tallinn, Estonia
affiliation	Abdelhalim A. Zekry: Department of Electronics and Electrical Communications, Faculty of Engineering, Ain Shams University, Cairo 11517, Egypt
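The abstract describes a three-stage pipeline:

1. Separate the microphone mixture into individual voices.
2. Identify and verify which separated voice belongs to the main speaker.
3. Transmit only that voice.

This record does not give the paper's models or decision rule. The Python sketch below therefore covers only the selection step, and it rests on several assumptions:

- The separation network and the speaker-embedding network are hypothetical. Their outputs are passed in as plain lists.
- Matching uses cosine similarity between each separated voice's embedding and an enrolled main-speaker embedding.
- `VERIFY_THRESHOLD = 0.7` and the names `select_main_speaker` and `cosine_similarity` are illustrative, not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical verification threshold (assumption, not from the paper);
# a real system would tune this on held-out data.
VERIFY_THRESHOLD = 0.7


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_main_speaker(
    separated_sources: List[List[float]],
    source_embeddings: List[List[float]],
    enrolled_embedding: List[float],
    threshold: float = VERIFY_THRESHOLD,
) -> Tuple[Optional[int], Optional[List[float]]]:
    """Return (index, waveform) of the verified main speaker, or (None, None).

    separated_sources: waveforms from a hypothetical separation model.
    source_embeddings: one speaker embedding per separated source, from a
        hypothetical speaker-identification network.
    enrolled_embedding: embedding of the main speaker, recorded in advance.
    """
    if len(separated_sources) != len(source_embeddings):
        raise ValueError("need exactly one embedding per separated source")
    best_idx: Optional[int] = None
    best_score = -1.0
    # Identification: find the separated source closest to the enrolled speaker.
    for idx, emb in enumerate(source_embeddings):
        score = cosine_similarity(emb, enrolled_embedding)
        if score > best_score:
            best_idx, best_score = idx, score
    # Verification: reject the best candidate if the match is still weak.
    if best_idx is None or best_score < threshold:
        return None, None
    return best_idx, separated_sources[best_idx]


if __name__ == "__main__":
    # Toy data: three separated "waveforms" and their made-up embeddings.
    sources = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5]]
    embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
    enrolled = [0.0, 1.0, 0.1]
    idx, frame = select_main_speaker(sources, embeddings, enrolled)
    print(idx, frame)  # only this source would be sent over IP
```

In this sketch, identification (picking the closest source) is kept separate from verification (rejecting a weak best match). This is one possible design: when the enrolled speaker is silent, the sender can mute the outgoing stream instead of forwarding a bystander's voice.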

Similar Items