Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning

Institutions have been adopting work/study-from-home programs since the pandemic began. They primarily utilise Voice over Internet Protocol (VoIP) software to hold online meetings. This research introduces a new method to enhance the VoIP call experience using deep learning. In this paper, integrati...


Bibliographic Details
Main Authors: Amira A. Mohamed, Amira Eltokhy, Abdelhalim A. Zekry
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Applied Sciences
Subjects: speaker separation; speaker identification; deep learning; VoIP
Online Access: https://www.mdpi.com/2076-3417/13/7/4261
_version_ 1797608417929461760
author Amira A. Mohamed
Amira Eltokhy
Abdelhalim A. Zekry
author_sort Amira A. Mohamed
collection DOAJ
description Institutions have been adopting work/study-from-home programs since the pandemic began. They primarily utilise Voice over Internet Protocol (VoIP) software to hold online meetings. This research introduces a new method to enhance the VoIP call experience using deep learning. In this paper, two existing techniques, speaker separation and speaker identification (SSI), are integrated using deep learning methods that state-of-the-art research has shown to be effective. This integration is applied to a VoIP application. The voice signal is fed to the speaker separation and identification system, which separates it into its component voices; the “main speaker voice” is then identified and verified, as distinct from any other human or non-human sounds around the main speaker. Only this main speaker voice is then sent over IP to continue the call. Current online call systems depend on noise cancellation and call quality enhancement, which do not address multiple human voices on the call: the filters used in the call process only remove noise and interference from the speech signal (speech de-noising). The presented system is tested with up to four mixed human voices. It separates out only the main speaker voice and processes it prior to transmission over the VoIP call. This paper illustrates the integration of these algorithms using a deep neural network (DNN), the advantages and challenges of voice signal processing, and the importance of computing power for real-time applications.
first_indexed 2024-03-11T05:43:10Z
format Article
id doaj.art-b3d1dde733c34c44a6de74e1dfcb7835
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-11T05:43:10Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-b3d1dde733c34c44a6de74e1dfcb7835
2023-11-17T16:17:49Z
eng
MDPI AG
Applied Sciences
2076-3417
2023-03-01
13(7), 4261
10.3390/app13074261
Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning
Amira A. Mohamed (0)
Amira Eltokhy (1)
Abdelhalim A. Zekry (2)
(0) Department of Electronics Engineering and Communications, Faculty of Engineering, Badr University in Cairo (BUC), Cairo 11829, Egypt
(1) Rapid Bio-Labs, 10412 Tallinn, Estonia
(2) Department of Electronics and Electrical Communications, Faculty of Engineering, Ain Shams University, Cairo 11517, Egypt
https://www.mdpi.com/2076-3417/13/7/4261
speaker separation
speaker identification
deep learning
VoIP
title Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning
topic speaker separation
speaker identification
deep learning
VoIP
url https://www.mdpi.com/2076-3417/13/7/4261