Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning
Institutions have been adopting work/study-from-home programs since the pandemic began. They primarily utilise Voice over Internet Protocol (VoIP) software to hold online meetings. This research introduces a new method to enhance the VoIP call experience using deep learning. In this paper, integrati...
Main Authors: | Amira A. Mohamed, Amira Eltokhy, Abdelhalim A. Zekry |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Applied Sciences |
Subjects: | speaker separation, speaker identification, deep learning, VoIP |
Online Access: | https://www.mdpi.com/2076-3417/13/7/4261 |
_version_ | 1797608417929461760 |
---|---|
author | Amira A. Mohamed; Amira Eltokhy; Abdelhalim A. Zekry |
author_facet | Amira A. Mohamed; Amira Eltokhy; Abdelhalim A. Zekry |
author_sort | Amira A. Mohamed |
collection | DOAJ |
description | Institutions have been adopting work/study-from-home programs since the pandemic began. They primarily utilise Voice over Internet Protocol (VoIP) software to hold online meetings. This research introduces a new method to enhance the VoIP call experience using deep learning. In this paper, two existing techniques, speaker separation and speaker identification (SSI), are integrated using deep learning methods with effective results as introduced by state-of-the-art research. This integration is applied to a VoIP system application. The voice signal is fed to the speaker separation and identification system to be separated; then, the “main speaker voice” is identified and verified, as opposed to any other human or non-human voices around the main speaker. Only this main speaker voice is then sent over IP to continue the call. Currently, online call systems depend on noise cancellation and call quality enhancement. However, this does not address multiple human voices on the call: the filters used in the call process only remove noise and interference (speech de-noising) from the speech signal. The presented system is tested with up to four mixed human voices. It separates out only the main speaker voice and processes it prior to transmission over the VoIP call. This paper illustrates the integration of these algorithms using deep neural networks (DNNs), the advantages and challenges of voice signal processing, and the importance of computing power for real-time applications. |
first_indexed | 2024-03-11T05:43:10Z |
format | Article |
id | doaj.art-b3d1dde733c34c44a6de74e1dfcb7835 |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T05:43:10Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-b3d1dde733c34c44a6de74e1dfcb7835; 2023-11-17T16:17:49Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2023-03-01; vol. 13, no. 7, art. 4261; DOI 10.3390/app13074261; Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning; Amira A. Mohamed (0); Amira Eltokhy (1); Abdelhalim A. Zekry (2); (0) Department of Electronics Engineering and Communications, Faculty of Engineering, Badr University in Cairo (BUC), Cairo 11829, Egypt; (1) Rapid Bio-Labs, 10412 Tallinn, Estonia; (2) Department of Electronics and Electrical Communications, Faculty of Engineering, Ain Shams University, Cairo 11517, Egypt; https://www.mdpi.com/2076-3417/13/7/4261; speaker separation; speaker identification; deep learning; VoIP |
spellingShingle | Amira A. Mohamed; Amira Eltokhy; Abdelhalim A. Zekry; Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning; Applied Sciences; speaker separation; speaker identification; deep learning; VoIP |
title | Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning |
title_sort | enhanced multiple speakers separation and identification for voip applications using deep learning |
topic | speaker separation; speaker identification; deep learning; VoIP |
url | https://www.mdpi.com/2076-3417/13/7/4261 |
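
The description above outlines a flow in which a mixed signal is separated, the main speaker is identified and verified, and only that voice is transmitted. The record does not say how identification is scored, so the following is only a minimal sketch of one common approach, assuming each separated stream has already been mapped to a fixed-length speaker embedding by some model. The function names, the toy three-dimensional embeddings, and the 0.7 threshold are all hypothetical and are not taken from the article.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_main_speaker(
    stream_embeddings: Sequence[Sequence[float]],
    enrolled_embedding: Sequence[float],
    threshold: float = 0.7,  # hypothetical verification threshold
) -> Optional[int]:
    """Return the index of the separated stream that best matches the
    enrolled main speaker, or None if no stream reaches the threshold."""
    scores = [cosine_similarity(e, enrolled_embedding) for e in stream_embeddings]
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None


# Toy example: three separated streams, each with a made-up embedding.
streams = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3], [0.0, 0.2, 0.9]]
enrolled = [0.1, 0.9, 0.2]  # made-up embedding of the enrolled main speaker
main_index = select_main_speaker(streams, enrolled)
print(main_index)
```

In a sketch like this, only the stream at `main_index` would be passed on for transmission; when the function returns `None`, no stream was verified as the main speaker and the caller has to decide what to send.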