Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities

Speech quality and speech intelligibility can vary dramatically across the wide range of currently available telecommunications systems, devices, and operating environments. This creates a strong demand for efficient real-time measurements of quality and intelligibility. Wideband Audio Waveform Eval...

Bibliographic Details
Main Authors:	Andrew A. Catellier, Stephen D. Voran
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Convolutional neural networks; no-reference objective estimator; speech intelligibility; speech quality; subjective testing; wideband speech
Online Access:	https://ieeexplore.ieee.org/document/10309306/

_version_	1797627798341287936
author	Andrew A. Catellier, Stephen D. Voran
author_facet	Andrew A. Catellier Stephen D. Voran
author_sort	Andrew A. Catellier
collection	DOAJ
description	Speech quality and speech intelligibility can vary dramatically across the wide range of currently available telecommunications systems, devices, and operating environments. This creates a strong demand for efficient real-time measurements of quality and intelligibility. Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional neural networks (CNNs) that operate directly on wideband audio waveforms in order to produce evaluations of those waveforms. In the present work these evaluations give qualities of telecommunications speech (e.g., noisiness, intelligibility, overall speech quality). WAWEnets are no-reference networks because they do not require “reference” (original or undistorted) versions of the waveforms they evaluate. Our initial 2020 WAWEnet publication introduces four WAWEnets and each emulates the output of an established full-reference speech quality or intelligibility estimation algorithm. We have updated the WAWEnet architecture to be more efficient and effective. Here we present a single WAWEnet that closely tracks seven different quality and intelligibility values with per-segment correlations in the range of 0.92 to 0.96. We create a second network that additionally tracks four subjective speech quality dimensions. We offer a third network that focuses on just subjective quality scores and achieves a per-segment correlation of 0.97. The performance of our WAWEnet architecture compares favorably to models with orders-of-magnitude more parameters and computational complexity. This work has leveraged 334 hours of speech in 13 languages, more than two million full-reference target values, and more than 93,000 subjective mean opinion scores. We also interpret the operation of WAWEnets and identify the key to their operation using the language of signal processing: ReLUs strategically move spectral information from non-DC components into the DC component. 
The DC values of 96 output signals define a vector in a 96-D latent space, and this vector is then mapped to a quality or intelligibility value for the input waveform.
first_indexed	2024-03-11T10:30:36Z
format	Article
id	doaj.art-567f896612f444ef88ee1d53f74f5f85
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-11T10:30:36Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-567f896612f444ef88ee1d53f74f5f85; 2023-11-15T00:00:35Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; vol. 11; pp. 125576-125592; 10.1109/ACCESS.2023.3330640; 10309306; Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities; Andrew A. Catellier (https://orcid.org/0000-0002-5850-555X), Stephen D. Voran (https://orcid.org/0000-0001-7840-8848); Institute for Telecommunication Sciences, Boulder, CO, USA; https://ieeexplore.ieee.org/document/10309306/; Convolutional neural networks; no-reference objective estimator; speech intelligibility; speech quality; subjective testing; wideband speech
spellingShingle	Andrew A. Catellier Stephen D. Voran Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities IEEE Access Convolutional neural networks no-reference objective estimator speech intelligibility speech quality subjective testing wideband speech
title	Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities
topic	Convolutional neural networks; no-reference objective estimator; speech intelligibility; speech quality; subjective testing; wideband speech
url	https://ieeexplore.ieee.org/document/10309306/
work_keys_str_mv	AT andrewacatellier widebandaudiowaveformevaluationnetworksefficientaccurateestimationofspeechqualities AT stephendvoran widebandaudiowaveformevaluationnetworksefficientaccurateestimationofspeechqualities
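
The description above states that ReLUs move spectral information from non-DC components into the DC component. The following is a minimal sketch of that general signal-processing effect only, not the authors' network or code. The 16 kHz sample rate, the 1 kHz test tone, and the helper names dc_component and relu are assumptions chosen for illustration. A pure sine has a DC value of zero, but after a ReLU (half-wave rectification) its DC value is roughly amplitude / pi, so the DC term now carries information about a component that originally had no DC content.

```python
import numpy as np


def dc_component(signal):
    """Return the DC (zero-frequency) value of a signal, i.e. its mean."""
    return float(np.mean(signal))


def relu(signal):
    """Element-wise rectified linear unit: negative samples become zero."""
    return np.maximum(signal, 0.0)


# Hypothetical test tone (assumed values, not taken from the paper):
# 1 second of a 1 kHz sine sampled at 16 kHz, i.e. exactly 1000 full periods.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

# Before the ReLU the tone is zero-mean, so it has no DC content.
dc_before = dc_component(tone)

# After the ReLU the negative half-cycles are removed, which shifts energy
# into the DC term. For a continuous unit sine the result would be 1/pi.
dc_after = dc_component(relu(tone))

# Doubling the amplitude doubles the post-ReLU DC value, so the DC term
# reflects the strength of the original non-DC component.
dc_after_double = dc_component(relu(2.0 * tone))

print(f"DC before ReLU: {dc_before:.6f}")
print(f"DC after ReLU:  {dc_after:.6f} (1/pi is about {1 / np.pi:.6f})")
print(f"DC after ReLU, doubled amplitude: {dc_after_double:.6f}")
```

In this sketch a single number summarizes one rectified signal. The description's 96-D latent vector would correspond to collecting one such DC value from each of 96 output signals, but how the network produces those signals is not shown here.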
