Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation

During phonation, the vocal folds exhibit a self-sustained oscillatory motion, which is influenced by the physical properties of the speaker’s vocal folds and driven by the balance of bio-mechanical and aerodynamic forces across the glottis. Subtle changes in the speaker’s physical state can affect voice production and alter these oscillatory patterns. Measuring these patterns can be valuable in developing computational tools that analyze voice to infer the speaker’s state. Traditionally, vocal fold oscillations (VFOs) are measured directly using physical devices in clinical settings. In this paper, we propose a novel analysis-by-synthesis approach that allows us to infer the VFOs directly from recorded speech signals on an individualized, speaker-by-speaker basis. The approach, called the ADLES-VFT algorithm, is proposed in the context of a joint model that combines a phonation model (with a glottal flow waveform as the output) and a vocal tract acoustic wave propagation model, such that the output of the joint model is an estimated waveform. The ADLES-VFT algorithm is a forward-backward algorithm that minimizes the error between the recorded waveform and the output of this joint model to estimate its parameters. Once estimated, these parameter values are used in conjunction with a phonation model to obtain its solutions. Since the parameters correlate with the physical properties of the speaker’s vocal folds, the model solutions obtained using them represent the individualized VFOs for each speaker. The approach is flexible and can be applied to various phonation models. In addition to presenting the methodology, we show how the VFOs can be quantified from a dynamical systems perspective for classification purposes. Mathematical derivations are provided in an appendix for better readability.
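
The abstract outlines an analysis-by-synthesis loop: a phonation model is driven by candidate parameters, its synthesized output is compared with the recorded waveform, the parameters are adjusted to reduce the mismatch, and the fitted model is then re-solved to obtain the speaker's vocal fold oscillations. The sketch below illustrates that loop in miniature only. It is not the ADLES-VFT algorithm described in the article: the one-mass, van der Pol-style oscillator, the parameters (k, r, p), the clipped glottal-flow proxy, and the Nelder-Mead search are all illustrative assumptions, and the paper's joint vocal tract wave propagation model and forward-backward (adjoint-style) parameter updates are omitted.

```python
# Minimal analysis-by-synthesis sketch (illustrative only; not the paper's ADLES-VFT).
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize


def vocal_fold_model(t, y, k, r, p):
    """Toy one-mass, van der Pol-style oscillator; self-sustained for suitable (k, r, p)."""
    x, v = y  # fold displacement and velocity
    return [v, p - r * (x**2 - 1.0) * v - k * x]


def synthesize(params, t):
    """Solve the toy phonation model and return a crude glottal-flow proxy."""
    k, r, p = params
    sol = solve_ivp(vocal_fold_model, (t[0], t[-1]), [0.01, 0.0],
                    t_eval=t, args=(k, r, p))
    return np.clip(sol.y[0], 0.0, None)  # nonzero only while the "glottis" is open


def fit_parameters(recorded, t, init=(1.0, 0.3, 0.8)):
    """Analysis-by-synthesis: choose (k, r, p) that minimize the waveform mismatch."""
    def loss(params):
        synth = synthesize(params, t)
        if synth.shape != recorded.shape:  # solver failed for these parameters
            return 1e6
        return np.mean((synth - recorded) ** 2)
    return minimize(loss, init, method="Nelder-Mead").x


if __name__ == "__main__":
    t = np.linspace(0.0, 0.2, 2000)            # 200 ms at 10 kHz
    recorded = synthesize((2.0, 0.5, 1.0), t)  # stand-in for a voice-derived target waveform
    est = fit_parameters(recorded, t)
    vfo = synthesize(est, t)                   # re-solved model = inferred oscillation
    print("estimated (k, r, p):", np.round(est, 3))
```

In the actual method, the estimated parameters feed a physiologically grounded phonation model whose solutions give the individualized VFOs, and those trajectories can then be characterized with dynamical systems measures for classification, as the abstract describes.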

Bibliographic Details
Main Authors: Wayne Zhao, Rita Singh
Format: Article
Language: English
Published: MDPI AG, 2023-07-01
Series: Entropy
Subjects: vocal fold oscillation; phonation models; dynamical systems; parameter estimation; voice profiling
Online Access: https://www.mdpi.com/1099-4300/25/7/1039
ISSN: 1099-4300
DOI: 10.3390/e25071039
Citation: Entropy, Vol. 25, Issue 7, Article 1039
Author Affiliations:
Wayne Zhao: Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Rita Singh: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA