Processing Methods for the Detection of Landmark Acoustic Cues

Bibliographic Details
Main author: Shi, Belinda
Other authors: Shattuck-Hufnagel, Stefanie
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online access: https://hdl.handle.net/1721.1/143187
Description
Summary: This paper presents work on one component of a new speech analysis system for lexical access. The system is based on individual acoustic cues in the speech signal, such as Landmarks, which are abrupt changes in the spectrum due to articulatory events associated with vowels and consonants. The paper provides an organized process that can easily be repeated and modified to create an accurate and efficient detection module for landmark cues in speech files. It begins by examining patterns in the speech signal that may indicate the presence of vowel landmark cues, then proposes an algorithm that predicts the locations of vowel landmarks from these observations. Next, it maps out a generalized sequence of steps for constructing modules that detect landmark acoustic cues: extracting speech-related measurements, processing them to accentuate certain characteristics, and using both speech production knowledge and mathematical analysis to determine which measurements are good indicators of particular acoustic cues. Finally, Gaussian Mixture Models are trained on the selected raw and processed measurements to distinguish landmark cues efficiently and accurately. These steps are applied to Vowel and Glide landmarks to develop a module that can distinguish them from other landmark cues in a speech signal. This module is a critical step toward a cue-based speech recognition system that can model human speech perception.
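
The final step described in the summary, training Gaussian Mixture Models on selected measurements and using them to separate landmark classes, can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not state which measurements, how many mixture components, or which toolkit were used. The sketch assumes a single hypothetical measurement per landmark candidate (here called an energy value), synthetic training data, two components per class, and a plain EM fit written with the Python standard library. The labels `vowel_or_glide` and `other` and all function names are illustrative only.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(data, k=2, iters=50):
    """Fit a 1-D, k-component GMM with EM; returns a list of (weight, mean, var)."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic initialization: means at evenly spaced quantiles,
    # every component starts with the overall sample variance.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(xs) / n
    var0 = max(sum((x - overall_mean) ** 2 for x in xs) / n, 1e-6)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                1e-6,
            )
    return list(zip(weights, means, variances))


def log_likelihood(x, gmm):
    """Log-likelihood of one measurement under a fitted mixture."""
    return math.log(sum(w * gaussian_pdf(x, m, v) for w, m, v in gmm) + 1e-300)


def classify(x, models):
    """Return the label whose class-specific GMM gives x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))


# Synthetic, made-up training measurements (assumption: one value per candidate).
rng = random.Random(0)
train = {
    "vowel_or_glide": [rng.gauss(60, 3) for _ in range(100)]
    + [rng.gauss(70, 3) for _ in range(100)],
    "other": [rng.gauss(35, 4) for _ in range(100)]
    + [rng.gauss(45, 4) for _ in range(100)],
}
models = {label: fit_gmm(values) for label, values in train.items()}

print(classify(65.0, models))
print(classify(40.0, models))
```

Training one mixture per class and comparing likelihoods is only one common way to use GMMs for classification. A real detection module would likely work with multi-dimensional feature vectors and measured speech data rather than the single synthetic value used here.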