Automatic Multiple Articulator Segmentation in Dynamic Speech MRI Using a Protocol Adaptive Stacked Transfer Learning U-NET Model


Bibliographic Details
Main Authors: Subin Erattakulangara, Karthika Kelat, David Meyer, Sarv Priya, Sajan Goud Lingala
Format: Article
Language: English
Published: MDPI AG, 2023-05-01
Series: Bioengineering
Subjects: dynamic speech MRI; articulator segmentation; protocol adaptiveness; transfer learning
Online Access: https://www.mdpi.com/2306-5354/10/5/623
collection DOAJ
description Dynamic magnetic resonance imaging has emerged as a powerful modality for investigating upper-airway function during speech production. Analyzing the changes in the vocal tract airspace, including the position of soft-tissue articulators (e.g., the tongue and velum), enhances our understanding of speech production. The advent of various fast speech MRI protocols based on sparse sampling and constrained reconstruction has led to the creation of dynamic speech MRI datasets on the order of 80–100 image frames/second. In this paper, we propose a stacked transfer learning U-NET model to segment the deforming vocal tract in 2D mid-sagittal slices of dynamic speech MRI. Our approach leverages (a) low- and mid-level features and (b) high-level features. The low- and mid-level features are derived from models pre-trained on labeled open-source brain tumor MR and lung CT datasets, and an in-house airway labeled dataset. The high-level features are derived from labeled protocol-specific MR images. The applicability of our approach to segmenting dynamic datasets is demonstrated on data acquired from three fast speech MRI protocols. Protocol 1: a 3 T-based radial acquisition scheme coupled with a non-linear temporal regularizer, where speakers produced French speech tokens; Protocol 2: a 1.5 T-based uniform-density spiral acquisition scheme coupled with temporal finite difference (FD) sparsity regularization, where speakers produced fluent speech tokens in English; and Protocol 3: a 3 T-based variable-density spiral acquisition scheme coupled with manifold regularization, where speakers produced various speech tokens from the International Phonetic Alphabet (IPA). Segmentations from our approach were compared to those from an expert human user (a vocologist) and to those from a conventional U-NET model without transfer learning. Segmentations from a second expert human user (a radiologist) were used as ground truth. Evaluations were performed using the Dice similarity metric, the Hausdorff distance metric, and a segmentation count metric. This approach was successfully adapted to different speech MRI protocols with only a handful of protocol-specific images (on the order of 20 images), and provided accurate segmentations similar to those of an expert human.
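The two overlap metrics named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the Dice similarity coefficient and the symmetric Hausdorff distance on binary segmentation masks, not the paper's actual evaluation code; the function names and the toy masks below are our own.

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice similarity between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between foreground pixels of two masks."""
    pa = np.argwhere(np.asarray(a, dtype=bool))  # (N, 2) pixel coordinates
    pb = np.argwhere(np.asarray(b, dtype=bool))  # (M, 2) pixel coordinates
    # pairwise Euclidean distances via broadcasting: shape (N, M)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    # worst-case nearest-neighbor distance, taken in both directions
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# toy 4x4 masks: b is a copy of a shifted one pixel to the right
a = np.zeros((4, 4), dtype=int); a[1:3, 1:3] = 1
b = np.zeros((4, 4), dtype=int); b[1:3, 2:4] = 1
print(dice_coefficient(a, b))    # 0.5 (2 shared pixels out of 4 + 4)
print(hausdorff_distance(a, b))  # 1.0 (every pixel within 1 px of a match)
```

In practice a pairwise distance matrix is memory-hungry for large masks, so library implementations (e.g., SciPy's `directed_hausdorff`) use early-break strategies; the brute-force broadcast above is only meant to make the definitions concrete.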
id doaj.art-71da49e14e39455985a1e92644834b66
institution Directory Open Access Journal
issn 2306-5354
doi 10.3390/bioengineering10050623
affiliations:
Subin Erattakulangara: Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
Karthika Kelat: Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
David Meyer: Janette Ogg Voice Research Center, Shenandoah University, Winchester, VA 22601, USA
Sarv Priya: Department of Radiology, University of Iowa, Iowa City, IA 52242, USA
Sajan Goud Lingala: Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
topic dynamic speech MRI
articulator segmentation
protocol adaptiveness
transfer learning