Automatic Pharyngeal Phase Recognition in Untrimmed Videofluoroscopic Swallowing Study Using Transfer Learning with Deep Convolutional Neural Networks

Background: The videofluoroscopic swallowing study (VFSS) is considered the gold standard diagnostic tool for evaluating dysphagia. However, it is time-consuming and labor-intensive for the clinician to search a long recorded video manually, frame by frame, to identify the instantaneous swallo...

Bibliographic Details
Main Authors:	Ki-Sun Lee, Eunyoung Lee, Bareun Choi, Sung-Bom Pyun
Format:	Article
Language:	English
Published:	MDPI AG 2021-02-01
Series:	Diagnostics
Subjects:	videofluoroscopic swallowing study; action recognition; deep learning; convolutional neural network; transfer learning
Online Access:	https://www.mdpi.com/2075-4418/11/2/300

_version_	1797396645459001344
author	Ki-Sun Lee, Eunyoung Lee, Bareun Choi, Sung-Bom Pyun
author_facet	Ki-Sun Lee, Eunyoung Lee, Bareun Choi, Sung-Bom Pyun
author_sort	Ki-Sun Lee
collection	DOAJ
description	Background: The videofluoroscopic swallowing study (VFSS) is considered the gold standard diagnostic tool for evaluating dysphagia. However, it is time-consuming and labor-intensive for the clinician to search a long recorded video manually, frame by frame, to identify the instantaneous swallowing abnormality in VFSS images. Therefore, this study presents a deep learning-based approach using transfer learning with a convolutional neural network (CNN) that automatically annotates pharyngeal phase frames in untrimmed VFSS videos, so that frames need not be searched manually. Methods: To determine whether an image frame in the VFSS video is in the pharyngeal phase, a single-frame baseline architecture based on a deep CNN framework is used, and a transfer learning technique with fine-tuning is applied. Results: Among all experimental CNN models, the VGG-16 model fine-tuned with two blocks (VGG16-FT5) achieved the highest performance in recognizing pharyngeal phase frames, with an accuracy of 93.20 (±1.25)%, sensitivity of 84.57 (±5.19)%, specificity of 94.36 (±1.21)%, AUC of 0.8947 (±0.0269), and Kappa of 0.7093 (±0.0488). Conclusions: Using appropriate fine-tuning techniques and explainable deep learning techniques such as Grad-CAM, this study shows that the proposed deep CNN framework, based on a single-frame baseline architecture, can yield high performance in the full automation of VFSS video analysis.
first_indexed	2024-03-09T00:54:22Z
format	Article
id	doaj.art-4fc19f13fd2a40aa8fcc4896836a2947
institution	Directory of Open Access Journals
issn	2075-4418
language	English
last_indexed	2024-03-09T00:54:22Z
publishDate	2021-02-01
publisher	MDPI AG
record_format	Article
series	Diagnostics
spelling	doaj.art-4fc19f13fd2a40aa8fcc4896836a2947 | 2023-12-11T16:57:52Z | eng | MDPI AG | Diagnostics | 2075-4418 | 2021-02-01 | vol. 11, no. 2, art. 300 | 10.3390/diagnostics11020300 | Ki-Sun Lee (Medical Science Research Center, Ansan Hospital, Korea University College of Medicine, Ansan-si 15355, Korea) | Eunyoung Lee (Department of Physical Medicine and Rehabilitation, Anam Hospital, Korea University College of Medicine, Seoul 02841, Korea) | Bareun Choi (Department of Physical Medicine and Rehabilitation, Anam Hospital, Korea University College of Medicine, Seoul 02841, Korea) | Sung-Bom Pyun (Department of Physical Medicine and Rehabilitation, Anam Hospital, Korea University College of Medicine, Seoul 02841, Korea)
title	Automatic Pharyngeal Phase Recognition in Untrimmed Videofluoroscopic Swallowing Study Using Transfer Learning with Deep Convolutional Neural Networks
topic	videofluoroscopic swallowing study; action recognition; deep learning; convolutional neural network; transfer learning
url	https://www.mdpi.com/2075-4418/11/2/300
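
Note on the reported measures: the abstract in this record gives accuracy, sensitivity, specificity, and Kappa for frame-level recognition of the pharyngeal phase. As a minimal sketch of how such figures are conventionally derived from a binary confusion matrix, the Python below may help; the function name `frame_metrics` and the example counts are hypothetical illustrations, not values or code from the article, and the article's own evaluation procedure may differ.

```python
def frame_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute binary classification metrics from confusion-matrix counts.

    tp/fn: pharyngeal-phase frames predicted correctly / missed.
    tn/fp: non-pharyngeal frames predicted correctly / falsely flagged.
    (Hypothetical helper for illustration only.)
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the formulas.
    for name, value in frame_metrics(tp=40, fp=10, tn=90, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Because pharyngeal phase frames are typically a small fraction of an untrimmed video, accuracy alone can look high even when many positive frames are missed, which is one reason sensitivity and Kappa are worth reading alongside it.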

Similar Items