Laparoscopic Video Analysis Using Temporal, Attention, and Multi-Feature Fusion Based-Approaches

Adapting intelligent context-aware systems (CAS) to future operating rooms (OR) aims to improve situational awareness and provide surgical decision support systems to medical teams. CAS analyzes data streams from available devices during surgery and communicates real-time knowledge to clinicians. In...

Bibliographic Details
Main Authors: Nour Aldeen Jalal, Tamer Abdulbaki Alshirbaji, Paul David Docherty, Herag Arabian, Bernhard Laufer, Sabine Krueger-Ziolek, Thomas Neumuth, Knut Moeller
Format: Article
Language: English
Published: MDPI AG 2023-02-01
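Series: Sensors
Subjects: context-aware system; laparoscopic video analysis; surgical phase recognition; surgical tool classification; surgical tool localization
Online Access: https://www.mdpi.com/1424-8220/23/4/1958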
_version_ 1797618282127163392
author Nour Aldeen Jalal
Tamer Abdulbaki Alshirbaji
Paul David Docherty
Herag Arabian
Bernhard Laufer
Sabine Krueger-Ziolek
Thomas Neumuth
Knut Moeller
author_sort Nour Aldeen Jalal
collection DOAJ
description Adapting intelligent context-aware systems (CAS) to future operating rooms (OR) aims to improve situational awareness and provide surgical decision support to medical teams. A CAS analyzes data streams from the devices available during surgery and communicates knowledge to clinicians in real time. Recent advances in computer vision and machine learning, particularly deep learning, have paved the way for extensive research on CAS development. This work proposes a deep learning approach that analyzes laparoscopic videos for surgical phase recognition, tool classification, and weakly supervised tool localization. The ResNet-50 convolutional neural network (CNN) architecture was adapted by adding attention modules and fusing features from multiple stages to generate better-focused, more general, and more representative features. A multi-map convolutional layer followed by tool-wise and spatial pooling operations was then used to localize tools and generate tool presence confidences. Finally, a long short-term memory (LSTM) network was employed to model temporal information and perform tool classification and phase recognition. The approach was evaluated on the Cholec80 dataset. The experimental results (88.5% mean precision and 89.0% mean recall for phase recognition, 95.6% mean average precision for tool presence detection, and a 70.1% F1-score for tool localization) demonstrated that the model learns discriminative features for all tasks. These results indicate the importance of integrating attention modules and multi-stage feature fusion for more robust and precise detection of surgical phases and tools.
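The pooling step named in the description (a multi-map layer followed by tool-wise and spatial pooling) can be illustrated with a small sketch. This is not the authors' implementation: it assumes that tool-wise pooling averages the maps belonging to each tool and that spatial pooling takes the global maximum of the resulting map, and the published method may use different operators. The function names, map sizes, and numbers below are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Map2D = List[List[float]]


def tool_wise_pooling(maps: List[Map2D], num_tools: int, maps_per_tool: int) -> List[Map2D]:
    """Collapse each tool's group of maps into one localization map.

    Assumption: the group is combined by averaging.
    """
    if len(maps) != num_tools * maps_per_tool:
        raise ValueError("expected num_tools * maps_per_tool maps")
    height, width = len(maps[0]), len(maps[0][0])
    pooled = []
    for t in range(num_tools):
        group = maps[t * maps_per_tool:(t + 1) * maps_per_tool]
        pooled.append([
            [sum(m[y][x] for m in group) / maps_per_tool for x in range(width)]
            for y in range(height)
        ])
    return pooled


def spatial_pooling(loc_map: Map2D) -> float:
    """Reduce a localization map to one presence confidence.

    Assumption: global max pooling.
    """
    return max(max(row) for row in loc_map)


def peak_location(loc_map: Map2D) -> Tuple[int, int]:
    """Return (row, column) of the strongest response, a crude location estimate."""
    best_value, best_pos = loc_map[0][0], (0, 0)
    for y, row in enumerate(loc_map):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_pos = value, (y, x)
    return best_pos


# Hypothetical multi-map output: 2 tools, 2 maps per tool, each map 2 x 3.
multi_maps = [
    [[0.0, 0.25, 0.0], [0.0, 1.0, 0.0]],   # tool 0, map 0
    [[0.0, 0.25, 0.0], [0.0, 0.5, 0.5]],   # tool 0, map 1
    [[0.5, 0.0, 0.0], [0.0, 0.0, 0.0]],    # tool 1, map 0
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.25]],   # tool 1, map 1
]

localization_maps = tool_wise_pooling(multi_maps, num_tools=2, maps_per_tool=2)
confidences = [spatial_pooling(m) for m in localization_maps]
peaks = [peak_location(m) for m in localization_maps]
print(confidences, peaks)
```

Because training of this kind only needs image-level tool labels to supervise the confidences, the per-tool maps come as a by-product, which is what makes the localization weakly supervised.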
first_indexed 2024-03-11T08:10:52Z
format Article
id doaj.art-6fb36159dc204834879b174cee4e9fad
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-11T08:10:52Z
publishDate 2023-02-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-6fb36159dc204834879b174cee4e9fad
2023-11-16T23:08:24Z
eng
MDPI AG
Sensors
1424-8220
2023-02-01
Volume 23, Issue 4, Article 1958
10.3390/s23041958
Laparoscopic Video Analysis Using Temporal, Attention, and Multi-Feature Fusion Based-Approaches
Nour Aldeen Jalal (Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany)
Tamer Abdulbaki Alshirbaji (Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany)
Paul David Docherty (Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany)
Herag Arabian (Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany)
Bernhard Laufer (Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany)
Sabine Krueger-Ziolek (Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany)
Thomas Neumuth (Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany)
Knut Moeller (Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany)
https://www.mdpi.com/1424-8220/23/4/1958
context-aware system
laparoscopic video analysis
surgical phase recognition
surgical tool classification
surgical tool localization
title Laparoscopic Video Analysis Using Temporal, Attention, and Multi-Feature Fusion Based-Approaches
topic context-aware system
laparoscopic video analysis
surgical phase recognition
surgical tool classification
surgical tool localization
url https://www.mdpi.com/1424-8220/23/4/1958