An Underwater Human–Robot Interaction Using a Visual–Textual Model for Autonomous Underwater Vehicles

The marine environment presents a unique set of challenges for human–robot interaction. Gesturing is a common way for divers to communicate with autonomous underwater vehicles (AUVs), but underwater gesture recognition is a challenging visual task for AUVs because of light refraction and wavelength-dependent color attenuation. Current gesture recognition methods either classify the whole image directly or locate the hand first and then classify the hand features; these purely visual approaches largely ignore textual information. This paper proposes a visual–textual model for underwater hand gesture recognition (VT-UHGR). The VT-UHGR model encodes the underwater diver's image as visual features and the gesture-category text as textual features, then generates visual–textual features through multimodal interaction, guiding the AUV to learn and infer by image–text matching. The proposed method outperforms most existing purely visual methods on the CADDY dataset, demonstrating the effectiveness of textual patterns for underwater gesture recognition.
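
The abstract's core idea, predicting a gesture by matching a diver image against gesture-category text rather than classifying visual features alone, can be illustrated with a minimal sketch. The encoders, dimensions, and names below are illustrative assumptions for a CLIP-style matcher, not the authors' released VT-UHGR implementation:

```python
# Minimal sketch of visual-textual matching, assuming a tiny CNN as the
# visual encoder and a learned per-category embedding as the textual
# encoder (a real system would use pretrained image and text backbones).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualTextualMatcher(nn.Module):
    """Scores how well a diver image matches each gesture-category text."""

    def __init__(self, embed_dim: int = 256, num_classes: int = 16):
        super().__init__()
        # Stand-in visual encoder: one feature vector per image.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Stand-in textual encoder: one embedding per category phrase.
        self.textual = nn.Embedding(num_classes, embed_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        v = F.normalize(self.visual(images), dim=-1)   # (B, D) image features
        t = F.normalize(self.textual.weight, dim=-1)   # (C, D) text features
        return v @ t.T                                 # (B, C) similarity scores

model = VisualTextualMatcher()
logits = model(torch.randn(2, 3, 224, 224))
pred = logits.argmax(dim=-1)  # best-matching gesture category per image
```

At inference time the predicted gesture is simply the category whose text embedding is most similar to the image embedding, which is the image–text matching behavior the abstract attributes to VT-UHGR.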

Bibliographic Details
Main Authors: Yongji Zhang, Yu Jiang, Hong Qi, Minghao Zhao, Yuehang Wang, Kai Wang, Fenglin Wei (College of Computer Science and Technology, Jilin University, Changchun 130012, China)
Format: Article
Language: English
Published: MDPI AG, 2022-12-01
Series: Sensors, Vol. 23, No. 1, Article 197
ISSN: 1424-8220
DOI: 10.3390/s23010197
Subjects: autonomous underwater vehicle; underwater human–robot interaction; gesture recognition; visual–textual association
Online Access: https://www.mdpi.com/1424-8220/23/1/197