A Multimodal Transformer Model for Recognition of Images from Complex Laparoscopic Surgical Videos

The potential role and advantages of artificial intelligence-based models in surgery remain uncertain. This research is a first step towards a multimodal model, inspired by the Video-Audio-Text Transformer, that aims to reduce negative occurrences...

Bibliographic Details
Main Authors:	Rahib H. Abiyev, Mohamad Ziad Altabel, Manal Darwish, Abdulkader Helwan
Format:	Article
Language:	English
Published:	MDPI AG 2024-03-01
Series:	Diagnostics
Subjects:	transformer; laparoscopic videos; ViT; BERT; transformer encoders; text and image embedding
Online Access:	https://www.mdpi.com/2075-4418/14/7/681

Similar Items