Cross Encoder-Decoder Transformer with Global-Local Visual Extractor for Medical Image Captioning

Transformer-based approaches have shown good results in image captioning tasks. However, current approaches have a limitation in generating text from global features of an entire image. Therefore, we propose novel methods for generating better image captioning as follows: (1) The Global-Local Visual...

Full description

Bibliographic Details
Main Authors: Hojun Lee, Hyunjun Cho, Jieun Park, Jinyeong Chae, Jihie Kim
Format: Article
Language:English
Published: MDPI AG 2022-02-01
Series:Sensors
Subjects:
Online Access:https://www.mdpi.com/1424-8220/22/4/1429