AI Ekphrasis: Multi-Modal Learning with Foundation Models for Fine-Grained Poetry Retrieval
Artificial intelligence research in natural language processing struggles, in the context of poetry, to recognize holistic content such as poetic symbolism, metaphor, and other fine-grained attributes. Given these challenges, multi-modal image–poetry reasoning and retrieval remain largely unexplored.
Main Authors: | Muhammad Shahid Jabbar, Jitae Shin, Jun-Dong Cho |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-04-01 |
Series: | Electronics |
Subjects: | image-based poetry retrieval; fine-grained attribute recognition; accessibility; multi-modal attention; cross-encoder |
Online Access: | https://www.mdpi.com/2079-9292/11/8/1275 |
_version_ | 1797446731270455296 |
---|---|
author | Muhammad Shahid Jabbar; Jitae Shin; Jun-Dong Cho |
author_facet | Muhammad Shahid Jabbar; Jitae Shin; Jun-Dong Cho |
author_sort | Muhammad Shahid Jabbar |
collection | DOAJ |
description | Artificial intelligence research in natural language processing struggles, in the context of poetry, to recognize holistic content such as poetic symbolism, metaphor, and other fine-grained attributes. Given these challenges, multi-modal image–poetry reasoning and retrieval remain largely unexplored. Our recent accessibility study indicates that poetry is an effective medium for conveying visual artwork attributes and improving artwork appreciation for people with visual impairments. We therefore introduce a deep learning approach for automatically retrieving poetry suited to an input image. The recent state-of-the-art CLIP model matches multi-modal visual and text features using cosine similarity, but it lacks shared cross-modality attention and therefore struggles to model fine-grained relationships. The proposed approach takes advantage of CLIP's strong pre-training and overcomes this limitation by introducing shared attention parameters that better model the fine-grained relationship between the two modalities. We evaluate and compare the proposed approach on the expertly annotated MultiM-Poem dataset, the largest public image–poetry pair dataset for English poetry. The approach addresses image-based attribute recognition and automatic retrieval of fine-grained poetic verses. The test results show that the shared attention parameters improve fine-grained attribute recognition, and the proposed approach is a significant step towards automatic multi-modal retrieval for improved artwork appreciation by people with visual impairments. |
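The description outlines the method only at a high level: CLIP image and text embeddings matched by cosine similarity, augmented with attention parameters shared across the two modalities. The sketch below illustrates that idea under stated assumptions; it is not the authors' published architecture. The `SharedAttentionScorer` module, its dimensions, and the pairing of OpenAI's `clip` package with `torch.nn.MultiheadAttention` are illustrative choices, not details taken from the paper.

```python
# Hypothetical sketch: CLIP-based image-poem scoring with a shared
# cross-modal attention head (illustrative; not the paper's exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP package: pip install git+https://github.com/openai/CLIP.git


class SharedAttentionScorer(nn.Module):
    """Scores an image against candidate poems.

    Baseline CLIP score: cosine similarity of pooled embeddings.
    Assumed addition: a single attention module whose parameters are
    reused for both modalities, so image and poem features pass through
    the same weights before cross-modal matching.
    """

    def __init__(self, embed_dim: int = 512, num_heads: int = 8):
        super().__init__()
        # One attention module shared by both modalities -> shared parameters.
        self.shared_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, img_emb: torch.Tensor, poem_emb: torch.Tensor) -> torch.Tensor:
        # img_emb:  (B, D) pooled CLIP image features
        # poem_emb: (N, D) pooled CLIP text features for N candidate poems
        img = img_emb.unsqueeze(1)    # (B, 1, D): treat the pooled vector as a one-token sequence
        poem = poem_emb.unsqueeze(1)  # (N, 1, D)
        # The same attention weights are applied to both modalities.
        img_att, _ = self.shared_attn(img, img, img)
        poem_att, _ = self.shared_attn(poem, poem, poem)
        img_att = F.normalize(img_att.squeeze(1), dim=-1)    # (B, D)
        poem_att = F.normalize(poem_att.squeeze(1), dim=-1)  # (N, D)
        return img_att @ poem_att.t()  # (B, N) cosine-similarity score matrix


device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
scorer = SharedAttentionScorer().to(device)
# In practice the shared attention head would be trained on image-poem pairs.


def retrieve_poem(image, candidate_poems):
    """Return the candidate poem with the highest score for a PIL image."""
    with torch.no_grad():
        img_emb = model.encode_image(preprocess(image).unsqueeze(0).to(device)).float()
        txt_emb = model.encode_text(clip.tokenize(candidate_poems, truncate=True).to(device)).float()
        scores = scorer(img_emb, txt_emb)  # (1, N)
    return candidate_poems[scores.argmax(dim=-1).item()]
```

Reusing one attention module for both modalities is the simplest reading of "shared attention parameters"; the paper may combine or train these components differently.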
first_indexed | 2024-03-09T13:44:47Z |
format | Article |
id | doaj.art-f8d78dcd196f457ca8fc36c5b04a8f87 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-09T13:44:47Z |
publishDate | 2022-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
doi | 10.3390/electronics11081275 |
citation | Electronics, vol. 11, no. 8, article 1275 (2022-04-01) |
author affiliations | Muhammad Shahid Jabbar; Jitae Shin; Jun-Dong Cho: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, Korea |
title | AI Ekphrasis: Multi-Modal Learning with Foundation Models for Fine-Grained Poetry Retrieval |
topic | image-based poetry retrieval; fine-grained attribute recognition; accessibility; multi-modal attention; cross-encoder |
url | https://www.mdpi.com/2079-9292/11/8/1275 |