Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension

Phrase comprehension (PC) aims to locate a specific object in an image according to a given linguistic query. Existing PC methods work in either a fully supervised or a proposal-based weakly supervised manner, both of which rely explicitly or implicitly on expensive region annotations.

Bibliographic Details
Main Authors:	Yaodong Wang, Lili Yue, Maoqing Li
Format:	Article
Language:	English
Published:	MDPI AG 2024-02-01
Series:	Electronics
Subjects:	phrase comprehension; reinforcement learning; class activation mapping
Online Access:	https://www.mdpi.com/2079-9292/13/5/898

_version_	1797264625017815040
author	Yaodong Wang; Lili Yue; Maoqing Li
author_facet	Yaodong Wang; Lili Yue; Maoqing Li
author_sort	Yaodong Wang
collection	DOAJ
description	Phrase comprehension (PC) aims to locate a specific object in an image according to a given linguistic query. Existing PC methods work in either a fully supervised or a proposal-based weakly supervised manner, both of which rely explicitly or implicitly on expensive region annotations. To completely remove the dependence on supervised region information, this paper proposes to address PC in a proposal-free weakly supervised training paradigm. To this end, we developed a novel cascaded searching reinforcement learning agent (CSRLA). Concretely, we first leveraged a visual language pre-trained model to generate a visual–textual cross-modal attention heatmap, from which a coarse salient initial region of the referential target was located. Then, we formulated visual object grounding as a Markov decision process (MDP) in a reinforcement learning framework, where an agent was trained to iteratively search for the target’s complete region, starting from the salient local region. Additionally, we developed a novel confidence discrimination reward function (ConDis_R) to constrain the model to search for a complete and exclusive object region. Experimental results on three benchmark datasets (RefCOCO, RefCOCO+, and RefCOCOg) demonstrated the effectiveness of our proposed method.
first_indexed	2024-04-25T00:31:52Z
format	Article
id	doaj.art-1acabcf80dc840a5985a371dee175309
institution	Directory of Open Access Journals
issn	2079-9292
language	English
last_indexed	2024-04-25T00:31:52Z
publishDate	2024-02-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-1acabcf80dc840a5985a371dee175309 | 2024-03-12T16:42:31Z | eng | MDPI AG | Electronics | 2079-9292 | 2024-02-01 | vol. 13, no. 5, art. 898 | 10.3390/electronics13050898 | Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension | Yaodong Wang, Lili Yue, Maoqing Li (all: School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China) | https://www.mdpi.com/2079-9292/13/5/898 | phrase comprehension; reinforcement learning; class activation mapping
title	Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension
topic	phrase comprehension; reinforcement learning; class activation mapping
url	https://www.mdpi.com/2079-9292/13/5/898

Similar Items