Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension

Phrase comprehension (PC) aims to locate a specific object in an image according to a given linguistic query. The existing PC methods work in either a fully supervised or proposal-based weakly supervised manner, which rely explicitly or implicitly on expensive region annotations. In order to complet...

Bibliographic Details
Main Authors: Yaodong Wang, Lili Yue, Maoqing Li
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Electronics
Subjects: phrase comprehension; reinforcement learning; class activation mapping
Online Access: https://www.mdpi.com/2079-9292/13/5/898
author Yaodong Wang
Lili Yue
Maoqing Li
collection DOAJ
description Phrase comprehension (PC) aims to locate a specific object in an image according to a given linguistic query. Existing PC methods are either fully supervised or proposal-based weakly supervised, and both rely, explicitly or implicitly, on expensive region annotations. To remove the dependence on supervised region information entirely, this paper addresses PC in a proposal-free weakly supervised training paradigm. To this end, we develop a novel cascaded searching reinforcement learning agent (CSRLA). Concretely, we first leverage a vision-language pre-trained model to generate a visual–textual cross-modal attention heatmap, from which a coarse, salient initial region of the referential target is located. We then formulate visual object grounding as a Markov decision process (MDP) in a reinforcement learning framework, where an agent is trained to iteratively search for the target's complete region starting from the salient local region. Additionally, we develop a novel confidence discrimination reward function (ConDis_R) that constrains the model to search for a complete and exclusive object region. Experimental results on three benchmark datasets, RefCOCO, RefCOCO+, and RefCOCOg, demonstrate the effectiveness of the proposed method.
first_indexed 2024-04-25T00:31:52Z
format Article
id doaj.art-1acabcf80dc840a5985a371dee175309
institution Directory of Open Access Journals
issn 2079-9292
language English
last_indexed 2024-04-25T00:31:52Z
publishDate 2024-02-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-1acabcf80dc840a5985a371dee175309
2024-03-12T16:42:31Z
eng
MDPI AG
Electronics 13(5), 898 (2024-02-01), ISSN 2079-9292
DOI: 10.3390/electronics13050898
Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension
Yaodong Wang, Lili Yue, Maoqing Li
School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China (all three authors)
https://www.mdpi.com/2079-9292/13/5/898
phrase comprehension; reinforcement learning; class activation mapping
title Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension
topic phrase comprehension
reinforcement learning
class activation mapping
url https://www.mdpi.com/2079-9292/13/5/898