Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension
Phrase comprehension (PC) aims to locate a specific object in an image according to a given linguistic query. The existing PC methods work in either a fully supervised or proposal-based weakly supervised manner, which rely explicitly or implicitly on expensive region annotations. In order to complet...
Main Authors: | Yaodong Wang, Lili Yue, Maoqing Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2024-02-01 |
Series: | Electronics |
Subjects: | phrase comprehension, reinforcement learning, class activation mapping |
Online Access: | https://www.mdpi.com/2079-9292/13/5/898 |
_version_ | 1797264625017815040 |
---|---|
author | Yaodong Wang; Lili Yue; Maoqing Li |
author_facet | Yaodong Wang; Lili Yue; Maoqing Li |
author_sort | Yaodong Wang |
collection | DOAJ |
description | Phrase comprehension (PC) aims to locate a specific object in an image according to a given linguistic query. The existing PC methods work in either a fully supervised or proposal-based weakly supervised manner, which rely explicitly or implicitly on expensive region annotations. In order to completely remove the dependence on the supervised region information, this paper proposes to address PC in a proposal-free weakly supervised training paradigm. To this end, we developed a novel cascaded searching reinforcement learning agent (CSRLA). Concretely, we first leveraged a visual language pre-trained model to generate a visual–textual cross-modal attention heatmap. Accordingly, a coarse salient initial region of the referential target was located. Then, we formulated the visual object grounding as a Markov decision process (MDP) in a reinforcement learning framework, where an agent was trained to iteratively search for the target’s complete region from the salient local region. Additionally, we developed a novel confidence discrimination reward function (ConDis_R) to constrain the model to search for a complete and exclusive object region. The experimental results on three benchmark datasets of Refcoco, Refcoco+, and Refcocog demonstrated the effectiveness of our proposed method. |
first_indexed | 2024-04-25T00:31:52Z |
format | Article |
id | doaj.art-1acabcf80dc840a5985a371dee175309 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-04-25T00:31:52Z |
publishDate | 2024-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-1acabcf80dc840a5985a371dee175309; 2024-03-12T16:42:31Z; eng; MDPI AG; Electronics; 2079-9292; 2024-02-01; 13; 5; 898; 10.3390/electronics13050898; Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension; Yaodong Wang 0; Lili Yue 1; Maoqing Li 2; School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; Phrase comprehension (PC) aims to locate a specific object in an image according to a given linguistic query. The existing PC methods work in either a fully supervised or proposal-based weakly supervised manner, which rely explicitly or implicitly on expensive region annotations. In order to completely remove the dependence on the supervised region information, this paper proposes to address PC in a proposal-free weakly supervised training paradigm. To this end, we developed a novel cascaded searching reinforcement learning agent (CSRLA). Concretely, we first leveraged a visual language pre-trained model to generate a visual–textual cross-modal attention heatmap. Accordingly, a coarse salient initial region of the referential target was located. Then, we formulated the visual object grounding as a Markov decision process (MDP) in a reinforcement learning framework, where an agent was trained to iteratively search for the target’s complete region from the salient local region. Additionally, we developed a novel confidence discrimination reward function (ConDis_R) to constrain the model to search for a complete and exclusive object region. The experimental results on three benchmark datasets of Refcoco, Refcoco+, and Refcocog demonstrated the effectiveness of our proposed method.; https://www.mdpi.com/2079-9292/13/5/898; phrase comprehension; reinforcement learning; class activation mapping |
spellingShingle | Yaodong Wang Lili Yue Maoqing Li Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension Electronics phrase comprehension reinforcement learning class activation mapping |
title | Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension |
title_full | Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension |
title_fullStr | Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension |
title_full_unstemmed | Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension |
title_short | Cascaded Searching Reinforcement Learning Agent for Proposal-Free Weakly-Supervised Phrase Comprehension |
title_sort | cascaded searching reinforcement learning agent for proposal free weakly supervised phrase comprehension |
topic | phrase comprehension; reinforcement learning; class activation mapping |
url | https://www.mdpi.com/2079-9292/13/5/898 |
work_keys_str_mv | AT yaodongwang cascadedsearchingreinforcementlearningagentforproposalfreeweaklysupervisedphrasecomprehension AT liliyue cascadedsearchingreinforcementlearningagentforproposalfreeweaklysupervisedphrasecomprehension AT maoqingli cascadedsearchingreinforcementlearningagentforproposalfreeweaklysupervisedphrasecomprehension |
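
The description field above outlines a two-stage search: a coarse salient region is taken from an attention heatmap, and the box is then grown step by step toward the full object. The following is only a toy sketch of that idea, not the authors' CSRLA. It assumes a small hand-written heatmap, replaces the trained agent with a greedy one-step lookahead, and uses a hypothetical score (attention inside the box minus an area penalty) in place of the ConDis_R reward, whose definition is not given in this record. All function names and parameters (`initial_box`, `box_score`, `search`, `thresh`, `penalty`) are illustrative.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive

# One-cell expansions of a single box side; "stop" is implicit (no improving move).
ACTIONS = {
    "up": (-1, 0, 0, 0),
    "left": (0, -1, 0, 0),
    "down": (0, 0, 1, 0),
    "right": (0, 0, 0, 1),
}


def initial_box(heat: List[List[float]], thresh: float) -> Box:
    """Tightest box around all cells whose attention is at least `thresh`."""
    cells = [(r, c) for r, row in enumerate(heat)
             for c, v in enumerate(row) if v >= thresh]
    if not cells:
        raise ValueError("no cell reaches the threshold")
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))


def box_score(heat: List[List[float]], box: Box, penalty: float) -> float:
    """Hypothetical score: attention mass inside the box minus an area penalty."""
    t, l, b, r = box
    total = sum(heat[i][j] for i in range(t, b + 1) for j in range(l, r + 1))
    area = (b - t + 1) * (r - l + 1)
    return total - penalty * area


def search(heat: List[List[float]], thresh: float = 0.8,
           penalty: float = 0.3, max_steps: int = 50) -> Box:
    """Greedy stand-in for a learned policy: take the best improving action."""
    box = initial_box(heat, thresh)
    h, w = len(heat), len(heat[0])
    for _ in range(max_steps):
        best, best_score = None, box_score(heat, box, penalty)
        for delta in ACTIONS.values():
            t, l, b, r = (x + d for x, d in zip(box, delta))
            if t < 0 or l < 0 or b >= h or r >= w:
                continue  # expansion would leave the image
            score = box_score(heat, (t, l, b, r), penalty)
            if score > best_score:
                best, best_score = (t, l, b, r), score
        if best is None:
            break  # no action improves the score, so stop
        box = best
    return box


# Toy 5x5 heatmap: one strong peak surrounded by weaker object evidence.
HEAT = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.6, 0.5, 0.0],
    [0.0, 0.6, 0.9, 0.6, 0.0],
    [0.0, 0.5, 0.6, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

print(initial_box(HEAT, 0.8))  # (2, 2, 2, 2)
print(search(HEAT))            # (1, 1, 3, 3)
```

In this toy setting the area penalty plays the role that the abstract assigns to the reward: it stops the box from growing past the object, while the attention term pushes it to cover the whole object rather than only the salient peak.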