Towards open vocabulary learning: a survey

Bibliographic Details
Main Authors: Wu, Jianzong, Li, Xiangtai, Xu, Shilin, Yuan, Haobo, Ding, Henghui, Yang, Yibo, Li, Xia, Zhang, Jiangning, Tong, Yunhai, Jiang, Xudong, Ghanem, Bernard, Tao, Dacheng
Other Authors: School of Electrical and Electronic Engineering
Format: Journal Article
Language: English
Published: 2024
Subjects: Engineering; Open vocabulary; Scene understanding
Online Access: https://hdl.handle.net/10356/180101
collection NTU
description In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the closed-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective than weakly supervised and zero-shot settings. This paper thoroughly reviews open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by juxtaposing open vocabulary learning with analogous concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Subsequently, we examine several pertinent tasks within the realms of segmentation and detection, encompassing long-tail problems, few-shot, and zero-shot settings. As a foundation for our method survey, we first elucidate the fundamental principles of detection and segmentation in closed-set scenarios. Next, we examine various contexts where open vocabulary learning is employed, pinpointing recurring design elements and central themes. This is followed by a comparative analysis of recent detection and segmentation methodologies in commonly used datasets and benchmarks. Our review culminates with a synthesis of insights, challenges, and discourse on prospective research trajectories. To our knowledge, this constitutes the inaugural exhaustive literature review on open vocabulary learning.
first_indexed 2024-10-01T02:55:19Z
id ntu-10356/180101
institution Nanyang Technological University
last_indexed 2024-10-01T02:55:19Z
record_format dspace
Published version. This work was supported in part by the National Key Research and Development Program of China under Grant 2023YFC3807600 and in part by the interdisciplinary doctoral grant iDoc 2021-360 from the Personalized Health and Related Technologies (PHRT) of the ETH domain.
Added to repository: 2024-09-17T02:07:09Z
Last modified: 2024-09-20T15:39:55Z
Citation: Wu, J., Li, X., Xu, S., Yuan, H., Ding, H., Yang, Y., Li, X., Zhang, J., Tong, Y., Jiang, X., Ghanem, B. & Tao, D. (2024). Towards open vocabulary learning: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(7), 5092-5113. https://dx.doi.org/10.1109/TPAMI.2024.3361862
ISSN: 0162-8828
DOI: 10.1109/TPAMI.2024.3361862
PubMed ID: 38315601
Scopus ID: 2-s2.0-85184826477
Rights: © 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/.
File format: application/pdf
topic Engineering
Open vocabulary
Scene understanding