Open-vocabulary SAM: segment and recognize twenty-thousand classes interactively
CLIP and the Segment Anything Model (SAM) are remarkable vision foundation models (VFMs). SAM excels at segmentation tasks across diverse domains, whereas CLIP is renowned for its zero-shot recognition capabilities. This paper presents an in-depth exploration of integrating these two models into...
| Main Authors: | Yuan, Haobo; Li, Xiangtai; Zhou, Chong; Li, Yining; Chen, Kai; Loy, Chen Change |
| --- | --- |
| Other Authors: | College of Computing and Data Science |
| Format: | Conference Paper |
| Language: | English |
| Published: | 2024 |
| Online Access: | https://hdl.handle.net/10356/180250 http://arxiv.org/abs/2401.02955v2 |
Similar Items
- Towards open vocabulary learning: a survey
  by: Wu, Jianzong, et al.
  Published: (2024)
- Sketch-and-Fill Network for Semantic Segmentation
  by: Youngsaeng Jin, et al.
  Published: (2021-01-01)
- A novel dilated convolutional neural network model for road scene segmentation
  by: Yachao Zhang, et al.
  Published: (2022-01-01)
- Tencent AVS: A Holistic Ads Video Dataset for Multi-Modal Scene Segmentation
  by: Jie Jiang, et al.
  Published: (2022-01-01)
- A Hierarchical Feature Extraction Network for Fast Scene Segmentation
  by: Liu Miao, et al.
  Published: (2021-11-01)