Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

CLIP and the Segment Anything Model (SAM) are two remarkable vision foundation models (VFMs). SAM excels at segmentation tasks across diverse domains, whereas CLIP is renowned for its zero-shot recognition capabilities. This paper presents an in-depth exploration of integrating these two models into...
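One straightforward way to combine the two models is to run them back to back: segment with SAM from an interactive prompt, then classify each masked region with CLIP. Below is a minimal sketch of that two-model pipeline, assuming the `segment-anything` and OpenAI `clip` packages are installed; the checkpoint path, input image, click coordinates, and the three-class label list are hypothetical placeholders (the paper targets a vocabulary of roughly twenty thousand classes).

```python
# Naive "SAM then CLIP" pipeline: segment from a point click, then
# classify the masked crop against a text vocabulary with CLIP.
import numpy as np
import torch
import clip
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load SAM (checkpoint path is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
predictor = SamPredictor(sam)

# Load CLIP and encode a stand-in label vocabulary.
model, preprocess = clip.load("ViT-B/32", device=device)
labels = ["cat", "dog", "bicycle"]  # placeholder for a 20k-class vocabulary
text = clip.tokenize([f"a photo of a {l}" for l in labels]).to(device)

image = np.array(Image.open("example.jpg").convert("RGB"))  # placeholder image
predictor.set_image(image)

# Segment from a single foreground click at (x, y) = (320, 240).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=False,
)
mask = masks[0]

# Crop the mask's bounding box and classify it with CLIP.
ys, xs = np.where(mask)
crop = Image.fromarray(image[ys.min():ys.max() + 1, xs.min():xs.max() + 1])
with torch.no_grad():
    logits_per_image, _ = model(preprocess(crop).unsqueeze(0).to(device), text)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)
print(labels[int(probs.argmax())])
```

Re-encoding every mask crop with a separate CLIP forward pass is costly, which is part of what motivates unifying the two backbones into a single interactive model.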

Full description

Detailed Bibliography
Main Authors: Yuan, Haobo; Li, Xiangtai; Zhou, Chong; Li, Yining; Chen, Kai; Loy, Chen Change
Other Authors: College of Computing and Data Science
Material Type: Conference Paper
Language: English
Publication Details: 2024
Online Access: https://hdl.handle.net/10356/180250
http://arxiv.org/abs/2401.02955v2