Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

CLIP and the Segment Anything Model (SAM) are remarkable vision foundation models (VFMs). SAM excels in segmentation tasks across diverse domains, whereas CLIP is renowned for its zero-shot recognition capabilities. This paper presents an in-depth exploration of integrating these two models into...
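The abstract alludes to combining SAM's promptable segmentation with CLIP's zero-shot recognition. Below is a minimal sketch of the naive two-stage baseline such an integration improves upon: SAM segments a region from a user click, and CLIP classifies the cropped region against a class vocabulary. This is not the paper's unified architecture; the checkpoint path, image file, click coordinates, and class list are placeholders.

```python
# Naive SAM + CLIP pipeline sketch (placeholder paths and classes, not from the paper).
import numpy as np
import torch
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor
from transformers import CLIPModel, CLIPProcessor

image = np.array(Image.open("example.jpg").convert("RGB"))

# 1) Interactive segmentation: SAM predicts a mask from a point prompt.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder checkpoint
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # user click (x, y)
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=False,
)

# 2) Zero-shot recognition: CLIP scores the masked crop against class names.
ys, xs = np.nonzero(masks[0])
crop = Image.fromarray(image[ys.min():ys.max() + 1, xs.min():xs.max() + 1])

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
class_names = ["cat", "dog", "car"]  # stand-in for a twenty-thousand-class vocabulary
inputs = processor(
    text=[f"a photo of a {c}" for c in class_names],
    images=crop, return_tensors="pt", padding=True,
)
with torch.no_grad():
    probs = clip(**inputs).logits_per_image.softmax(dim=-1)
print(class_names[int(probs.argmax())])
```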


Bibliographic Details
Main Authors: Yuan, Haobo, Li, Xiangtai, Zhou, Chong, Li, Yining, Chen, Kai, Loy, Chen Change
Other Authors: College of Computing and Data Science
Format: Conference Paper
Language: English
Published: 2024
Subjects:
Online Access: https://hdl.handle.net/10356/180250
http://arxiv.org/abs/2401.02955v2