Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
CLIP and the Segment Anything Model (SAM) are two remarkable vision foundation models (VFMs). SAM excels at segmentation tasks across diverse domains, while CLIP is renowned for its zero-shot recognition capabilities. This paper presents an in-depth exploration of integrating these two models into...
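To make the integration the abstract refers to concrete, below is a minimal sketch of the naive way to combine the two models: let SAM segment the object under an interactive click, then crop the mask and let CLIP classify the crop zero-shot against a text vocabulary. This is a baseline cascade for illustration, not the paper's unified architecture; the checkpoint names follow the public Hugging Face `transformers` releases, while the image file, click coordinates, and label list are placeholder assumptions.

```python
import numpy as np
import torch
from PIL import Image
from transformers import SamModel, SamProcessor, CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Promptable segmenter (SAM) and zero-shot classifier (CLIP).
sam = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
sam_proc = SamProcessor.from_pretrained("facebook/sam-vit-base")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg").convert("RGB")  # placeholder input image
click = [[[320, 240]]]                            # one interactive point prompt (x, y)

# 1) SAM segments the object under the click.
inputs = sam_proc(image, input_points=click, return_tensors="pt").to(device)
with torch.no_grad():
    out = sam(**inputs)
masks = sam_proc.image_processor.post_process_masks(
    out.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
best = out.iou_scores[0, 0].argmax().item()  # keep SAM's highest-scoring mask
mask = masks[0][0, best].numpy()

# 2) Crop the masked region and let CLIP name it zero-shot.
ys, xs = np.where(mask)
crop = image.crop((int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())))

labels = ["cat", "dog", "car", "bicycle"]  # stand-in vocabulary; the paper targets ~20k classes
clip_in = clip_proc(text=[f"a photo of a {l}" for l in labels],
                    images=crop, return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    probs = clip(**clip_in).logits_per_image.softmax(-1)
print(labels[probs.argmax().item()])
```

Note that this cascade runs two separate image encoders per click; the abstract's framing of "integrating these two models into a unified framework" points at avoiding exactly that duplication.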
Main Authors: 
Other Authors: 
Material Type: Conference Paper
Language: English
Publication Info: 2024
Subjects: 
Online Access: https://hdl.handle.net/10356/180250 http://arxiv.org/abs/2401.02955v2