What does CLIP know about a red circle? Visual prompt engineering for VLMs
Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous applications, from zero-shot classification to text-to-image generation. Despite that, their capabilities for solving novel discriminative tasks via prompting fall behind those of lar...
Main authors: | Shtedritski, A, Rupprecht, C, Vedaldi, A |
---|---|
Format: | Internet publication |
Language: | English |
Published: | 2023 |
Similar items
- What does CLIP know about a red circle? visual prompt engineering for VLMs
  by: Shtedritski, A, et al.
  Published: (2024)
- Rethinking the Evaluation of Compositional Reasoning for Modern VLMs
  by: Huang, Irene Y.
  Published: (2024)
- SHIC: shape-image correspondences with no keypoint supervision
  by: Shtedritski, A, et al.
  Published: (2024)
- The Knowledge Gap in Economics: What Does the Public Know about the Economy and What Do Economists Know about the Public?
  by: Erwin Dekker, et al.
  Published: (2024-12-01)
- What does E_8 know about 11 dimensions?
  by: Kogan, I, et al.
  Published: (1999)