Pre-training concept frequency is predictive of CLIP zero-shot performance
Web-crawled pre-training datasets are speculated to be key drivers of zero-shot generalization abilities of Vision-Language Models (VLMs) like CLIP, across a range of downstream classification and retrieval tasks, spanning diverse visual concepts. However, it is unclear how meaningful the term “zero...
| Main Authors | , , , , , |
|---|---|
| Format | Conference item |
| Language | English |
| Published | OpenReview, 2024 |