Verbs in action: improving verb understanding in video-language models

Understanding verbs is crucial to modelling how people and objects interact with each other and the environment through space and time. Recently, state-of-the-art video-language models based on CLIP have been shown to have limited verb understanding and to rely extensively on nouns, restricting thei...


Bibliographic details
Main authors:	Momeni, L, Caron, M, Nagrani, A, Zisserman, A, Schmid, C
Format:	Conference item
Language:	English
Published:	IEEE 2024


Similar items