Verbs in action: improving verb understanding in video-language models
Understanding verbs is crucial to modelling how people and objects interact with each other and the environment through space and time. Recently, state-of-the-art video-language models based on CLIP have been shown to have limited verb understanding and to rely extensively on nouns, restricting thei...
Main authors: | , , , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2024 |