Hierarchical interaction network for video object segmentation from referring expressions

In this paper, we investigate the problem of video object segmentation from referring expressions (VOSRE). Conventional methods typically perform multi-modal fusion based on linguistic features and the visual features extracted from the top layer of the visual encoder, which limits these models'...

ver descrição completa

Detalhes bibliográficos
Main Authors: Yang, Z, Tang, Y, Bertinetto, L, Zhao, H, Torr, PHS
Formato: Conference item
Idioma:English
Publicado em: British Machine Vision Association 2021