Hierarchical interaction network for video object segmentation from referring expressions
In this paper, we investigate the problem of video object segmentation from referring expressions (VOSRE). Conventional methods typically perform multi-modal fusion based on linguistic features and the visual features extracted from the top layer of the visual encoder, which limits these models'...
Main Authors: | , , , , |
---|---|
Formato: | Conference item |
Idioma: | English |
Publicado em: |
British Machine Vision Association
2021
|