Face, body, voice: video person-clustering with multiple modalities
The objective of this work is person-clustering in videos – grouping characters according to their identity. Previous methods focus on the narrower task of face-clustering, and for the most part ignore other cues such as the person’s voice, their overall appearance (hair, clothes, posture), and the...
Main Authors: | , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2021 |