Face, body, voice: video person-clustering with multiple modalities

The objective of this work is person-clustering in videos – grouping characters according to their identity. Previous methods focus on the narrower task of face-clustering, and for the most part ignore other cues such as the person’s voice, their overall appearance (hair, clothes, posture), and the...

Bibliographic Details
Main Authors:	Brown, A; Kalogeiton, V; Zisserman, A
Format:	Conference item
Language:	English
Published:	IEEE 2021

Similar Items

Constrained video face clustering using 1NN relations
by: Kalogeiton, V, et al.
Published: (2020)

Seeing voices and hearing faces: Cross-modal biometric matching
by: Nagrani, A, et al.
Published: (2018)

LAEO-Net++: revisiting people looking at each other in videos
by: Marin-Jimenez, MJ, et al.
Published: (2020)

LAEO-Net: Revisiting people looking at each other in videos
by: Marin-Jimenez, M, et al.
Published: (2020)

Modality-specific brain representations during automatic processing of face, voice and body expressions
by: Maarten Vaessen, et al.
Published: (2023-10-01)

Automated video labelling: identifying faces by corroborative evidence
by: Brown, A, et al.
Published: (2021)

Smooth-AP: Smoothing the path towards large-scale image retrieval
by: Brown, A, et al.
Published: (2020)

Person Perception from Face and Voice
by: Serge Brédart
Published: (2014-05-01)

Correlated expression of the body, face, and voice during character portrayal in actors
by: Matthew Berry, et al.
Published: (2022-05-01)

Learnable PINs: Cross-modal embeddings for person identity
by: Nagrani, A, et al.
Published: (2018)

Which Mabuse? Multiple Bodies, Multiple Voices
by: Massimiliano Gaudiosi
Published: (2005-10-01)

Automated video face labelling for films and TV material
by: Parkhi, OM, et al.
Published: (2018)

Personalizing human video pose estimation
by: Charles, J, et al.
Published: (2016)

Self-supervised multi-modal alignment for whole body medical imaging
by: Windsor, R, et al.
Published: (2021)

TEACHTEXT: CrossModal generalized distillation for text-video retrieval
by: Croitoru, I, et al.
Published: (2022)

Chimpanzee face recognition from videos in the wild using deep learning
by: Schofield, D, et al.
Published: (2019)

NaijaFaceVoice: A Large-Scale Deep Learning Model and Database of Nigerian Faces and Voices
by: Adekunle Anthony Akinrinmade, et al.
Published: (2023-01-01)

Data, voice and video cabling /
by: Hayes, Jim, 1946-, et al.
Published: (2000)

Voice and video conferencing fundamentals /
by: Firestone, Scott, et al.
Published: (2007)

Data, voice, and video cabling /
by: Hayes, Jim, 1946-, et al.
Published: (2005)

On learning associations of faces and voices
by: Kim, Changil, et al.
Published: (2020)

Attention-Based Fusion of Ultrashort Voice Utterances and Depth Videos for Multimodal Person Identification
by: Abderrazzaq Moufidi, et al.
Published: (2023-06-01)

VPTD: Human Face Video Dataset for Personality Traits Detection
by: Kenan Kassab, et al.
Published: (2023-06-01)

Probabilistic framework for multi-modal multiple person tracking
by: Checka, Neal, 1977-
Published: (2014)

SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams
by: Madina Abdrakhmanova, et al.
Published: (2021-05-01)

Face Biometrics for Personal Identification [electronic book] : Multi-Sensory Multi-Modal Systems /
by: Hammoud, Riad I., et al.
Published: (2007)

Her Body, Their Voice
by: Amanda Pedersen
Published: (2023-12-01)

Speech2Face: Learning the Face Behind a Voice
by: Oh, Taehyun, et al.
Published: (2021)

Flexible bronchoscopy with multiple modalities for foreign body removal in adults.
by: Yueh-Fu Fang, et al.
Published: (2015-01-01)

Differences of Modality Use between Telepractice and Face-to-Face Administration of the Scenario-Test in Persons with Dementia-Related Speech Disorder
by: Mirjam Gauch, et al.
Published: (2023-01-01)

Mutual modality learning for video action classification
by: S.A. Komkov, et al.
Published: (2023-08-01)

Data, voice, and video cabling laboratory manual /
by: Hayes, Jim, 1946-, et al.
Published: (2005)

Telecommunications primer : data, voice and video communications /
by: Carne, E. Bryan
Published: (1999)

Effects of Voice and Biographic Data on Face Encoding
by: Thilda Karlsson, et al.
Published: (2023-01-01)

Effects of Faces and Voices on the Encoding of Biographic Information
by: Sarah Fransson, et al.
Published: (2022-12-01)

Can video consultations replace face to face interviews?
by: Sutherland, A, et al.
Published: (2020)

Goal-directed video metrology
by: Reid, I, et al.
Published: (2005)

Multicolumn networks for face recognition
by: Xie, W, et al.
Published: (2018)

Video face replacement
by: Dale, Kevin, et al.
Published: (2012)

Deep face recognition
by: Parkhi, O, et al.
Published: (2015)