Multimodal human behavior analysis: Learning correlation and interaction across modalities

Multimodal human behavior analysis is a challenging task due to the presence of complex nonlinear correlations and interactions across modalities. We present a novel approach to this problem based on Kernel Canonical Correlation Analysis (KCCA) and Multi-view Hidden Conditional Random Fields (MV-HCRF). Our approach uses a nonlinear kernel to map multimodal data to a high-dimensional feature space and finds a new projection of the data that maximizes the correlation across modalities. We use a multi-chain structured graphical model with disjoint sets of latent variables, one set per modality, to jointly learn both view-shared and view-specific sub-structures of the projected data, capturing interaction across modalities explicitly. We evaluate our approach on a task of agreement and disagreement recognition from nonverbal audio-visual cues using the Canal 9 dataset. Experimental results show that KCCA makes it easier to capture nonlinear hidden dynamics and that MV-HCRF helps learn interactions across modalities.
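
The KCCA step described in the abstract can be made concrete with a small sketch. The following is a minimal illustration, not the authors' implementation: it assumes RBF kernels and the standard regularized dual formulation of kernel CCA (a generalized eigenproblem over the two Gram matrices); the function names, the synthetic audio/visual features, and all parameter values are our own assumptions.

import numpy as np

def rbf_kernel(X, gamma=0.1):
    # Gaussian (RBF) kernel matrix for one modality's features
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center_kernel(K):
    # Double-center the kernel matrix (zero mean in feature space)
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H

def kcca(Ka, Kv, reg=0.1, n_components=2):
    # Regularized KCCA between two views (e.g., audio and visual).
    # Solves the dual eigenproblem
    #   (Ka + reg*I)^-1 Kv (Kv + reg*I)^-1 Ka alpha = rho^2 alpha;
    # regularization is essential to avoid spurious perfect correlations.
    n = Ka.shape[0]
    Ka, Kv = center_kernel(Ka), center_kernel(Kv)
    I = np.eye(n)
    M = np.linalg.solve(Ka + reg * I, Kv) @ np.linalg.solve(Kv + reg * I, Ka)
    evals, evecs = np.linalg.eig(M)            # M is non-symmetric; take real parts
    order = np.argsort(-evals.real)[:n_components]
    alpha = evecs[:, order].real               # dual weights for the first view
    corrs = np.sqrt(np.clip(evals.real[order], 0.0, 1.0))
    return Ka @ alpha, corrs                   # projected features + canonical correlations

# Toy usage with synthetic "audio" and "visual" features (T frames each)
rng = np.random.default_rng(0)
T = 200
shared = rng.normal(size=(T, 2))               # latent signal shared across views
Xa = np.tanh(shared @ rng.normal(size=(2, 6))) + 0.1 * rng.normal(size=(T, 6))
Xv = np.sin(shared @ rng.normal(size=(2, 8))) + 0.1 * rng.normal(size=(T, 8))
proj, corrs = kcca(rbf_kernel(Xa), rbf_kernel(Xv))
print("canonical correlations:", np.round(corrs, 3))

The MV-HCRF side can be summarized, in generic multi-view HCRF notation (ours, not necessarily the paper's), by the conditional likelihood

  P(y \mid x^a, x^v; \theta) = \frac{1}{Z(x^a, x^v; \theta)} \sum_{h^a, h^v} \exp\big( \theta^\top \Phi(y, h^a, h^v, x^a, x^v) \big)

where h^a and h^v are the disjoint latent-variable chains for the audio and visual views, and the feature map \Phi contains both view-specific terms and links between h^a and h^v that model cross-modal interaction explicitly.
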

Bibliographic Details
Main Authors: Song, Yale, Morency, Louis-Philippe, Davis, Randall
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article (conference paper)
Language: English (en_US)
Published: October 2012 (ICMI '12); repository record created 2014
Online Access: http://hdl.handle.net/1721.1/86099
ORCID: https://orcid.org/0000-0001-5232-7281
Citation: Yale Song, Louis-Philippe Morency, and Randall Davis. 2012. Multimodal human behavior analysis: learning correlation and interaction across modalities. In Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI '12). ACM, New York, NY, USA, 27-30.
DOI: http://dx.doi.org/10.1145/2388676.2388684
ISBN: 9781450314671
Funding: United States. Office of Naval Research (Grant N000140910625); National Science Foundation (U.S.) (Grants IIS-1118018 and IIS-1018055); United States. Army Research, Development, and Engineering Command
License: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)