Multimodal human behavior analysis: Learning correlation and interaction across modalities
Multimodal human behavior analysis is a challenging task due to the presence of complex nonlinear correlations and interactions across modalities. We present a novel approach to this problem based on Kernel Canonical Correlation Analysis (KCCA) and Multi-view Hidden Conditional Random Fields (MV-HCRF). Our approach uses a nonlinear kernel to map multimodal data to a high-dimensional feature space and finds a new projection of the data that maximizes the correlation across modalities. We use a multi-chain structured graphical model with disjoint sets of latent variables, one set per modality, to jointly learn both view-shared and view-specific sub-structures of the projected data, capturing interaction across modalities explicitly. We evaluate our approach on a task of agreement and disagreement recognition from nonverbal audio-visual cues using the Canal 9 dataset. Experimental results show that KCCA makes capturing nonlinear hidden dynamics easier and MV-HCRF helps learning interaction across modalities.
Main Authors: | Song, Yale, Morency, Louis-Philippe, Davis, Randall |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Format: | Article |
Language: | en_US |
Published: | 2014 |
Online Access: | http://hdl.handle.net/1721.1/86099 https://orcid.org/0000-0001-5232-7281 |
_version_ | 1811079139174645760 |
---|---|
author | Song, Yale Morency, Louis-Philippe Davis, Randall |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_facet | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Song, Yale Morency, Louis-Philippe Davis, Randall |
author_sort | Song, Yale |
collection | MIT |
description | Multimodal human behavior analysis is a challenging task due to the presence of complex nonlinear correlations and interactions across modalities. We present a novel approach to this problem based on Kernel Canonical Correlation Analysis (KCCA) and Multi-view Hidden Conditional Random Fields (MV-HCRF). Our approach uses a nonlinear kernel to map multimodal data to a high-dimensional feature space and finds a new projection of the data that maximizes the correlation across modalities. We use a multi-chain structured graphical model with disjoint sets of latent variables, one set per modality, to jointly learn both view-shared and view-specific sub-structures of the projected data, capturing interaction across modalities explicitly. We evaluate our approach on a task of agreement and disagreement recognition from nonverbal audio-visual cues using the Canal 9 dataset. Experimental results show that KCCA makes capturing nonlinear hidden dynamics easier and MV-HCRF helps learning interaction across modalities. |
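The abstract's KCCA step maps each modality into a kernel feature space and finds projections maximizing cross-modal correlation. As an illustration only, and not the authors' implementation, the following NumPy sketch shows regularized KCCA in its dual form; the RBF kernel, the `gamma` and `reg` values, and the function names are all assumptions for this example.

```python
import numpy as np

def rbf_kernel(A, gamma=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared distances.
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.exp(-gamma * d2)

def center(K):
    # Center a kernel matrix in feature space: HKH with H = I - 11^T/n.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(X, Y, gamma=1.0, reg=1e-3, n_components=2):
    """Regularized kernel CCA (illustrative sketch).

    Returns projections of the X view onto the shared subspace and the
    estimated canonical correlations.
    """
    n = X.shape[0]
    Kx = center(rbf_kernel(X, gamma))
    Ky = center(rbf_kernel(Y, gamma))
    # Dual-form generalized eigenproblem, regularized for stability:
    # M alpha = rho^2 alpha, with M = (Kx+rI)^-1 Ky (Ky+rI)^-1 Kx.
    Ix = np.linalg.inv(Kx + reg * np.eye(n))
    Iy = np.linalg.inv(Ky + reg * np.eye(n))
    M = Ix @ Ky @ Iy @ Kx
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:n_components]
    alphas = vecs[:, order].real                 # dual weights for view X
    corrs = np.sqrt(np.clip(vals[order].real, 0.0, 1.0))
    return Kx @ alphas, corrs
```

In the paper's pipeline the projected sequences would then be fed to the MV-HCRF, whose per-modality latent chains are a separate model not sketched here.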
first_indexed | 2024-09-23T11:10:36Z |
format | Article |
id | mit-1721.1/86099 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T11:10:36Z |
publishDate | 2014 |
record_format | dspace |
spelling | mit-1721.1/86099 2022-09-27T17:37:30Z Multimodal human behavior analysis: Learning correlation and interaction across modalities Song, Yale Morency, Louis-Philippe Davis, Randall Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Song, Yale Davis, Randall Multimodal human behavior analysis is a challenging task due to the presence of complex nonlinear correlations and interactions across modalities. We present a novel approach to this problem based on Kernel Canonical Correlation Analysis (KCCA) and Multi-view Hidden Conditional Random Fields (MV-HCRF). Our approach uses a nonlinear kernel to map multimodal data to a high-dimensional feature space and finds a new projection of the data that maximizes the correlation across modalities. We use a multi-chain structured graphical model with disjoint sets of latent variables, one set per modality, to jointly learn both view-shared and view-specific sub-structures of the projected data, capturing interaction across modalities explicitly. We evaluate our approach on a task of agreement and disagreement recognition from nonverbal audio-visual cues using the Canal 9 dataset. Experimental results show that KCCA makes capturing nonlinear hidden dynamics easier and MV-HCRF helps learning interaction across modalities. United States. Office of Naval Research (Grant N000140910625) National Science Foundation (U.S.) (Grant IIS-1118018) National Science Foundation (U.S.) (Grant IIS-1018055) United States. Army Research, Development, and Engineering Command 2014-04-11T14:20:52Z 2014-04-11T14:20:52Z 2012-10 Article http://purl.org/eprint/type/ConferencePaper 9781450314671 http://hdl.handle.net/1721.1/86099 Yale Song, Louis-Philippe Morency, and Randall Davis. 2012. Multimodal human behavior analysis: learning correlation and interaction across modalities. In Proceedings of the 14th ACM international conference on Multimodal interaction (ICMI '12). ACM, New York, NY, USA, 27-30. https://orcid.org/0000-0001-5232-7281 en_US http://dx.doi.org/10.1145/2388676.2388684 Proceedings of the 14th ACM international conference on Multimodal interaction (ICMI '12) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf MIT web domain |
spellingShingle | Song, Yale Morency, Louis-Philippe Davis, Randall Multimodal human behavior analysis: Learning correlation and interaction across modalities |
title | Multimodal human behavior analysis: Learning correlation and interaction across modalities |
title_full | Multimodal human behavior analysis: Learning correlation and interaction across modalities |
title_fullStr | Multimodal human behavior analysis: Learning correlation and interaction across modalities |
title_full_unstemmed | Multimodal human behavior analysis: Learning correlation and interaction across modalities |
title_short | Multimodal human behavior analysis: Learning correlation and interaction across modalities |
title_sort | multimodal human behavior analysis learning correlation and interaction across modalities |
url | http://hdl.handle.net/1721.1/86099 https://orcid.org/0000-0001-5232-7281 |
work_keys_str_mv | AT songyale multimodalhumanbehavioranalysislearningcorrelationandinteractionacrossmodalities AT morencylouisphilippe multimodalhumanbehavioranalysislearningcorrelationandinteractionacrossmodalities AT davisrandall multimodalhumanbehavioranalysislearningcorrelationandinteractionacrossmodalities |