Multimodal deep learning for activity and context recognition


Full details

Bibliographic details
Main authors: Radu, V, Tong, C, Bhattacharya, S, Lane, N, Mascolo, C, Marina, M, Kawsar, F
Material type: Journal article
Published: Association for Computing Machinery 2018
collection OXFORD
description Wearables and mobile devices see the world through the lens of half a dozen low-power sensors, such as barometers, accelerometers, microphones and proximity detectors. But differences between sensors, ranging from sampling rates to discrete versus continuous data to the data type itself, make principled approaches to integrating these streams challenging. How, for example, is barometric pressure best combined with an audio sample to infer if a user is in a car, plane or bike? Critically for applications, how successfully sensor devices are able to maximize the information contained across these multi-modal sensor streams often dictates the fidelity at which they can track user behaviors and context changes. This paper studies the benefits of adopting deep learning algorithms for interpreting user activity and context as captured by multi-sensor systems. Specifically, we focus on four variations of deep neural networks that are based either on fully-connected Deep Neural Networks (DNNs) or Convolutional Neural Networks (CNNs). Two of these architectures follow conventional deep models by performing feature representation learning from a concatenation of sensor types. This classic approach is contrasted with a promising deep model variant characterized by modality-specific partitions of the architecture to maximize intra-modality learning. Our exploration represents the first time these architectures have been evaluated for multimodal deep learning on wearable data; with convolutional layers within this architecture, it represents an entirely novel architecture. Experiments show these generic multimodal neural network models compete well with a rich variety of conventional hand-designed shallow methods (including feature extraction and classifier construction) and task-specific modeling pipelines, across a wide range of sensor types and inference tasks (four different datasets).
Although the training and inference overhead of these multimodal deep approaches is in some cases appreciable, we also demonstrate that on-device mobile and wearable execution is feasible and is not a barrier to adoption. This study is carefully constructed to focus on multimodal aspects of wearable data modeling for deep learning by providing a wide range of empirical observations, which we expect to have considerable value in the community. We summarize our observations into a series of practitioner rules-of-thumb and lessons learned that can guide the usage of multimodal deep learning for activity and context detection.
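The two fusion strategies the abstract contrasts, learning a shared representation from a concatenation of sensor streams versus giving each modality its own subnetwork before merging, can be sketched in a minimal NumPy forward pass. This is an illustrative toy, not the authors' implementation; the modality widths, layer sizes, and random weights are arbitrary assumptions chosen only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # One fully-connected layer with ReLU activation.
    return np.maximum(0.0, x @ w + b)

# Two hypothetical sensor modalities with different feature widths,
# e.g. accelerometer window statistics (32-dim) and audio features (64-dim).
accel = rng.standard_normal((8, 32))   # batch of 8 samples
audio = rng.standard_normal((8, 64))

# --- Variant 1: concatenation-based fusion -------------------------
# Stack the sensor streams into one input vector, then learn a
# shared cross-modality representation.
x = np.concatenate([accel, audio], axis=1)           # shape (8, 96)
w1, b1 = rng.standard_normal((96, 16)), np.zeros(16)
shared = dense(x, w1, b1)                            # shape (8, 16)

# --- Variant 2: modality-specific partitions -----------------------
# Each modality first passes through its own branch to maximize
# intra-modality learning; branch outputs are merged afterwards.
wa, ba = rng.standard_normal((32, 8)), np.zeros(8)
wm, bm = rng.standard_normal((64, 8)), np.zeros(8)
h_accel = dense(accel, wa, ba)                       # shape (8, 8)
h_audio = dense(audio, wm, bm)                       # shape (8, 8)
merged = np.concatenate([h_accel, h_audio], axis=1)  # shape (8, 16)

print(shared.shape, merged.shape)  # (8, 16) (8, 16)
```

Both variants produce a 16-dimensional representation per sample that a classifier head could consume; the difference is whether cross-modality interactions are learned immediately (variant 1) or only after each stream has been summarized on its own (variant 2).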
id oxford-uuid:87c29798-2731-48df-9a4b-e1b1aa9caf1c
institution University of Oxford