Cross-modal learning from visual information for activity recognition on inertial sensors

Bibliographic Details
Main Author: Tong, EGC
Other Authors: Lane, ND
Format: Thesis
Language: English
Published: 2023
Subjects: Transfer learning (Machine learning), Deep learning (Machine learning), Wearable technology
_version_ 1797113319119650816
author Tong, EGC
author2 Lane, ND
author_facet Lane, ND
Tong, EGC
author_sort Tong, EGC
collection OXFORD
description <p>The lack of large-scale, labeled datasets impedes progress in developing robust and generalized predictive models for human activity recognition (HAR) from wearable inertial sensor data. Labeled data is scarce because sensor data collection is expensive and its annotation is time-consuming and error-prone. As a result, public inertial HAR datasets are small in terms of the number of subjects, activity classes, hours of recorded data, and variation in recorded environments. Machine learning models developed using these small datasets are effectively blind to the diverse expressions of activities performed by wide-ranging populations in the real world, and progress in wearable inertial sensing is held back by this bottleneck in activity understanding.</p> <p>But just as Internet-scale text, image, and audio data have pushed their respective pattern recognition fields to systems reliable enough for everyday use, easy access to large quantities of data can push forward the field of inertial HAR and, by extension, wearable sensing. To this end, this thesis pioneers the idea of exploiting the visual modality as a source domain for cross-modal learning, such that data and knowledge can be transferred to benefit the target domain of inertial HAR.</p> <p>This thesis makes three contributions to inertial HAR through cross-modal approaches. First, to overcome the barrier of expensive inertial data collection and annotation, we contribute a novel pipeline that automatically extracts virtual accelerometer data from videos of human activities, which are readily annotated and accessible in large quantities. Second, we propose acquiring transferable representations of activities from HAR models trained on large quantities of visual data, to enrich the development of inertial HAR models. Third, we expose HAR models to the challenging setting of zero-shot learning, proposing mechanisms that leverage cross-modal correspondence to enable inference on previously unseen classes.</p> <p>Unlike prior approaches, this body of work pushes forward the state of the art in HAR not by exhausting resources concentrated in the inertial domain, but by exploiting an existing, resource-rich, intuitive, and informative source: the visual domain. These contributions represent a new line of cross-modal thinking in inertial HAR and suggest important future directions for inertial-based wearable sensing research.</p>
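To make the notion of virtual accelerometer data concrete, the sketch below illustrates one plausible way such a signal could be derived from video: track a body keypoint with a pose estimator, smooth its trajectory, and differentiate twice with respect to time. This is an illustrative assumption, not the pipeline described in the thesis; the function name, frame rate, and smoothing window are hypothetical.

# Illustrative sketch only (assumed names and parameters), showing the general
# idea of a "virtual accelerometer": smooth a tracked keypoint trajectory from
# video and take its second time derivative.
import numpy as np

def virtual_acceleration(positions, fps):
    # positions: (T, 3) array of keypoint coordinates per video frame,
    # e.g. a wrist joint from an off-the-shelf pose estimator.
    dt = 1.0 / fps
    # Moving-average smoothing to damp pose-estimation jitter, which
    # double differentiation would otherwise amplify.
    kernel = np.ones(5) / 5.0
    smoothed = np.stack(
        [np.convolve(positions[:, d], kernel, mode="same") for d in range(positions.shape[1])],
        axis=1,
    )
    velocity = np.gradient(smoothed, dt, axis=0)      # first derivative: velocity
    acceleration = np.gradient(velocity, dt, axis=0)  # second derivative: acceleration
    return acceleration

# Usage with synthetic data: 3 seconds of oscillating "wrist" motion at 30 fps.
t = np.linspace(0.0, 3.0, 90)
wrist = np.stack([np.sin(2.0 * np.pi * t), np.zeros_like(t), 0.1 * t], axis=1)
print(virtual_acceleration(wrist, fps=30.0).shape)   # (90, 3)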
first_indexed 2024-04-23T08:26:56Z
format Thesis
id oxford-uuid:4ebc187e-bbe3-40a6-ae4c-b21337d48e23
institution University of Oxford
language English
last_indexed 2024-04-23T08:26:56Z
publishDate 2023
record_format dspace
spelling oxford-uuid:4ebc187e-bbe3-40a6-ae4c-b21337d48e23 2024-04-22T12:23:03Z Cross-modal learning from visual information for activity recognition on inertial sensors Thesis http://purl.org/coar/resource_type/c_db06 uuid:4ebc187e-bbe3-40a6-ae4c-b21337d48e23 Transfer learning (Machine learning) Deep learning (Machine learning) Wearable technology English Hyrax Deposit 2023 Tong, EGC Lane, ND
spellingShingle Transfer learning (Machine learning)
Deep learning (Machine learning)
Wearable technology
Tong, EGC
Cross-modal learning from visual information for activity recognition on inertial sensors
title Cross-modal learning from visual information for activity recognition on inertial sensors
title_full Cross-modal learning from visual information for activity recognition on inertial sensors
title_fullStr Cross-modal learning from visual information for activity recognition on inertial sensors
title_full_unstemmed Cross-modal learning from visual information for activity recognition on inertial sensors
title_short Cross-modal learning from visual information for activity recognition on inertial sensors
title_sort cross modal learning from visual information for activity recognition on inertial sensors
topic Transfer learning (Machine learning)
Deep learning (Machine learning)
Wearable technology
work_keys_str_mv AT tongegc crossmodallearningfromvisualinformationforactivityrecognitiononinertialsensors