Data efficient training for egocentric vision-based action recognition

Full description

We investigate the application of semi-supervised learning in egocentric action anticipation to tackle the issue of limited labeled data. Leveraging both fully labeled and pseudo-labeled data for training can effectively improve model performance, especially when fully labeled data is scarce. We implement this strategy using two advanced transformer-based models, the Memory-and-Anticipation Transformer (MAT) and the Anticipative Feature Fusion Transformer (AFFT), both of which are tailored for capturing intricate temporal dependencies within egocentric video data. Experimental evaluations on the Epic-Kitchens-100 and EGTEA Gaze+ datasets reveal that the semi-supervised approach yields notable improvements in action anticipation accuracy compared to models trained exclusively on limited labeled data. Importantly, performance gains are most significant under highly constrained data settings, emphasizing the practicality of semi-supervised learning in scenarios where labeled data is limited or costly to obtain. This study highlights the promise of integrating semi-supervised learning with specialized models to advance action anticipation capabilities in egocentric video tasks.
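The abstract describes training on a mix of fully labeled and pseudo-labeled data. As a rough illustration only (not the thesis's actual pipeline), a common form of this strategy keeps an unlabeled example when the model's most confident class prediction exceeds a threshold and uses that prediction as its label. The function name and threshold below are assumptions for the sketch:

```python
# Minimal sketch of confidence-thresholded pseudo-labeling.
# All names and the 0.8 threshold are illustrative, not from the thesis.

def select_pseudo_labels(probs, threshold=0.8):
    """Keep unlabeled examples whose max class probability meets the threshold.

    probs: list of per-example class-probability lists (e.g. softmax outputs).
    Returns (example_index, predicted_class) pairs for confident predictions;
    these pairs would be mixed into the labeled training set.
    """
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf)))
    return selected

# Example: three unlabeled clips with softmax outputs over 3 action classes.
unlabeled_probs = [
    [0.05, 0.90, 0.05],  # confident -> pseudo-label class 1
    [0.40, 0.35, 0.25],  # low confidence -> discarded
    [0.85, 0.10, 0.05],  # confident -> pseudo-label class 0
]
print(select_pseudo_labels(unlabeled_probs))  # [(0, 1), (2, 0)]
```

Raising the threshold trades pseudo-label quantity for quality, which is one reason such methods help most when labeled data is scarcest.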

Bibliographic Details
Main Author: Bai, Haolei
Other Authors: Alex Chichung Kot
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2025
Subjects: Computer and Information Science; Deep learning; Egocentric vision; Action recognition
Department: School of Electrical and Electronic Engineering
Citation: Bai, H. (2025). Data efficient training for egocentric vision-based action recognition. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182402
Online Access: https://hdl.handle.net/10356/182402