Data efficient training for egocentric vision-based action recognition


Bibliographic Details
Main Author: Bai, Haolei
Other Authors: Alex Chichung Kot
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2025
Online Access:https://hdl.handle.net/10356/182402
Summary: We investigate the application of semi-supervised learning in egocentric action anticipation to tackle the issue of limited labeled data. Leveraging both fully labeled and pseudo-labeled data for training can effectively improve model performance, especially when fully labeled data is scarce. We implement this strategy using two advanced transformer-based models, the Memory-and-Anticipation Transformer (MAT) and the Anticipative Feature Fusion Transformer (AFFT), both of which are tailored for capturing intricate temporal dependencies within egocentric video data. Experimental evaluations on the Epic-Kitchens-100 and EGTEA Gaze+ datasets reveal that the semi-supervised approach yields notable improvements in action anticipation accuracy compared to models trained exclusively on limited labeled data. Importantly, performance gains are most significant under highly constrained data settings, emphasizing the practicality of semi-supervised learning in scenarios where labeled data is limited or costly to obtain. This study highlights the promise of integrating semi-supervised learning with specialized models to advance action anticipation capabilities in egocentric video tasks.
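The pseudo-labeling strategy the abstract describes can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch training step, not the thesis's actual code: the `train_step` function, the batch layout, the unit loss weighting, and the 0.9 confidence threshold are all illustrative assumptions, and a generic classifier stands in for the MAT/AFFT backbones.

```python
# Sketch of confidence-thresholded pseudo-labeling for semi-supervised
# training. Hypothetical setup; names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, labeled_batch, unlabeled_batch, threshold=0.9):
    """One semi-supervised step: supervised loss on labeled clips plus a
    pseudo-label loss on confident predictions for unlabeled clips."""
    x_l, y_l = labeled_batch   # labeled clip features and action labels
    x_u = unlabeled_batch      # unlabeled clip features only

    # Supervised cross-entropy on the fully labeled data.
    logits_l = model(x_l)
    loss = F.cross_entropy(logits_l, y_l)

    # Generate pseudo-labels without tracking gradients, and keep only
    # predictions whose confidence clears the threshold.
    with torch.no_grad():
        probs_u = F.softmax(model(x_u), dim=-1)
        conf, pseudo_y = probs_u.max(dim=-1)
        mask = conf >= threshold

    # Add the pseudo-label loss on the confident subset, if any.
    if mask.any():
        logits_u = model(x_u[mask])
        loss = loss + F.cross_entropy(logits_u, pseudo_y[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the setting the abstract describes, the labeled and unlabeled batches would carry egocentric video features fed to a MAT or AFFT anticipation head; the thresholding step is what lets scarce labels be supplemented by the model's own confident predictions.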