Data efficient training for egocentric vision-based action recognition
We investigate the application of semi-supervised learning in egocentric action anticipation to tackle the issue of limited labeled data. Leveraging both fully labeled and pseudo-labeled data for training can effectively improve model performance, especially when fully labeled data is scarce. We implement this strategy using two advanced transformer-based models, the Memory-and-Anticipation Transformer (MAT) and the Anticipative Feature Fusion Transformer (AFFT), both of which are tailored for capturing intricate temporal dependencies within egocentric video data. Experimental evaluations on the Epic-Kitchens-100 and EGTEA Gaze+ datasets reveal that the semi-supervised approach yields notable improvements in action anticipation accuracy compared to models trained exclusively on limited labeled data. Importantly, performance gains are most significant under highly constrained data settings, emphasizing the practicality of semi-supervised learning in scenarios where labeled data is limited or costly to obtain. This study highlights the promise of integrating semi-supervised learning with specialized models to advance action anticipation capabilities in egocentric video tasks.
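The core strategy the abstract describes — training on pseudo-labels generated for unlabeled clips — is not detailed in this record. As a rough illustration only (the function name, confidence threshold, and NumPy-based setup below are assumptions, not the thesis's actual MAT/AFFT pipeline), confidence-thresholded pseudo-labeling can be sketched as:

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Keep only unlabeled samples whose top class probability exceeds
    the confidence threshold, and assign them that class as a pseudo-label.
    Low-confidence predictions are discarded rather than trained on."""
    confidence = probs.max(axis=1)      # top-class probability per sample
    labels = probs.argmax(axis=1)       # predicted class per sample
    keep = confidence >= threshold      # mask of confident predictions
    return labels[keep], keep

# Hypothetical model predictions for 3 unlabeled clips over 4 action classes
probs = np.array([
    [0.95, 0.02, 0.02, 0.01],   # confident -> pseudo-labeled as class 0
    [0.40, 0.30, 0.20, 0.10],   # uncertain -> discarded
    [0.05, 0.91, 0.02, 0.02],   # confident -> pseudo-labeled as class 1
])
labels, keep = pseudo_label(probs, threshold=0.9)
print(labels.tolist(), keep.tolist())   # -> [0, 1] [True, False, True]
```

The retained pseudo-labeled clips would then be mixed with the fully labeled set for further training; the threshold trades pseudo-label quantity against quality.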
Main Author: | Bai, Haolei |
---|---|
Other Authors: | Alex Chichung Kot |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2025 |
Subjects: | Computer and Information Science; Deep learning; Egocentric vision; Action recognition |
Online Access: | https://hdl.handle.net/10356/182402 |
id | ntu-10356/182402
school | School of Electrical and Electronic Engineering
contact | EACKOT@ntu.edu.sg
subjects | Computer and Information Science; Deep learning; Egocentric vision; Action recognition
degree | Master's degree
date deposited | 2025-01-31
citation | Bai, H. (2025). Data efficient training for egocentric vision-based action recognition. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182402