Data efficient training for egocentric vision-based action recognition
We investigate the application of semi-supervised learning in egocentric action anticipation to tackle the issue of limited labeled data. Leveraging both fully labeled and pseudo-labeled data for training can effectively improve model performance, especially when fully labeled data is scarce. We implement this strategy using two advanced transformer-based models, the Memory-and-Anticipation Transformer (MAT) and the Anticipative Feature Fusion Transformer (AFFT), both of which are tailored for capturing intricate temporal dependencies within egocentric video data. Experimental evaluations on the Epic-Kitchens-100 and EGTEA Gaze+ datasets reveal that the semi-supervised approach yields notable improvements in action anticipation accuracy compared to models trained exclusively on limited labeled data. Importantly, performance gains are most significant under highly constrained data settings, emphasizing the practicality of semi-supervised learning in scenarios where labeled data is limited or costly to obtain. This study highlights the promise of integrating semi-supervised learning with specialized models to advance action anticipation capabilities in egocentric video tasks.
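The core strategy the abstract describes — training on pseudo-labels generated for unlabeled clips — is not detailed in this record. As a rough illustration only (the function name, confidence threshold, and NumPy-based setup below are assumptions, not the thesis's actual MAT/AFFT pipeline), confidence-thresholded pseudo-labeling can be sketched as:

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Keep only unlabeled samples whose top class probability exceeds
    the confidence threshold, and assign them that class as a pseudo-label.
    Low-confidence predictions are discarded rather than trained on."""
    confidence = probs.max(axis=1)      # top-class probability per sample
    labels = probs.argmax(axis=1)       # predicted class per sample
    keep = confidence >= threshold      # mask of confident predictions
    return labels[keep], keep

# Hypothetical model predictions for 3 unlabeled clips over 4 action classes
probs = np.array([
    [0.95, 0.02, 0.02, 0.01],   # confident -> pseudo-labeled as class 0
    [0.40, 0.30, 0.20, 0.10],   # uncertain -> discarded
    [0.05, 0.91, 0.02, 0.02],   # confident -> pseudo-labeled as class 1
])
labels, keep = pseudo_label(probs, threshold=0.9)
print(labels.tolist(), keep.tolist())   # -> [0, 1] [True, False, True]
```

The retained pseudo-labeled clips would then be mixed with the fully labeled set for further training; the threshold trades pseudo-label quantity against quality.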
Main Author: | Bai, Haolei |
---|---|
Other Authors: | Alex Chichung Kot |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2025 |
Subjects: | Computer and Information Science; Deep learning; Egocentric vision; Action recognition |
Online Access: | https://hdl.handle.net/10356/182402 |
id | ntu-10356/182402
school | School of Electrical and Electronic Engineering
contact | EACKOT@ntu.edu.sg
subjects | Computer and Information Science; Deep learning; Egocentric vision; Action recognition
degree | Master's degree
date deposited | 2025-01-31
citation | Bai, H. (2025). Data efficient training for egocentric vision-based action recognition. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182402