Summary: | We investigate semi-supervised learning for egocentric action anticipation as a way to address the scarcity of labeled data. Training on a combination of fully labeled and pseudo-labeled data can improve model performance when labels are limited. We implement this strategy with two transformer-based models tailored to the complex temporal dependencies of egocentric video: the Memory-and-Anticipation Transformer (MAT) and the Anticipative Feature Fusion Transformer (AFFT). Experiments on the Epic-Kitchens-100 and EGTEA Gaze+ datasets show that the semi-supervised approach yields notable gains in anticipation accuracy over models trained only on the limited labeled data, with the largest gains in the most data-constrained settings. This underscores the practical value of semi-supervised learning when labels are scarce or costly to obtain, and highlights the promise of pairing it with specialized models to advance action anticipation in egocentric video tasks.
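Below is a minimal sketch of the pseudo-labeling training loop the summary describes, assuming PyTorch and a small stand-in classifier over precomputed clip features. The toy model, the 0.9 confidence threshold, the loss weighting, and the 97-way output (matching EPIC-Kitchens-100's verb vocabulary) are illustrative assumptions, not the paper's actual MAT/AFFT implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-in for an anticipation model (MAT/AFFT are far larger);
# it maps a clip-level feature vector to action-class logits.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 97))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

CONF_THRESHOLD = 0.9  # assumed cutoff for keeping a pseudo-label


def train_step(labeled_feats, labels, unlabeled_feats, unlabeled_weight=0.5):
    """One semi-supervised step: supervised loss on labeled clips plus a
    pseudo-label loss on confident predictions for unlabeled clips."""
    model.train()
    optimizer.zero_grad()

    # Supervised term on the fully labeled batch.
    sup_loss = F.cross_entropy(model(labeled_feats), labels)

    # Generate pseudo-labels with the current model (no gradients here).
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_feats), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        keep = conf >= CONF_THRESHOLD  # discard low-confidence predictions

    # Unsupervised term: train toward the retained pseudo-labels.
    if keep.any():
        unsup_loss = F.cross_entropy(model(unlabeled_feats[keep]), pseudo[keep])
    else:
        unsup_loss = torch.zeros((), device=labeled_feats.device)

    loss = sup_loss + unlabeled_weight * unsup_loss
    loss.backward()
    optimizer.step()
    return loss.item()


# Toy usage with random tensors standing in for precomputed video features.
labeled = torch.randn(8, 512)
targets = torch.randint(0, 97, (8,))
unlabeled = torch.randn(32, 512)
print(train_step(labeled, targets, unlabeled))
```

The confidence threshold is the key design choice in this style of pseudo-labeling: it trades pseudo-label coverage against label noise, which matters most in exactly the low-label regimes where the summary reports the largest gains.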