Helping hands: an object-aware ego-centric video recognition model
We introduce an object-aware decoder for improving the performance of spatio-temporal representations on egocentric videos. The key idea is to enhance object-awareness during training by tasking the model to predict hand positions, object positions, and the semantic label of the objects using paired...
Main Authors: | , , |
---|---|
Format: | Conference item |
Language: | English |
Published: |
IEEE
2024
|