Self-supervised video representation learning by uncovering spatio-temporal statistics
This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, the spa...
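As a rough illustration of the kind of summary the abstract mentions (the spatial location and dominant direction of the largest motion), the following is a minimal sketch and not the authors' implementation. It assumes a dense optical-flow field is already available as a nested list of `(dx, dy)` vectors. The function name `largest_motion_summary`, the 2x2 block grid, and the 8 direction bins are hypothetical choices made only for this example.

```python
import math


def largest_motion_summary(flow, grid=2, bins=8):
    """Return (block_index, direction_bin) of the largest motion.

    flow: H x W nested list of (dx, dy) displacement vectors (assumed input).
    grid: the frame is split into grid x grid blocks (illustrative choice).
    bins: number of direction bins, each centred on k * 360/bins degrees.
    """
    height, width = len(flow), len(flow[0])
    n_blocks = grid * grid
    block_energy = [0.0] * n_blocks                      # total motion magnitude per block
    block_hist = [[0.0] * bins for _ in range(n_blocks)]  # magnitude-weighted direction histogram

    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue  # static pixels carry no direction information
            block = (y * grid // height) * grid + (x * grid // width)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            direction = int(round(angle / (2 * math.pi) * bins)) % bins
            block_energy[block] += mag
            block_hist[block][direction] += mag

    # Block with the most accumulated motion, then its strongest direction bin.
    location = max(range(n_blocks), key=block_energy.__getitem__)
    direction = max(range(bins), key=block_hist[location].__getitem__)
    return location, direction
```

Blocks are indexed in row-major order, and in image coordinates a positive `dy` points downward. In a pretext task of this kind, the returned pair would serve as the label that a network is trained to predict from the raw clip.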
Main Authors: | , , , , , |
---|---|
Format: | Journal article |
Language: | English |
Published: | IEEE 2021 |