HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN
Spatiotemporal representations learned using 3D convolutional neural networks (CNNs) are currently used in state-of-the-art approaches for action-related tasks. However, 3D-CNNs are notoriously memory- and compute-intensive compared with simpler 2D-CNN architectures. We propose...
Main Authors: | Paritosh Parmar, Brendan Morris |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-09-01 |
Series: | Signals |
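Subjects: | action recognition; scene recognition; action quality assessment; activity recognition; deep learning; computer vision |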
Online Access: | https://www.mdpi.com/2624-6120/2/3/37 |
_version_ | 1797517193727967232 |
---|---|
author | Paritosh Parmar; Brendan Morris |
author_sort | Paritosh Parmar |
collection | DOAJ |
description | Spatiotemporal representations learned using 3D convolutional neural networks (CNNs) are currently used in state-of-the-art approaches for action-related tasks. However, 3D-CNNs are notoriously memory- and compute-intensive compared with simpler 2D-CNN architectures. We propose to hallucinate spatiotemporal representations from a 3D-CNN teacher with a 2D-CNN student. Requiring the 2D-CNN to predict the future and intuit upcoming activity encourages it to gain a deeper understanding of actions and how they evolve. The hallucination task is treated as an auxiliary task, which can be used with any other action-related task in a multitask learning setting. Through experimental evaluation, it is shown that the hallucination task helps improve performance on action recognition, action quality assessment, and dynamic scene recognition tasks. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN can enable deployment in resource-constrained scenarios, such as those with limited computing power and/or lower bandwidth. We also observed that our hallucination task has utility not only during the training phase, but also during the pre-training phase. |
first_indexed | 2024-03-10T07:12:24Z |
format | Article |
id | doaj.art-436fda6cd3c148609f228626c952a256 |
institution | Directory Open Access Journal |
issn | 2624-6120 |
language | English |
last_indexed | 2024-03-10T07:12:24Z |
publishDate | 2021-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Signals |
spelling | doaj.art-436fda6cd3c148609f228626c952a256; 2023-11-22T15:15:59Z; eng; MDPI AG; Signals; ISSN 2624-6120; 2021-09-01; vol. 2, no. 3, pp. 604-618; doi:10.3390/signals2030037; Paritosh Parmar (Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada); Brendan Morris (Department of Electrical & Computer Engineering, University of Nevada, Las Vegas, NV 89119, USA) |
title | HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN |
topic | action recognition; scene recognition; action quality assessment; activity recognition; deep learning; computer vision |
url | https://www.mdpi.com/2624-6120/2/3/37 |