YoTube : searching action proposal via recurrent and static regression networks

In this paper, we propose YoTube, a novel deep learning framework for generating action proposals in untrimmed videos, where each action proposal corresponds to a spatial-temporal tube that potentially locates one human action. Most existing works generate proposals by clustering low-level fea...

Bibliographic Details
Main Authors: Zhu, Hongyuan, Vial, Romain, Lu, Shijian, Peng, Xi, Fu, Huazhu, Tian, Yonghong, Cao, Xianbin
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2019
Subjects: Object Detection; DRNTU::Engineering::Computer science and engineering; Image Sequence Analysis
Online Access: https://hdl.handle.net/10356/80911
http://hdl.handle.net/10220/48139
_version_ 1811682080985186304
author Zhu, Hongyuan
Vial, Romain
Lu, Shijian
Peng, Xi
Fu, Huazhu
Tian, Yonghong
Cao, Xianbin
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Zhu, Hongyuan
Vial, Romain
Lu, Shijian
Peng, Xi
Fu, Huazhu
Tian, Yonghong
Cao, Xianbin
author_sort Zhu, Hongyuan
collection NTU
description In this paper, we propose YoTube, a novel deep learning framework for generating action proposals in untrimmed videos, where each action proposal corresponds to a spatial-temporal tube that potentially locates one human action. Most existing works generate proposals by clustering low-level features or linking image proposals, and so ignore the interplay between long-term temporal context and short-term cues. In contrast, our method models this interplay with two new detectors: a recurrent YoTube detector and a static YoTube detector. The recurrent YoTube detector sequentially regresses candidate bounding boxes using long-term temporal context learned by a recurrent neural network. The static YoTube detector produces bounding boxes using rich appearance cues in every single frame. To fully exploit the complementary appearance, motion, and temporal context, we train both the recurrent and static detectors on RGB (color) and flow information. Moreover, we fuse the corresponding outputs of the detectors to produce accurate and robust proposal boxes, and obtain the final action proposals by linking the proposal boxes using dynamic programming with a novel path trimming method. Thanks to this pipeline, untrimmed videos can be handled effectively and efficiently. Extensive experiments on the challenging UCF-101, UCF-Sports, and JHMDB datasets show superior performance of the proposed method compared with the state of the art.
first_indexed 2024-10-01T03:51:10Z
format Journal Article
id ntu-10356/80911
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:51:10Z
publishDate 2019
record_format dspace
spelling ntu-10356/80911 2020-03-07T11:48:52Z YoTube : searching action proposal via recurrent and static regression networks Zhu, Hongyuan Vial, Romain Lu, Shijian Peng, Xi Fu, Huazhu Tian, Yonghong Cao, Xianbin School of Computer Science and Engineering Object Detection DRNTU::Engineering::Computer science and engineering Image Sequence Analysis In this paper, we propose YoTube, a novel deep learning framework for generating action proposals in untrimmed videos, where each action proposal corresponds to a spatial-temporal tube that potentially locates one human action. Most existing works generate proposals by clustering low-level features or linking image proposals, and so ignore the interplay between long-term temporal context and short-term cues. In contrast, our method models this interplay with two new detectors: a recurrent YoTube detector and a static YoTube detector. The recurrent YoTube detector sequentially regresses candidate bounding boxes using long-term temporal context learned by a recurrent neural network. The static YoTube detector produces bounding boxes using rich appearance cues in every single frame. To fully exploit the complementary appearance, motion, and temporal context, we train both the recurrent and static detectors on RGB (color) and flow information. Moreover, we fuse the corresponding outputs of the detectors to produce accurate and robust proposal boxes, and obtain the final action proposals by linking the proposal boxes using dynamic programming with a novel path trimming method. Thanks to this pipeline, untrimmed videos can be handled effectively and efficiently. Extensive experiments on the challenging UCF-101, UCF-Sports, and JHMDB datasets show superior performance of the proposed method compared with the state of the art.
Accepted version 2019-05-09T03:36:11Z 2019-12-06T14:17:14Z 2019-05-09T03:36:11Z 2019-12-06T14:17:14Z 2018 Journal Article Zhu, H., Vial, R., Lu, S., Peng, X., Fu, H., Tian, Y., & Cao, X. (2018). YoTube : searching action proposal via recurrent and static regression networks. IEEE Transactions on Image Processing, 27(6), 2609-2622. doi:10.1109/TIP.2018.2806279 1057-7149 https://hdl.handle.net/10356/80911 http://hdl.handle.net/10220/48139 10.1109/TIP.2018.2806279 en IEEE Transactions on Image Processing © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TIP.2018.2806279. 13 p. application/pdf
spellingShingle Object Detection
DRNTU::Engineering::Computer science and engineering
Image Sequence Analysis
Zhu, Hongyuan
Vial, Romain
Lu, Shijian
Peng, Xi
Fu, Huazhu
Tian, Yonghong
Cao, Xianbin
YoTube : searching action proposal via recurrent and static regression networks
title YoTube : searching action proposal via recurrent and static regression networks
title_full YoTube : searching action proposal via recurrent and static regression networks
title_fullStr YoTube : searching action proposal via recurrent and static regression networks
title_full_unstemmed YoTube : searching action proposal via recurrent and static regression networks
title_short YoTube : searching action proposal via recurrent and static regression networks
title_sort yotube searching action proposal via recurrent and static regression networks
topic Object Detection
DRNTU::Engineering::Computer science and engineering
Image Sequence Analysis
url https://hdl.handle.net/10356/80911
http://hdl.handle.net/10220/48139
work_keys_str_mv AT zhuhongyuan yotubesearchingactionproposalviarecurrentandstaticregressionnetworks
AT vialromain yotubesearchingactionproposalviarecurrentandstaticregressionnetworks
AT lushijian yotubesearchingactionproposalviarecurrentandstaticregressionnetworks
AT pengxi yotubesearchingactionproposalviarecurrentandstaticregressionnetworks
AT fuhuazhu yotubesearchingactionproposalviarecurrentandstaticregressionnetworks
AT tianyonghong yotubesearchingactionproposalviarecurrentandstaticregressionnetworks
AT caoxianbin yotubesearchingactionproposalviarecurrentandstaticregressionnetworks