Background-Aware Robust Context Learning for Weakly-Supervised Temporal Action Localization

Weakly supervised temporal action localization (WTAL) aims to localize temporal intervals of actions in an untrimmed video using only video-level action labels. Although learning the background is an important issue in WTAL, most previous studies have not exploited the background effectively. In this study, we propose a novel method for robustly separating contexts, e.g., action-like background, from the foreground to more accurately localize the action intervals. First, we detect background segments based on their probabilities to minimize the impact of background estimation errors. Second, we define the entropy boundary of the foreground and the positive distance between the boundary and background entropy. The background probability and entropy boundary allow the segment-level classifier to robustly learn the background. Third, we improve the performance of the overall actionness model based on a consensus of the RGB and flow features. The results of extensive experiments demonstrate that the proposed method learns the context separately from the action, consequently achieving new state-of-the-art results on the THUMOS-14 and ActivityNet-1.2 benchmarks. We also confirm that using feature adaptation helps overcome the limitation of a pretrained feature extractor on datasets that contain many backgrounds, such as THUMOS-14.
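The abstract's second step, an entropy boundary of the foreground with a positive margin to background entropy, can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration: the function name, the probability threshold for detecting background segments, the hinge form, and the margin value are hypothetical, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entropy(p, axis=-1):
    # Shannon entropy of each segment's class distribution
    return -(p * np.log(p + 1e-8)).sum(axis=axis)

def background_entropy_loss(logits, bg_prob, bg_thresh=0.7, margin=0.5):
    """Hypothetical hinge loss: background segments (detected by their
    background probability, step one of the abstract) should have class
    entropy at least `margin` above the maximum foreground entropy
    (the 'entropy boundary')."""
    logits = np.asarray(logits, dtype=float)    # (T, C) segment logits
    bg_prob = np.asarray(bg_prob, dtype=float)  # (T,) background probability
    h = entropy(softmax(logits))                # (T,) per-segment entropy
    bg_mask = bg_prob >= bg_thresh
    fg_mask = ~bg_mask
    if not bg_mask.any() or not fg_mask.any():
        return 0.0
    boundary = h[fg_mask].max()                 # entropy boundary of the foreground
    # Penalize background segments whose entropy falls below boundary + margin
    return float(np.maximum(0.0, boundary + margin - h[bg_mask]).mean())
```

When the estimated background segments already have near-uniform (high-entropy) class scores, the hinge is inactive and the loss is zero; confident (low-entropy) background segments are pushed toward higher entropy.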

Bibliographic Details
Main Authors: Jinah Kim, Jungchan Cho
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9797701/
author Jinah Kim
Jungchan Cho
author_sort Jinah Kim
collection DOAJ
description Weakly supervised temporal action localization (WTAL) aims to localize temporal intervals of actions in an untrimmed video using only video-level action labels. Although the learning of the background is an important issue in WTAL, most previous studies have not utilized an effective background. In this study, we propose a novel method for robustly separating contexts, e.g., action-like background, from the foreground to more accurately localize the action intervals. First, we detect background segments based on their probabilities to minimize the impact of background estimation errors. Second, we define the entropy boundary of the foreground and the positive distance between the boundary and background entropy. The background probability and entropy boundary allow the segment-level classifier to robustly learn the background. Third, we improve the performance of the overall actionness model based on a consensus of the RGB and flow features. The results of extensive experiments demonstrate that the proposed method learns the context separately from the action, consequently achieving new state-of-the-art results on the THUMOS-14 and ActivityNet-1.2 benchmarks. We also confirm that using feature adaptation helps overcome the limitation of a pretrained feature extractor on datasets that contain many backgrounds, such as THUMOS-14.
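The abstract's third step, improving the actionness model via a consensus of the RGB and flow features, can be sketched as a simple two-stream fusion. This is a hedged illustration: the function name and the element-wise minimum as the consensus rule are assumptions for the sketch; the paper's actual fusion may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def consensus_actionness(rgb_logits, flow_logits):
    """Hypothetical consensus of per-segment actionness from two streams:
    a segment scores high only when both the RGB stream (appearance) and
    the optical-flow stream (motion) vote for action."""
    rgb_a = sigmoid(np.asarray(rgb_logits, dtype=float))
    flow_a = sigmoid(np.asarray(flow_logits, dtype=float))
    # Element-wise minimum: disagreement between modalities suppresses the score
    return np.minimum(rgb_a, flow_a)
```

The minimum makes the combined actionness conservative, which suits context separation: action-like background often fools the appearance stream alone but rarely both streams at once.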
first_indexed 2024-04-13T16:39:10Z
format Article
id doaj.art-ba225896e4a74418ac23e5039d33b818
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-13T16:39:10Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-ba225896e4a74418ac23e5039d33b818
doi 10.1109/ACCESS.2022.3183789
volume 10
pages 65315-65325
author Jinah Kim (https://orcid.org/0000-0001-5646-833X), College of Information Technology, Gachon University, Sujeong-gu, Seongnam-si, South Korea
author Jungchan Cho (https://orcid.org/0000-0002-3859-1702), College of Information Technology, Gachon University, Sujeong-gu, Seongnam-si, South Korea
title Background-Aware Robust Context Learning for Weakly-Supervised Temporal Action Localization
topic Temporal action localization
entropy maximization
context learning
feature adaptation
url https://ieeexplore.ieee.org/document/9797701/