Background-Aware Robust Context Learning for Weakly-Supervised Temporal Action Localization

Weakly supervised temporal action localization (WTAL) aims to localize temporal intervals of actions in an untrimmed video using only video-level action labels. Although learning the background is an important issue in WTAL, most previous studies have not exploited the background effectively. In this study, we propose a novel method for robustly separating contexts, e.g., action-like background, from the foreground to more accurately localize the action intervals. First, we detect background segments based on their probabilities to minimize the impact of background estimation errors. Second, we define the entropy boundary of the foreground and the positive distance between the boundary and background entropy. The background probability and entropy boundary allow the segment-level classifier to robustly learn the background. Third, we improve the performance of the overall actionness model based on a consensus of the RGB and flow features. The results of extensive experiments demonstrate that the proposed method learns the context separately from the action, consequently achieving new state-of-the-art results on the THUMOS-14 and ActivityNet-1.2 benchmarks. We also confirm that using feature adaptation helps overcome the limitation of a pretrained feature extractor on datasets that contain many backgrounds, such as THUMOS-14.
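The abstract's second step, an entropy boundary of the foreground with a positive margin to background entropy, can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration: the function name, the probability threshold for detecting background segments, the hinge form, and the margin value are hypothetical, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entropy(p, axis=-1):
    # Shannon entropy of each segment's class distribution
    return -(p * np.log(p + 1e-8)).sum(axis=axis)

def background_entropy_loss(logits, bg_prob, bg_thresh=0.7, margin=0.5):
    """Hypothetical hinge loss: background segments (detected by their
    background probability, step one of the abstract) should have class
    entropy at least `margin` above the maximum foreground entropy
    (the 'entropy boundary')."""
    logits = np.asarray(logits, dtype=float)    # (T, C) segment logits
    bg_prob = np.asarray(bg_prob, dtype=float)  # (T,) background probability
    h = entropy(softmax(logits))                # (T,) per-segment entropy
    bg_mask = bg_prob >= bg_thresh
    fg_mask = ~bg_mask
    if not bg_mask.any() or not fg_mask.any():
        return 0.0
    boundary = h[fg_mask].max()                 # entropy boundary of the foreground
    # Penalize background segments whose entropy falls below boundary + margin
    return float(np.maximum(0.0, boundary + margin - h[bg_mask]).mean())
```

When the estimated background segments already have near-uniform (high-entropy) class scores, the hinge is inactive and the loss is zero; confident (low-entropy) background segments are pushed toward higher entropy.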

Bibliographic Details
Main Authors: Jinah Kim, Jungchan Cho
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9797701/
author Jinah Kim
Jungchan Cho
author_sort Jinah Kim
collection DOAJ
description Weakly supervised temporal action localization (WTAL) aims to localize temporal intervals of actions in an untrimmed video using only video-level action labels. Although the learning of the background is an important issue in WTAL, most previous studies have not utilized an effective background. In this study, we propose a novel method for robustly separating contexts, e.g., action-like background, from the foreground to more accurately localize the action intervals. First, we detect background segments based on their probabilities to minimize the impact of background estimation errors. Second, we define the entropy boundary of the foreground and the positive distance between the boundary and background entropy. The background probability and entropy boundary allow the segment-level classifier to robustly learn the background. Third, we improve the performance of the overall actionness model based on a consensus of the RGB and flow features. The results of extensive experiments demonstrate that the proposed method learns the context separately from the action, consequently achieving new state-of-the-art results on the THUMOS-14 and ActivityNet-1.2 benchmarks. We also confirm that using feature adaptation helps overcome the limitation of a pretrained feature extractor on datasets that contain many backgrounds, such as THUMOS-14.
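The abstract's third step, improving the actionness model via a consensus of the RGB and flow features, can be sketched as a simple two-stream fusion. This is a hedged illustration: the function name and the element-wise minimum as the consensus rule are assumptions for the sketch; the paper's actual fusion may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def consensus_actionness(rgb_logits, flow_logits):
    """Hypothetical consensus of per-segment actionness from two streams:
    a segment scores high only when both the RGB stream (appearance) and
    the optical-flow stream (motion) vote for action."""
    rgb_a = sigmoid(np.asarray(rgb_logits, dtype=float))
    flow_a = sigmoid(np.asarray(flow_logits, dtype=float))
    # Element-wise minimum: disagreement between modalities suppresses the score
    return np.minimum(rgb_a, flow_a)
```

The minimum makes the combined actionness conservative, which suits context separation: action-like background often fools the appearance stream alone but rarely both streams at once.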
first_indexed 2024-04-13T16:39:10Z
format Article
id doaj.art-ba225896e4a74418ac23e5039d33b818
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-13T16:39:10Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-ba225896e4a74418ac23e5039d33b818
doi 10.1109/ACCESS.2022.3183789
volume 10
pages 65315-65325
author Jinah Kim (https://orcid.org/0000-0001-5646-833X), College of Information Technology, Gachon University, Sujeong-gu, Seongnam-si, South Korea
author Jungchan Cho (https://orcid.org/0000-0002-3859-1702), College of Information Technology, Gachon University, Sujeong-gu, Seongnam-si, South Korea
title Background-Aware Robust Context Learning for Weakly-Supervised Temporal Action Localization
topic Temporal action localization
entropy maximization
context learning
feature adaptation
url https://ieeexplore.ieee.org/document/9797701/