Background-Aware Robust Context Learning for Weakly-Supervised Temporal Action Localization
Weakly supervised temporal action localization (WTAL) aims to localize the temporal intervals of actions in an untrimmed video using only video-level action labels. Although learning the background is an important issue in WTAL, most previous studies have not modeled the background effectively. In this study, we propose a novel method for robustly separating context, e.g., action-like background, from the foreground to localize action intervals more accurately. First, we detect background segments based on their background probabilities to minimize the impact of background estimation errors. Second, we define an entropy boundary for the foreground and enforce a positive distance between this boundary and the background entropy. Together, the background probability and the entropy boundary allow the segment-level classifier to learn the background robustly. Third, we improve the overall actionness model based on a consensus of the RGB and flow features. Extensive experiments demonstrate that the proposed method learns context separately from action, achieving new state-of-the-art results on the THUMOS-14 and ActivityNet-1.2 benchmarks. We also confirm that feature adaptation helps overcome the limitations of a pretrained feature extractor on background-heavy datasets such as THUMOS-14.
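The abstract outlines three components: probability-weighted background detection, an entropy boundary separating foreground from background, and a two-stream actionness consensus. The paper's exact formulation is not reproduced in this record; the following is a minimal PyTorch-style sketch of how such an entropy-boundary background loss could look, assuming per-segment class logits, a per-segment background probability, and a hinge margin. The function name, the 0.5 foreground threshold, and the margin value are all illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def background_entropy_loss(logits: torch.Tensor,
                            bg_prob: torch.Tensor,
                            margin: float = 0.1) -> torch.Tensor:
    """Hypothetical entropy-boundary loss for background segments.

    logits:  (T, C) segment-level scores over C action classes.
    bg_prob: (T,)   estimated probability that each segment is background.
    margin:  assumed positive distance between the foreground entropy
             boundary and the background entropy.
    """
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1)  # (T,)

    # One plausible reading of the "entropy boundary of the foreground":
    # the highest entropy among segments that are likely foreground.
    fg_mask = bg_prob < 0.5
    if fg_mask.any():
        boundary = entropy[fg_mask].max().detach()
    else:
        boundary = entropy.new_tensor(0.0)

    # Hinge: push each background segment's entropy above the boundary
    # by at least `margin`, weighting by the background probability so
    # that uncertain background estimates contribute less to the loss.
    hinge = F.relu(boundary + margin - entropy)
    return (bg_prob * hinge).sum() / (bg_prob.sum() + 1e-8)
```

Weighting the hinge by `bg_prob` reflects the abstract's first point: segments whose background status is uncertain should have little influence, which limits the damage from background estimation errors.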
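For the third component, the abstract does not say how the RGB and flow streams are fused; an element-wise minimum is one simple consensus rule (again an assumption, not the paper's definition) under which a segment scores high actionness only when both streams agree:

```python
import torch

def consensus_actionness(attn_rgb: torch.Tensor,
                         attn_flow: torch.Tensor) -> torch.Tensor:
    """Illustrative two-stream actionness consensus.

    attn_rgb, attn_flow: (T,) per-segment actionness scores in [0, 1]
    predicted from RGB and optical-flow features, respectively.
    """
    # Minimum as consensus: a segment is treated as action only if
    # both the appearance and the motion stream support it.
    return torch.minimum(attn_rgb, attn_flow)
```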
Main Authors: | Jinah Kim (ORCID: 0000-0001-5646-833X), Jungchan Cho (ORCID: 0000-0002-3859-1702) |
---|---|
Author Affiliation: | College of Information Technology, Gachon University, Sujeong-gu, Seongnam-si, South Korea (both authors) |
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access, Vol. 10 (2022), pp. 65315-65325 |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2022.3183789 |
Collection: | DOAJ (Directory of Open Access Journals) |
Subjects: | Temporal action localization; entropy maximization; context learning; feature adaptation |
Online Access: | https://ieeexplore.ieee.org/document/9797701/ |