HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

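© 2019 IEEE. This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. We refer to it as HACS (Human Action Clips and Segments). We leverage consensus and disagreement among visual classifiers to automatically mine candidate short clips from unlabeled videos, which are subsequently validated by human annotators. The resulting dataset is dubbed HACS Clips. Through a separate process we also collect annotations defining action segment boundaries. This resulting dataset is called HACS Segments. Overall, HACS Clips consists of 1.5M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. HACS Clips contains more labeled examples than any existing video benchmark. This renders our dataset both a large-scale action recognition benchmark and an excellent source for spatiotemporal feature learning. In our transfer learning experiments on three target datasets, HACS Clips outperforms Kinetics-600, Moments-In-Time and Sports1M as a pretraining source. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense temporal annotations.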


Bibliographic Details
Main Authors:	Zhao, Hang; Torralba, Antonio; Torresani, Lorenzo; Yan, Zhicheng
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	English
Published:	IEEE 2021
Online Access:	https://hdl.handle.net/1721.1/137602

author	Zhao, Hang; Torralba, Antonio; Torresani, Lorenzo; Yan, Zhicheng
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
collection	MIT
description	© 2019 IEEE. This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. We refer to it as HACS (Human Action Clips and Segments). We leverage consensus and disagreement among visual classifiers to automatically mine candidate short clips from unlabeled videos, which are subsequently validated by human annotators. The resulting dataset is dubbed HACS Clips. Through a separate process we also collect annotations defining action segment boundaries. This resulting dataset is called HACS Segments. Overall, HACS Clips consists of 1.5M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. HACS Clips contains more labeled examples than any existing video benchmark. This renders our dataset both a large-scale action recognition benchmark and an excellent source for spatiotemporal feature learning. In our transfer learning experiments on three target datasets, HACS Clips outperforms Kinetics-600, Moments-In-Time and Sports1M as a pretraining source. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense temporal annotations.
first_indexed	2024-09-23T15:07:54Z
format	Article
id	mit-1721.1/137602
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T15:07:54Z
publishDate	2021
publisher	IEEE
record_format	dspace
spelling	mit-1721.1/137602 2023-02-10T21:25:05Z HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization Zhao, Hang; Torralba, Antonio; Torresani, Lorenzo; Yan, Zhicheng Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory 2021-11-05T19:39:46Z 2021-11-05T19:39:46Z 2019 2021-01-28T13:00:59Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/137602 Zhao, Hang, Torralba, Antonio, Torresani, Lorenzo and Yan, Zhicheng. 2019. "HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization." Proceedings of the IEEE International Conference on Computer Vision, 2019-October. en 10.1109/ICCV.2019.00876 Proceedings of the IEEE International Conference on Computer Vision Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf IEEE arXiv
title	HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
url	https://hdl.handle.net/1721.1/137602

