Structured learning of human interactions in TV shows

The objective of this work is recognition and spatiotemporal localization of two-person interactions in video. Our approach is person-centric. As a first stage we track all upper bodies and heads in a video using a tracking-by-detection approach that combines detections with KLT tracking and clique...


Bibliographic Details
Main Authors:	Patron-Perez, A, Marszalek, M, Reid, I, Zisserman, A
Material Type:	Journal article
Language:	English
Published:	IEEE 2012

author	Patron-Perez, A Marszalek, M Reid, I Zisserman, A
collection	OXFORD
description	The objective of this work is recognition and spatiotemporal localization of two-person interactions in video. Our approach is person-centric. As a first stage we track all upper bodies and heads in a video using a tracking-by-detection approach that combines detections with KLT tracking and clique partitioning, together with occlusion detection, to yield robust person tracks. We develop local descriptors of activity based on the head orientation (estimated using a set of pose-specific classifiers) and the local spatiotemporal region around them, together with global descriptors that encode the relative positions of people as a function of interaction type. Learning and inference on the model uses a structured output SVM which combines the local and global descriptors in a principled manner. Inference using the model yields information about which pairs of people are interacting, their interaction class, and their head orientation (which is also treated as a variable, enabling mistakes in the classifier to be corrected using global context). We show that inference can be carried out with polynomial complexity in the number of people, and describe an efficient algorithm for this. The method is evaluated on a new dataset comprising 300 video clips acquired from 23 different TV shows and on the benchmark UT-Interaction dataset.
first_indexed	2024-03-06T22:50:22Z
format	Journal article
id	oxford-uuid:5e9048ae-61b2-41ef-bdcf-0a34f0eba8cf
institution	University of Oxford
language	English
last_indexed	2025-02-19T04:29:49Z
publishDate	2012
publisher	IEEE
record_format	dspace
title	Structured learning of human interactions in TV shows
