Supervised framework for automatic recognition and retrieval of interaction: a framework for classification and retrieving videos with similar human interactions

Bibliographic Details
Main Authors: Chiranjoy Chattopadhyay, Sukhendu Das
Format: Article
Language: English
Published: Wiley 2016-04-01
Series: IET Computer Vision
Online Access:https://doi.org/10.1049/iet-cvi.2015.0189
Description
Summary: This study presents the supervised framework for automatic recognition and retrieval of interactions (SAFARRI), a supervised learning framework that recognises interactions such as pushing, punching, and hugging between a pair of human performers in a video shot. The primary contribution of the study is to extend vectors of locally aggregated descriptors (VLADs) as a compact and discriminative video encoding, to solve the complex class-partitioning problem of recognising human interactions. An initial codebook is generated from the training set of video shots by extracting feature descriptors around spatiotemporal interest points computed across frames. A bag of action words is then generated by encoding the first-order statistics of the visual words using VLAD. One-against-all support vector machine (SVM) classifiers are trained on these encodings. The authors have verified SAFARRI's accuracy for both classification and retrieval (query by example). SAFARRI requires neither tracking nor recognition of body parts, and it can identify the region of interaction in video shots. On two publicly available human interaction datasets, it outperforms recently proposed methods in both retrieval and classification.
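The summary describes the VLAD step only at a high level. What follows is a minimal sketch in plain Python of the standard VLAD aggregation, plus a query-by-example ranking. It is not the authors' implementation. Several things are assumptions: the codebook is taken as already learned (how SAFARRI builds it beyond "from the training set" is not stated here), descriptors are plain lists of floats, and the signed-square-root plus L2 normalisation is a choice common in the VLAD literature that is not confirmed for SAFARRI. The function names are hypothetical, and the SVM stage is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest_center(x: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook center (visual word) closest to x."""
    best_k, best_d = 0, float("inf")
    for k, c in enumerate(centers):
        d = sum((xi - ci) ** 2 for xi, ci in zip(x, c))  # squared Euclidean distance
        if d < best_d:
            best_k, best_d = k, d
    return best_k


def vlad_encode(descriptors: Sequence[Sequence[float]],
                centers: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical helper: VLAD encoding of one video shot.

    For each visual word c_k, sum the first-order residuals (x - c_k) of the
    descriptors assigned to it, then concatenate into a K*D vector.
    """
    dim = len(centers[0])
    acc = [[0.0] * dim for _ in centers]
    for x in descriptors:
        k = nearest_center(x, centers)
        for d in range(dim):
            acc[k][d] += x[d] - centers[k][d]
    v = [value for row in acc for value in row]
    # Assumed normalisation: signed square root, then L2 (not confirmed for SAFARRI).
    v = [math.copysign(math.sqrt(abs(a)), a) for a in v]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else v


def rank_by_similarity(query: Vector, database: Sequence[Vector]) -> List[int]:
    """Query by example: database indices sorted by decreasing similarity.

    Vectors are assumed L2-normalised, so the dot product is cosine similarity.
    """
    scores = [sum(q * d for q, d in zip(query, item)) for item in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


# Toy usage with made-up 2-D descriptors and a 2-word codebook.
centers = [[0.0, 0.0], [10.0, 10.0]]
shot_a = vlad_encode([[1.0, 0.0], [9.0, 10.0]], centers)
shot_b = vlad_encode([[0.0, 1.0], [10.0, 9.0]], centers)
query = vlad_encode([[2.0, 0.0], [8.0, 10.0]], centers)
print(rank_by_similarity(query, [shot_b, shot_a]))
```

The query's residuals point in the same directions as those of `shot_a`, so `shot_a` (index 1) ranks first. In a full pipeline, the same fixed-length VLAD vectors would also be the inputs to the one-against-all SVM classifiers.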
ISSN: 1751-9632, 1751-9640