Oxford/IIIT TRECVID 2008 – notebook paper


Bibliographic Details
Main Authors: Philbin, J, Marin-Jimenez, M, Srinivasan, S, Zisserman, A, Jain, M, Vempati, S, Sankar, P, Jawahar, CV
Format: Conference item
Language: English
Published: National Institute of Standards and Technology 2008
Description
Summary: The Oxford/IIIT team participated in the high-level feature extraction and interactive search tasks. A vision-only approach was used for both tasks, with no use of the text or audio information.

For the high-level feature extraction task, we used two different approaches, both based on a combination of visual features. One used an SVM classifier with a linear combination of kernels; the other used a random forest classifier. For both methods, we trained all high-level features using publicly available annotations [3]. The advantage of the random forest classifier is the speed of training and testing.

In addition, for the people feature, we took a more targeted approach. We used a real-time face detector and an upper body detector, in both cases running on every frame. Our best-performing submission, C_OXVGG_1_1, which used a rank fusion of our random forest and SVM approaches, achieved an mAP of 0.101 and was above the median for all but one feature.

In the interactive search task, our team came third overall with an mAP of 0.158. The system used was identical to last year's, with the only change being a source of accurate upper body detections.
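The summary does not say which rank fusion rule combined the random forest and SVM outputs. As an illustration only, the sketch below assumes a simple mean-rank fusion: each classifier ranks the shots by its own scores, and the two rank positions are averaged. The function name `rank_fusion`, the shot identifiers, and the scores are hypothetical and are not taken from the paper.

```python
def rank_fusion(scores_a, scores_b):
    """Fuse two {shot_id: score} dicts by mean rank (an assumed rule).

    Higher scores are better within each classifier. Returns the shot ids
    scored by both classifiers, ordered from best to worst fused rank.
    """
    def ranks(scores):
        # Rank 1 is the highest score; ties are broken by shot id.
        ordered = sorted(scores, key=lambda shot: (-scores[shot], shot))
        return {shot: position for position, shot in enumerate(ordered, start=1)}

    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    common = ranks_a.keys() & ranks_b.keys()
    fused = {shot: (ranks_a[shot] + ranks_b[shot]) / 2 for shot in common}
    # A lower mean rank is better; ties are broken by shot id.
    return sorted(fused, key=lambda shot: (fused[shot], shot))


# Hypothetical scores for three shots from the two classifiers.
rf_scores = {"shot_a": 0.9, "shot_b": 0.6, "shot_c": 0.1}
svm_scores = {"shot_a": -0.2, "shot_b": 1.4, "shot_c": 0.3}
print(rank_fusion(rf_scores, svm_scores))  # ['shot_b', 'shot_a', 'shot_c']
```

Fusing ranks rather than raw scores avoids having to calibrate random forest outputs against SVM margins, which lie on different scales.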