Summary: Our team participated in the “light” version of the semantic indexing task. All runs used a combination of an image-level dense-visual-words classifier and an object-level part-based detector. For each of the ten features, the two methods were ranked by their performance on a validation set and assigned to successive runs in order of decreasing performance (we also evaluated several techniques for recombining their scores). As expected from their design, the two methods performed very differently depending on the feature: the χ<sup>2</sup>-SVM can be applied to all feature types, including scene-like features such as Cityscape, Nighttime, and Singing, but is outperformed by the object detector on object-like features such as Boat or ship, Bus, and Person riding a bicycle.
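The abstract does not specify which score-recombination techniques were used; a minimal sketch of one common choice, weighted late fusion of rank-normalized scores (the function name, the rank normalization, and the per-feature weight are all assumptions for illustration, not the authors' stated method):

```python
import numpy as np

def late_fusion(scores_svm, scores_det, weight):
    """Hypothetical late fusion of the two methods' per-shot scores.

    Each score list is rank-normalized to [0, 1] so the SVM and detector
    outputs are comparable; `weight` would be tuned per feature on the
    validation set."""
    def rank_normalize(s):
        # argsort of argsort yields the rank (0 = lowest score) of each shot
        ranks = np.argsort(np.argsort(s))
        return ranks / (len(s) - 1.0)

    return weight * rank_normalize(scores_svm) + (1.0 - weight) * rank_normalize(scores_det)
```

With `weight = 1.0` the fused ranking reduces to the SVM alone, and with `weight = 0.0` to the detector alone, so a validation sweep over `weight` also recovers the single-method runs as special cases.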
<br>
Our team did not participate in the collaborative annotation effort. Instead, annotations for all ten features were carried out internally, both to control the quality of the annotations and of the keyframe extraction, and to obtain the region-of-interest annotations needed to train the object detectors. Compared to last year, the image-level classifier was significantly faster owing to a fast dense SIFT feature extractor and an explicit feature map approximating the χ<sup>2</sup> kernel SVM.
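The speedup comes from replacing the non-linear χ<sup>2</sup> kernel SVM with a linear SVM on an explicit feature map whose inner product approximates the χ<sup>2</sup> kernel. A minimal sketch of such a map in the style of homogeneous kernel maps (the function name and the sampling parameters `n` and `L` are illustrative choices, not the paper's exact settings):

```python
import numpy as np

def chi2_feature_map(x, n=2, L=0.65):
    """Approximate explicit feature map for the additive chi-squared kernel
    k(x, y) = sum_i 2 * x_i * y_i / (x_i + y_i), built by sampling the
    kernel's spectrum (sech) at n frequencies with period L.

    x: non-negative histogram; returns a vector of dimension (2n+1)*len(x)
    such that dot(map(x), map(y)) ~= k(x, y)."""
    x = np.asarray(x, dtype=float)
    logx = np.log(np.maximum(x, 1e-12))        # guard against log(0)
    kappa = lambda w: 1.0 / np.cosh(np.pi * w)  # spectrum of the chi2 kernel
    parts = [np.sqrt(L * x * kappa(0.0))]       # zero-frequency component
    for j in range(1, n + 1):
        c = np.sqrt(2.0 * L * x * kappa(j * L))
        parts.append(c * np.cos(j * L * logx))  # real part at frequency j*L
        parts.append(c * np.sin(j * L * logx))  # imaginary part
    return np.concatenate(parts)
```

Because the map is explicit, a standard linear SVM (with its fast training and testing) can be used on the mapped descriptors while retaining most of the accuracy of the exact χ<sup>2</sup> kernel.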