Headwear detection and retrieval with region-based convolutional neural networks

Bibliographic Details
Main Author: Lyu, Shuen
Other Authors: Yap Kim Hui
Format: Thesis
Language: English
Published: 2016
Subjects:
Online Access: http://hdl.handle.net/10356/68977
Description
Summary: The region-based convolutional neural network (RCNN) has become a very popular object detection algorithm in recent years. It improves detection performance significantly by combining two key insights. The first is to localize and segment objects by applying high-capacity convolutional neural networks to bottom-up region proposals. The second is to apply supervised pre-training followed by domain-specific fine-tuning when training large convolutional neural networks with insufficient labeled training data [1]. We train a new model on a Headwear dataset that we collected ourselves from the Google website. In this thesis, we first introduce the relevant concepts. We describe the selective search method used for region proposal and the building blocks of a convolutional neural network (CNN): the convolutional layer, pooling layer, rectified linear units (ReLU), fully-connected layer and loss layer. We detail the dataset preparation process, which consists of data collection, data labeling and format transformation and is important preparatory work for training. The experiment uses eight categories with about 200 labeled images per category. The core component of this research is the establishment of the object detection system, from module design to model testing and validation, which is presented in detail in Chapter 5. The mean average precision (mAP) on the Headwear dataset is 47.87%. For Tudung and Turban, the average precision (AP) reaches 93.21% and 80.08% respectively, much higher than for the other six categories. The APs of Man_Hat and Safety_Helmet are 66.25% and 43.06%, while the APs of the other four categories, Woman_Cap, Man_Songkok, Beret and Man_Cap, are all lower than the mAP.
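
The summary reports AP values for only four of the eight categories. As a rough illustration of how the figures relate, the sketch below assumes that mAP is the unweighted mean of the eight per-class APs (the usual convention, though the record does not state it) and derives the average AP implied for the four unreported categories. The function and variable names are hypothetical and chosen for this example only; the result is approximate because the published values are rounded.

```python
# Per-class AP values (percent) as reported in the summary.
reported_ap = {
    "Tudung": 93.21,
    "Turban": 80.08,
    "Man_Hat": 66.25,
    "Safety_Helmet": 43.06,
}
# Categories whose individual AP is not given in the summary.
unreported = ["Woman_Cap", "Man_Songkok", "Beret", "Man_Cap"]
reported_map = 47.87  # mAP (percent) as reported in the summary


def implied_mean_ap(known_ap, n_unknown, mean_ap):
    """Average AP of the unknown classes.

    Assumption: mAP is the unweighted mean over all classes, so
    mAP * n_classes equals the sum of every per-class AP.
    """
    n_classes = len(known_ap) + n_unknown
    total_ap = mean_ap * n_classes
    remaining_ap = total_ap - sum(known_ap.values())
    return remaining_ap / n_unknown


implied = implied_mean_ap(reported_ap, len(unreported), reported_map)
print(f"Implied average AP of the four unreported categories: {implied:.2f}%")
```

Under that assumption the four remaining categories average roughly 25%, which is consistent with the statement that each of them falls below the mAP.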