Learning deep visual object models from noisy web data: How to make it work

Bibliographic Details
Main Authors:	Massouh, N; Babiloni, F; Tommasi, T; Young, J; Hawes, N; Caputo, B
Format:	Journal article
Published:	IEEE 2017

author	Massouh, N; Babiloni, F; Tommasi, T; Young, J; Hawes, N; Caputo, B
collection	OXFORD
description	Deep networks thrive when trained on large-scale data collections. This has given ImageNet a central role in the development of deep architectures for visual object classification. However, ImageNet was created during a specific period in time, and as such it is prone to aging, as well as dataset bias issues. Moving beyond fixed training datasets will lead to more robust visual systems, especially when deployed on robots in new environments, which must train on the objects they encounter there. To make this possible, it is important to break free from the need for manual annotators. Recent work has begun to investigate how to use the massive amount of images available on the Web in place of manual image annotations. We contribute to this research thread with two findings: (1) a study correlating a given level of noisy labels to the expected drop in accuracy, for two deep architectures and two different types of noise, which clearly identifies GoogLeNet as a suitable architecture for learning from Web data; (2) a recipe for the creation of Web datasets with minimal noise and maximum visual variability, based on a visual and natural language processing concept expansion strategy. By combining these two results, we obtain a method for learning powerful deep object models automatically from the Web. We confirm the effectiveness of our approach through object categorization experiments using our Web-derived version of ImageNet on a popular robot vision benchmark database, and on a lifelong object discovery task on a mobile robot.
first_indexed	2024-03-06T18:13:12Z
format	Journal article
id	oxford-uuid:03b960ec-4922-4d2e-8ea4-4705a572f2c8
institution	University of Oxford
last_indexed	2024-03-06T18:13:12Z
publishDate	2017
publisher	IEEE
record_format	dspace
title	Learning deep visual object models from noisy web data: How to make it work

Similar Items