One-shot visual appearance learning for mobile manipulation

Full description

We describe a vision-based algorithm that enables a robot to robustly detect specific objects in a scene following an initial segmentation hint from a human user. The novelty lies in the ability to ‘reacquire’ objects over extended spatial and temporal excursions within challenging environments based upon a single training example. The primary difficulty lies in achieving an effective reacquisition capability that is robust to the effects of local clutter, lighting variation, and object relocation. We overcome these challenges through an adaptive detection algorithm that automatically generates multiple-view appearance models for each object online. As the robot navigates within the environment and the object is detected from different viewpoints, the one-shot learner opportunistically and automatically incorporates additional observations into each model. In order to overcome the effects of ‘drift’ common to adaptive learners, the algorithm imposes simple requirements on the geometric consistency of candidate observations. Motivating our reacquisition strategy is our work developing a mobile manipulator that interprets and autonomously performs commands conveyed by a human user. The ability to detect specific objects and reconstitute the user’s segmentation hints enables the robot to be situationally aware. This situational awareness enables rich command and control mechanisms and affords natural interaction. We demonstrate one such capability that allows the human to give the robot a ‘guided tour’ of named objects within an outdoor environment and, hours later, to direct the robot to manipulate those objects by name using spoken instructions. We implemented our appearance-based detection strategy on our robotic manipulator as it operated over multiple days in different outdoor environments. We evaluate the algorithm’s performance under challenging conditions that include scene clutter, lighting and viewpoint variation, object ambiguity, and object relocation. The results demonstrate a reacquisition capability that is effective in real-world settings.
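
The abstract outlines a core loop: seed an appearance model from a single user segmentation hint, reacquire the object in later imagery by feature matching, accept a candidate detection only if it is geometrically consistent, and fold verified detections back into a multi-view model. The sketch below illustrates that loop under assumed implementation choices: OpenCV SIFT features and a RANSAC-estimated homography stand in for whatever features and consistency test the paper actually uses, and the class name and thresholds are hypothetical, not the authors' implementation.

# Minimal sketch, assuming SIFT features and a RANSAC homography as the
# geometric-consistency test; all names and thresholds are illustrative.
import cv2
import numpy as np


class OneShotReacquirer:
    """Multi-view appearance model seeded by a single user segmentation hint."""

    MIN_MATCHES = 15        # matches required before attempting geometry
    MIN_INLIER_RATIO = 0.5  # geometric-consistency gate against drift

    def __init__(self):
        self.sift = cv2.SIFT_create()
        self.matcher = cv2.BFMatcher()
        self.views = []     # one (keypoints, descriptors, corners) tuple per view

    def add_view(self, image, corners):
        """Store features inside a segmented region of a uint8 grayscale image."""
        corners = np.asarray(corners, dtype=np.float32)
        mask = np.zeros(image.shape[:2], dtype=np.uint8)
        cv2.fillPoly(mask, [corners.astype(np.int32)], 255)
        keypoints, descriptors = self.sift.detectAndCompute(image, mask)
        if descriptors is not None and len(descriptors) > 0:
            self.views.append((keypoints, descriptors, corners))

    def detect(self, image, grow_model=True):
        """Reacquire the object; return its projected outline, or None."""
        frame_kp, frame_desc = self.sift.detectAndCompute(image, None)
        if frame_desc is None:
            return None
        for view_kp, view_desc, view_corners in self.views:
            # Lowe ratio test keeps only distinctive matches.
            pairs = self.matcher.knnMatch(view_desc, frame_desc, k=2)
            good = [p[0] for p in pairs
                    if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
            if len(good) < self.MIN_MATCHES:
                continue
            src = np.float32([view_kp[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
            dst = np.float32([frame_kp[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
            H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
            # Geometric-consistency requirement: enough matches must agree on
            # one transform before the candidate detection is accepted.
            if H is None or inliers.sum() < self.MIN_INLIER_RATIO * len(good):
                continue
            outline = cv2.perspectiveTransform(
                view_corners.reshape(-1, 1, 2), H).reshape(-1, 2)
            if grow_model:
                # Opportunistically fold the verified detection back into the
                # model so this viewpoint can seed future matches.
                self.add_view(image, outline)
            return outline
        return None

In this reading, a caller would seed the model once with the user's hint, e.g. add_view(first_frame, hint_polygon), and then call detect(frame) on later imagery; only detections that pass the consistency gate may extend the model, which is the mechanism the abstract credits for limiting drift in adaptive learners.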

Bibliographic Details
Main Authors: Walter, Matthew R., Friedman, Yuli, Antone, Matthew, Teller, Seth
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Sage Publications, 2012
Published in: International Journal of Robotics Research
ISSN: 0278-3649, 1741-3176
DOI: http://dx.doi.org/10.1177/0278364911435515
Online Access: http://hdl.handle.net/1721.1/73543
Citation: Walter, M. R. et al. “One-shot Visual Appearance Learning for Mobile Manipulation.” The International Journal of Robotics Research 31.4 (2012): 554–567.
Sponsorship: United States. Air Force (Contract FA8721-05-C-0002)
License: Creative Commons Attribution-Noncommercial-Share Alike 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)