One-shot visual appearance learning for mobile manipulation

Full description

We describe a vision-based algorithm that enables a robot to robustly detect specific objects in a scene following an initial segmentation hint from a human user. The novelty lies in the ability to ‘reacquire’ objects over extended spatial and temporal excursions within challenging environments based upon a single training example. The primary difficulty lies in achieving an effective reacquisition capability that is robust to the effects of local clutter, lighting variation, and object relocation. We overcome these challenges through an adaptive detection algorithm that automatically generates multiple-view appearance models for each object online. As the robot navigates within the environment and the object is detected from different viewpoints, the one-shot learner opportunistically and automatically incorporates additional observations into each model. In order to overcome the effects of ‘drift’ common to adaptive learners, the algorithm imposes simple requirements on the geometric consistency of candidate observations. Motivating our reacquisition strategy is our work developing a mobile manipulator that interprets and autonomously performs commands conveyed by a human user. The ability to detect specific objects and reconstitute the user’s segmentation hints enables the robot to be situationally aware. This situational awareness enables rich command and control mechanisms and affords natural interaction. We demonstrate one such capability that allows the human to give the robot a ‘guided tour’ of named objects within an outdoor environment and, hours later, to direct the robot to manipulate those objects by name using spoken instructions. We implemented our appearance-based detection strategy on our robotic manipulator as it operated over multiple days in different outdoor environments. We evaluate the algorithm’s performance under challenging conditions that include scene clutter, lighting and viewpoint variation, object ambiguity, and object relocation. The results demonstrate a reacquisition capability that is effective in real-world settings.
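
The abstract outlines a core loop: seed an appearance model from a single user segmentation hint, reacquire the object in later imagery by feature matching, accept a candidate detection only if it is geometrically consistent, and fold verified detections back into a multi-view model. The sketch below illustrates that loop under assumed implementation choices: OpenCV SIFT features and a RANSAC-estimated homography stand in for whatever features and consistency test the paper actually uses, and the class name and thresholds are hypothetical, not the authors' implementation.

# Minimal sketch, assuming SIFT features and a RANSAC homography as the
# geometric-consistency test; all names and thresholds are illustrative.
import cv2
import numpy as np


class OneShotReacquirer:
    """Multi-view appearance model seeded by a single user segmentation hint."""

    MIN_MATCHES = 15        # matches required before attempting geometry
    MIN_INLIER_RATIO = 0.5  # geometric-consistency gate against drift

    def __init__(self):
        self.sift = cv2.SIFT_create()
        self.matcher = cv2.BFMatcher()
        self.views = []     # one (keypoints, descriptors, corners) tuple per view

    def add_view(self, image, corners):
        """Store features inside a segmented region of a uint8 grayscale image."""
        corners = np.asarray(corners, dtype=np.float32)
        mask = np.zeros(image.shape[:2], dtype=np.uint8)
        cv2.fillPoly(mask, [corners.astype(np.int32)], 255)
        keypoints, descriptors = self.sift.detectAndCompute(image, mask)
        if descriptors is not None and len(descriptors) > 0:
            self.views.append((keypoints, descriptors, corners))

    def detect(self, image, grow_model=True):
        """Reacquire the object; return its projected outline, or None."""
        frame_kp, frame_desc = self.sift.detectAndCompute(image, None)
        if frame_desc is None:
            return None
        for view_kp, view_desc, view_corners in self.views:
            # Lowe ratio test keeps only distinctive matches.
            pairs = self.matcher.knnMatch(view_desc, frame_desc, k=2)
            good = [p[0] for p in pairs
                    if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
            if len(good) < self.MIN_MATCHES:
                continue
            src = np.float32([view_kp[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
            dst = np.float32([frame_kp[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
            H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
            # Geometric-consistency requirement: enough matches must agree on
            # one transform before the candidate detection is accepted.
            if H is None or inliers.sum() < self.MIN_INLIER_RATIO * len(good):
                continue
            outline = cv2.perspectiveTransform(
                view_corners.reshape(-1, 1, 2), H).reshape(-1, 2)
            if grow_model:
                # Opportunistically fold the verified detection back into the
                # model so this viewpoint can seed future matches.
                self.add_view(image, outline)
            return outline
        return None

In this reading, a caller would seed the model once with the user's hint, e.g. add_view(first_frame, hint_polygon), and then call detect(frame) on later imagery; only detections that pass the consistency gate may extend the model, which is the mechanism the abstract credits for limiting drift in adaptive learners.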

Bibliographic Details
Main Authors: Walter, Matthew R., Friedman, Yuli, Antone, Matthew, Teller, Seth
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Sage Publications, 2012
Published in: International Journal of Robotics Research
ISSN: 0278-3649, 1741-3176
DOI: http://dx.doi.org/10.1177/0278364911435515
Online Access: http://hdl.handle.net/1721.1/73543
Citation: Walter, M. R. et al. “One-shot Visual Appearance Learning for Mobile Manipulation.” The International Journal of Robotics Research 31.4 (2012): 554–567.
Sponsorship: United States. Air Force (Contract FA8721-05-C-0002)
License: Creative Commons Attribution-Noncommercial-Share Alike 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)