ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application
This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI’s ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition pro...
Main Authors: | Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Task planning, robot manipulation, large language models, ChatGPT |
Online Access: | https://ieeexplore.ieee.org/document/10235949/ |
_version_ | 1797688487622737920 |
---|---|
author | Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi |
author_sort | Naoki Wake |
collection | DOAJ |
description | This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI’s ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of the token limit imposed on ChatGPT. In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate post-operation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts. |
first_indexed | 2024-03-12T01:32:40Z |
format | Article |
id | doaj.art-d310831d5296475aa84f1dabd9cfc6d2 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-12T01:32:40Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-d310831d5296475aa84f1dabd9cfc6d2; 2023-09-11T23:02:08Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 95060-95078; DOI 10.1109/ACCESS.2023.3310935; document 10235949; Naoki Wake (https://orcid.org/0000-0001-8278-2373), Atsushi Kanehira, Kazuhiro Sasabuchi (https://orcid.org/0000-0002-5408-3089), Jun Takamatsu (https://orcid.org/0000-0001-7457-2878), Katsushi Ikeuchi (https://orcid.org/0000-0001-9758-9357); all authors: Applied Robotics Research, Microsoft, Redmond, WA, USA |
title | ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application |
topic | Task planning; robot manipulation; large language models; ChatGPT |
url | https://ieeexplore.ieee.org/document/10235949/ |
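
The planning loop summarized in the description (an instruction plus a textual environment goes in; a task plan plus an updated environment comes out, and only the updated environment is carried into the next round) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `stub_planner` is a hypothetical stand-in for a ChatGPT call, and the JSON keys `task_sequence` and `environment_after`, the action names, and the "move X to Y" instruction format are all assumptions made for this example.

```python
import json
from typing import Callable, Dict, List, Tuple

Environment = Dict[str, List[str]]
# Hypothetical planner interface: (instruction, environment) -> JSON string.
Planner = Callable[[str, Environment], str]


def stub_planner(instruction: str, environment: Environment) -> str:
    """Stand-in for a ChatGPT call (assumption, not a real API).

    Handles only instructions of the form "move <object> to <place>".
    """
    words = instruction.split()
    obj, place = words[1], words[3]
    # Copy the environment so the caller's dictionary is never mutated.
    updated = {name: list(items) for name, items in environment.items()}
    for items in updated.values():
        if obj in items:
            items.remove(obj)
    updated.setdefault(place, []).append(obj)
    plan = [f"grab({obj})", f"move_to({place})", f"release({obj})"]
    # Illustrative key names; the paper's prompts may use different ones.
    return json.dumps({"task_sequence": plan, "environment_after": updated})


def run_instructions(
    instructions: List[str],
    environment: Environment,
    planner: Planner = stub_planner,
) -> Tuple[List[str], Environment]:
    """Plan each instruction in turn, reusing only the latest environment."""
    all_steps: List[str] = []
    for instruction in instructions:
        reply = json.loads(planner(instruction, environment))
        all_steps.extend(reply["task_sequence"])
        # Carry forward the updated environment instead of earlier plans.
        environment = reply["environment_after"]
    return all_steps, environment


if __name__ == "__main__":
    start = {"shelf": ["juice"], "table": []}
    steps, final_env = run_instructions(
        ["move juice to table", "move juice to fridge"], start
    )
    print(steps)
    print(final_env)
```

Because each round passes only the current environment to the planner, the input stays roughly constant in size however many instructions have been processed, which is the token-limit mitigation the description refers to.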