Generative models for robotic arm motion

Bibliographic Details
Main Author: Xu, Wenjie
Other Authors: Wen Bihan
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2024
Online Access:https://hdl.handle.net/10356/180161
Description
Summary: Currently, human-computer interaction technology is advancing rapidly, and one popular topic is the imitation of human arm movements by robotic arms to achieve visual behavior replication. With continued technological progress, robotic arms are now capable of learning and mimicking movements from videos or images provided to them. This paper proposes three methods that use cross-domain transformation and image generation techniques to convert videos of human arm movements into videos of robotic arm actions. These methods enable real robotic arms to imitate the movements, advancing the development of human-machine interaction. We process the videos into frames and use generative adversarial networks together with a contrastive learning framework that maximizes the mutual information between image patches in the input and output domains, thereby effectively achieving cross-domain transformation. Additionally, the paper explores direct video-to-video transformation: by employing a Stable Diffusion text-to-image model combined with spatio-temporal attention mechanisms, the approach generates videos that align well with the commands given to the model. This not only broadens the application scope of the model but also provides insights into cross-domain transformation tasks, opening up more possibilities for robotic arm learning.
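
The patch-based contrastive objective mentioned in the summary (maximizing mutual information between image patches in the input and output domains) is the core idea behind PatchNCE-style losses from contrastive unpaired translation. The Python/PyTorch sketch below only illustrates that idea under assumed names and shapes (feat_src, feat_gen, a temperature of 0.07); it is not the implementation from the thesis.

import torch
import torch.nn.functional as F

def patch_nce_loss(feat_src, feat_gen, temperature=0.07):
    # feat_src, feat_gen: (num_patches, dim) encoder features sampled at the
    # same spatial locations from the human-arm input and the generated
    # robot-arm output (shapes are illustrative assumptions).
    # Each generated patch is pulled toward the source patch at the same
    # location (positive) and pushed away from patches at other locations
    # (negatives), which maximizes a lower bound on the mutual information
    # between corresponding patches.
    feat_src = F.normalize(feat_src, dim=1)
    feat_gen = F.normalize(feat_gen, dim=1)
    logits = feat_gen @ feat_src.t() / temperature        # (N, N) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)               # diagonal entries are positives

# Toy usage with random features standing in for encoder activations.
loss = patch_nce_loss(torch.randn(256, 128), torch.randn(256, 128))

Likewise, the spatio-temporal attention used to extend a Stable Diffusion text-to-image model toward video is commonly realized as cross-frame attention, in which every frame also attends to a reference frame so that appearance stays consistent across the clip. The sketch below assumes (frames, tokens, dim) query/key/value tensors and the first frame as the reference; these are assumptions made for illustration, not details taken from the thesis.

def cross_frame_attention(q, k, v):
    # q, k, v: (frames, tokens, dim) projections of per-frame latent features.
    # Keys and values of every frame are concatenated with those of the first
    # frame before standard scaled dot-product attention.
    k_cat = torch.cat([k[:1].expand_as(k), k], dim=1)     # (frames, 2*tokens, dim)
    v_cat = torch.cat([v[:1].expand_as(v), v], dim=1)
    scale = q.size(-1) ** 0.5
    attn = torch.softmax(q @ k_cat.transpose(1, 2) / scale, dim=-1)
    return attn @ v_cat                                    # (frames, tokens, dim)

# Toy usage on an 8-frame clip of 64 latent tokens with dimension 32.
q = torch.randn(8, 64, 32)
out = cross_frame_attention(q, torch.randn(8, 64, 32), torch.randn(8, 64, 32))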