Fine-Tuning Multimodal Transformer Models for Generating Actions in Virtual and Real Environments

In this work, we propose and investigate an original approach, which we refer to as RozumFormer, to using a pre-trained multimodal transformer of a specialized architecture to control a robotic agent in an object manipulation task based on language instructions. Our model is based on a bimodal (t...

Bibliographic Details
Main Authors: Aleksei Staroverov, Andrey S. Gorodetsky, Andrei S. Krishtopik, Uliana A. Izmesteva, Dmitry A. Yudin, Alexey K. Kovalev, Aleksandr I. Panov
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10323309/