Multi‐future Transformer: Learning diverse interaction modes for behaviour prediction in autonomous driving

Bibliographic Details
Main Authors: Baotian He, Yibing Li
Format: Article
Language: English
Published: Wiley, 2022-09-01
Series: IET Intelligent Transport Systems
Online Access: https://doi.org/10.1049/itr2.12207
Description
Summary: Predicting the future behaviour of neighbouring agents is crucial for autonomous driving. This task is challenging, largely because of the diverse, unobservable intent of each agent, which is further complicated by the complex interaction possibilities between agents. The authors propose a multi‐future Transformer framework that implicitly models the multi‐modal joint distribution by capturing the diverse interaction modes of the scene. To this end, a parallel interaction module is constructed, in which each interaction block learns the joint agent–agent and agent–map interactions for one possible future evolution. The model can perform likelihood estimation from the perspective of both the joint distribution of the scene and the marginal distribution of each agent. Combined with the proposed scene‐level winner‐take‐all loss strategy, which complements the model architecture, the best performance is achieved for both target-agent prediction and scene prediction tasks within a single model. To better utilise the scene context, comprehensive control experiments were conducted, highlighting the importance of fine‐grained scene representation with content‐adaptive aggregation and late fusion of semantic attributes. The method, evaluated on the popular Argoverse forecasting dataset, outperformed previous methods while maintaining low model complexity.
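The scene-level winner-take-all loss mentioned in the summary can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: it scores each of K candidate *joint* scene futures against the ground truth of all agents at once, and returns the error of the single best mode, so only the closest joint future receives the training signal. The function name, array shapes, and the plain average-displacement error metric are all hypothetical choices for this sketch.

```python
import numpy as np

def scene_wta_loss(pred, gt):
    """Hypothetical scene-level winner-take-all loss sketch.

    pred: (K, A, T, 2) -- K candidate scene futures, A agents, T steps, xy
    gt:   (A, T, 2)    -- ground-truth future trajectory for every agent
    Returns (loss of the best joint mode, index of that mode).
    """
    # Per-mode scene error: average L2 displacement over all agents/steps,
    # so the K modes are ranked jointly rather than per agent.
    err = np.linalg.norm(pred - gt[None], axis=-1).mean(axis=(1, 2))  # (K,)
    best = int(err.argmin())
    # Winner-take-all: only the closest scene mode contributes to the loss.
    return err[best], best
```

Ranking whole scene modes (rather than picking the best mode independently per agent) is what makes the loss "scene-level": the winning hypothesis must be jointly consistent across all agents.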
ISSN: 1751-956X, 1751-9578