Multi‐future Transformer: Learning diverse interaction modes for behaviour prediction in autonomous driving

Abstract: Predicting the future behaviour of neighbouring agents is crucial for autonomous driving. The task is challenging, largely because of the diverse, unobservable intent of each agent, which is further complicated by the complex interaction possibilities between agents. The authors propose a multi‐future Transformer framework that implicitly models the multi‐modal joint distribution of the scene by capturing its diverse interaction modes. To this end, a parallel interaction module is constructed, in which each interaction block learns the joint agent–agent and agent–map interactions for one possible future evolution. The model can estimate likelihoods from the perspective of both the joint distribution of the scene and the marginal distribution of each agent. Combined with the proposed scene‐level winner‐take‐all loss strategy, which is complementary to the model architecture, a single model achieves the best performance on both target-agent prediction and scene prediction tasks. To better utilise the scene context, comprehensive control experiments were conducted, highlighting the importance of a fine‐grained scene representation with content‐adaptive aggregation and late fusion of semantic attributes. The method, evaluated on the popular Argoverse forecasting dataset, outperformed previous methods while maintaining low model complexity.
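The abstract describes the architecture only at a high level. As a reading aid, the following is a minimal, hypothetical PyTorch sketch of what a parallel interaction module with per-block agent–agent and agent–map attention, together with a scene-level winner-take-all loss, could look like. All class names, tensor shapes, mode counts and the loss weighting are assumptions made for illustration; this is not the authors' released implementation.

```python
# Hypothetical sketch (not the authors' code): K parallel interaction blocks,
# each modelling one interaction mode of the scene, plus a scene-level
# winner-take-all loss that regresses only the best joint prediction.
import torch
import torch.nn as nn


class InteractionBlock(nn.Module):
    """One interaction mode: agents attend to each other and to map features."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.agent_agent = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.agent_map = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, agents, map_feats):
        # agents: [B, N, dim], map_feats: [B, M, dim]
        a, _ = self.agent_agent(agents, agents, agents)
        agents = agents + a
        m, _ = self.agent_map(agents, map_feats, map_feats)
        agents = agents + m
        return agents + self.ffn(agents)


class MultiFutureHead(nn.Module):
    """K parallel blocks, each decoded into one joint scene future plus a score."""

    def __init__(self, dim: int, horizon: int, num_modes: int = 6):
        super().__init__()
        self.blocks = nn.ModuleList(InteractionBlock(dim) for _ in range(num_modes))
        self.traj_head = nn.Linear(dim, horizon * 2)   # (x, y) per future step
        self.score_head = nn.Linear(dim, 1)            # per-mode likelihood logit
        self.horizon = horizon

    def forward(self, agents, map_feats):
        trajs, scores = [], []
        for block in self.blocks:
            h = block(agents, map_feats)                               # [B, N, dim]
            trajs.append(self.traj_head(h).view(*h.shape[:2], self.horizon, 2))
            scores.append(self.score_head(h.mean(dim=1)))              # scene-level score
        # trajs: [B, K, N, T, 2], scores: [B, K]
        return torch.stack(trajs, dim=1), torch.cat(scores, dim=-1)


def scene_level_wta_loss(trajs, scores, gt, valid):
    """Winner-take-all at the scene level: regress only the mode whose joint
    prediction over all agents has the lowest error, and train the mode scores
    to identify that winner."""
    # trajs: [B, K, N, T, 2], gt: [B, N, T, 2], valid: [B, N] float agent mask
    err = ((trajs - gt[:, None]) ** 2).sum(-1).mean(-1)                         # [B, K, N]
    scene_err = (err * valid[:, None]).sum(-1) / valid.sum(-1, keepdim=True)    # [B, K]
    best = scene_err.argmin(dim=-1)                                             # [B]
    reg = scene_err.gather(-1, best[:, None]).mean()     # regression on the winning mode only
    cls = nn.functional.cross_entropy(scores, best)      # mode classification toward the winner
    return reg + cls
```

In this sketch the winner is chosen per scene rather than per agent, which is what distinguishes a scene-level winner-take-all strategy from the more common per-agent variant; the unweighted sum of the regression and classification terms is an arbitrary illustrative choice.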

Bibliographic Details
Main Authors: Baotian He, Yibing Li (School of Vehicle and Mobility, Tsinghua University, Beijing, China)
Format: Article
Language: English
Published: Wiley, 2022-09-01
Series: IET Intelligent Transport Systems, Vol. 16, Iss. 9, pp. 1249–1267
ISSN: 1751-956X, 1751-9578
Online Access: https://doi.org/10.1049/itr2.12207