ST‐SIGMA: Spatio‐temporal semantics and interaction graph aggregation for multi‐agent perception and trajectory forecasting

Abstract Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods aim at addressing one of the two challenges mentioned above with a single model. To tackle this dilemma, this pap...

Bibliographic Details
Main Authors:	Yang Fang, Bei Luo, Ting Zhao, Dong He, Bingbing Jiang, Qilie Liu
Format:	Article
Language:	English
Published:	Wiley 2022-12-01
Series:	CAAI Transactions on Intelligence Technology
Subjects:	feature fusion, graph interaction, hierarchical aggregation, scene perception, scene semantics, trajectory forecasting
Online Access:	https://doi.org/10.1049/cit2.12145

_version_	1811189134457307136
author	Yang Fang, Bei Luo, Ting Zhao, Dong He, Bingbing Jiang, Qilie Liu
author_sort	Yang Fang
collection	DOAJ
description	Abstract Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods address only one of these two challenges with a single model. To tackle this limitation, this paper proposes spatio‐temporal semantics and interaction graph aggregation for multi‐agent perception and trajectory forecasting (ST‐SIGMA), an efficient end‐to‐end method that jointly and accurately perceives the AD environment and forecasts the trajectories of the surrounding traffic agents within a unified framework. ST‐SIGMA adopts a trident encoder–decoder architecture to learn scene semantics and agent interaction information on bird’s‐eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve dynamic interactions of traffic agents, ST‐SIGMA further exploits a spatio‐temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space, which serves as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes data set demonstrate that the proposed ST‐SIGMA achieves significant improvements over state‐of‐the‐art (SOTA) methods in both scene perception and trajectory forecasting. The proposed approach thus outperforms SOTA methods in terms of model generalisation and robustness and is more feasible for deployment in real‐world AD scenarios.
first_indexed	2024-04-11T14:29:44Z
format	Article
id	doaj.art-4b19a0242195487986043e4fb589b832
institution	Directory of Open Access Journals
issn	2468-2322
language	English
last_indexed	2024-04-11T14:29:44Z
publishDate	2022-12-01
publisher	Wiley
record_format	Article
series	CAAI Transactions on Intelligence Technology
spelling	doaj.art-4b19a0242195487986043e4fb589b832; 2022-12-22T04:18:39Z; eng; Wiley; CAAI Transactions on Intelligence Technology; ISSN 2468-2322; 2022-12-01; vol. 7, no. 4, pp. 744-757; https://doi.org/10.1049/cit2.12145; ST‐SIGMA: Spatio‐temporal semantics and interaction graph aggregation for multi‐agent perception and trajectory forecasting; Yang Fang (School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China); Bei Luo (School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China); Ting Zhao (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China); Dong He (School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea); Bingbing Jiang (School of Information Science and Technology, Hangzhou Normal University, Hangzhou, China); Qilie Liu (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China)
title	ST‐SIGMA: Spatio‐temporal semantics and interaction graph aggregation for multi‐agent perception and trajectory forecasting
topic	feature fusion, graph interaction, hierarchical aggregation, scene perception, scene semantics, trajectory forecasting
url	https://doi.org/10.1049/cit2.12145

Similar Items