Context-aware pedestrian motion prediction


Bibliographic Details
Main Author: Haddad, Sirin
Other Authors: Lam Siew Kei
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/151544
_version_ 1811689883300790272
author Haddad, Sirin
author2 Lam Siew Kei
author_facet Lam Siew Kei
Haddad, Sirin
author_sort Haddad, Sirin
collection NTU
description Pedestrian motion prediction can enhance the effectiveness of Advanced Driver-Assistance Systems (ADAS), autonomous driving, and robotic navigation to maintain pedestrians' safety. Pedestrian motion is usually guided by an intention to reach a target place, and pedestrians navigate by making motion decisions with respect to the surrounding space. Numerous approaches capture pedestrian motion by observing the walking trajectory as an essential feature for future motion prediction. However, predicting a pedestrian's trajectory in crowded environments is non-trivial due to the uncertainty of pedestrian intentions. This uncertainty is influenced by the pedestrians' interactions with static structures and other dynamic objects present in the scene. As such, an accurate and plausible method for predicting pedestrian motion in urban environments remains an unsolved problem. The objective of this Ph.D. research is to develop a robust and scalable vision-based framework for predicting pedestrian motion in urban environments. The proposed framework relies on graph-based deep prediction models that learn from pedestrians' past motion, their surrounding context, and their interactions to estimate future trajectories. The framework models the surrounding environment by taking into account contextual information consisting of other pedestrians and any fixed objects present in the navigable area. This includes the social interactions among pedestrians and their interactions with the static settings, which form a dynamic context and play a significant role in determining pedestrian movement. In addition, other pedestrian cues and body features are collected to provide a stronger indication of pedestrian motion and increase the estimation confidence.
Finally, in order to achieve high-speed prediction on embedded platforms with tight computational resources, low-complexity methods are considered. The thesis proposes four approaches based on spatio-temporal graphs that deploy Long Short-Term Memory (LSTM) networks for predicting pedestrian trajectories in crowded environments. Chapter 3 presents a graph-based modeling approach that considers the pedestrians' contextual interaction with the static scene structure and other obstacles (physical objects), and their social interaction with dynamic elements (other pedestrians) in the scene. The proposed spatio-temporal graphs capture interactions at several spatial scopes, ranging from locally-spatial contextual interactions to global interactions of all pedestrians. In addition, a spatio-temporal attention mechanism is incorporated to quantify pedestrians' mutual influence on each other and assign an importance to each interaction. However, this approach yields a large static graph, and the underlying graph structure needs to be set a priori. To improve the scalability of spatio-temporal graphs, Chapter 4 presents SGTV, an adaptive spatio-temporal graph structure coined as a "Self-Growing Graph". A centralized model is considered for encoding the entire graph in a single step and predicting all trajectories simultaneously. As such, the contextual and interaction modeling becomes dynamic and adaptive to temporal changes in the environment. The dynamic spatio-temporal graph addresses the scalability problem, and experimental results demonstrate that SGTV can cater to crowds of up to 70 pedestrians with a running time of 0.75 seconds, while other baselines take 7x longer. Chapter 5 presents G2K, which improves the robustness of the self-growing graph approach by enriching the contextual modeling with more social features from pedestrians. The social cues are encoded simultaneously over time using a multi-dimensional encoder cell.
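The "Self-Growing Graph" idea described above — a spatio-temporal graph whose structure adapts to the crowd rather than being fixed a priori — can be illustrated with a minimal sketch. The proximity-based adjacency construction below is an assumption for illustration (the radius threshold and function name are hypothetical, not the thesis's exact formulation):

```python
import numpy as np

def spatial_adjacency(positions, radius=2.0):
    """Build a proximity-based adjacency matrix for one video frame.

    positions: (num_pedestrians, 2) array of x, y coordinates in metres.
    Two pedestrians are connected when they are closer than `radius`,
    so the graph grows and shrinks as the crowd changes frame to frame,
    instead of using one large static graph fixed in advance.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)       # pairwise distances
    adj = (dist < radius).astype(float)
    np.fill_diagonal(adj, 0.0)                 # no self-loops
    return adj

# two frames of the same three pedestrians: one moves away over time
frames = [np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]),
          np.array([[0.0, 0.0], [3.0, 0.0], [5.0, 5.0]])]
adjs = [spatial_adjacency(f) for f in frames]
# frame 0: pedestrians 0 and 1 are neighbours; frame 1: no edges remain
```

Stacking such per-frame adjacency matrices over the observation window gives a time-varying graph on which recurrent models can encode interactions.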
The experimental results show that incorporating the pedestrian head pose into the contextual modeling leads to more accurate predictions across benchmark datasets, while maintaining reasonable scalability under a lightweight graph structure. Finally, Chapter 6 presents STR-GGRNN, a scalable and robust pedestrian trajectory framework that integrates the designs of the previous chapters. This method deploys the multi-dimensional encoder with a variational sampling concept to achieve better results. Experiments on widely-used datasets show that the proposed framework outperforms state-of-the-art methods. In particular, it yields a significant reduction in the Average Displacement Error (ADE) and the Final Displacement Error (FDE) of about 12 cm and 15 cm, respectively, on the ETH-UCY datasets. On the Stanford Drone Dataset, it achieves 0.05 ADE and 0.07 FDE (in metres). The proposed STR-GGRNN framework takes only about 2.30 seconds to predict the trajectories of 20 pedestrians.
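The ADE and FDE metrics reported above are the standard trajectory-prediction error measures: ADE averages the Euclidean error over all predicted timesteps, while FDE looks only at the final predicted point. A minimal sketch of how they are typically computed (function name and array shapes are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for predicted trajectories.

    pred, gt: arrays of shape (num_pedestrians, timesteps, 2) in metres.
    ADE = mean Euclidean distance over every pedestrian and timestep;
    FDE = mean Euclidean distance at the last timestep only.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)  # (num_pedestrians, timesteps)
    ade = dists.mean()
    fde = dists[:, -1].mean()
    return ade, fde

# toy example: predictions offset from ground truth by a constant 0.1 m in x
gt = np.zeros((2, 12, 2))
pred = gt + np.array([0.1, 0.0])
ade, fde = ade_fde(pred, gt)  # both equal 0.1
```

Because both metrics are plain averages of point-wise distances, the centimetre-level reductions quoted above translate directly into how far, on average, the predicted positions drift from the true ones.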
first_indexed 2024-10-01T05:55:10Z
format Thesis-Doctor of Philosophy
id ntu-10356/151544
institution Nanyang Technological University
language English
last_indexed 2024-10-01T05:55:10Z
publishDate 2021
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/151544 2021-07-08T16:01:19Z Context-aware pedestrian motion prediction Haddad, Sirin Lam Siew Kei School of Computer Science and Engineering Hardware & Embedded Systems Lab (HESL) ASSKLam@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Doctor of Philosophy 2021-06-25T00:28:09Z 2021-06-25T00:28:09Z 2021 Thesis-Doctor of Philosophy Haddad, S. (2021). Context-aware pedestrian motion prediction. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/151544 https://hdl.handle.net/10356/151544 10.32657/10356/151544 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Haddad, Sirin
Context-aware pedestrian motion prediction
title Context-aware pedestrian motion prediction
title_full Context-aware pedestrian motion prediction
title_fullStr Context-aware pedestrian motion prediction
title_full_unstemmed Context-aware pedestrian motion prediction
title_short Context-aware pedestrian motion prediction
title_sort context aware pedestrian motion prediction
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
url https://hdl.handle.net/10356/151544
work_keys_str_mv AT haddadsirin contextawarepedestrianmotionprediction