Accident risk assessment and prediction using surrogate indicators and machine learning

Road traffic accidents cause great loss of life and property damage. Reliable accident prediction and proactive prevention are therefore of great benefit. This study focuses on the risk assessment and prediction of traffic accidents associated with vehicle conflicts, using machine...

Bibliographic Details
Main Author: Shi, Xiupeng
Other Authors: Wong Yiik Diew
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2019
Subjects: Engineering::Civil engineering
Online Access: https://hdl.handle.net/10356/136550
_version_ 1811678355281412096
author Shi, Xiupeng
author2 Wong Yiik Diew
author_facet Wong Yiik Diew
Shi, Xiupeng
author_sort Shi, Xiupeng
collection NTU
description Road traffic accidents cause great loss of life and property damage. Reliable accident prediction and proactive prevention are therefore of great benefit. This study focuses on the risk assessment and prediction of traffic accidents associated with vehicle conflicts, using machine learning and surrogate indicators to achieve vehicle-level risk rating and prediction based on instantaneous driving behaviours. Accident events are generally unexpected and rare. Pre-crash risk assessment by surrogate indicators is an effective way to identify risk levels and thereby support crash prediction. Herein, the concept of the Key Risk Indicator (KRI) is proposed, which assesses risk exposures using hybrid indicators. To evaluate the feasibility of indicator-based risk assessment, a typical real-world chain-collision accident (on an expressway in Singapore) and its antecedent (pre-crash) road traffic movements are retrieved from surveillance video footage, and a grid remapping method is proposed for data extraction and coordinate transformation. Seven surrogate measures of traffic conflicts are assessed in a temporal-spatial case-control comparison, of which two are found to be more efficient in identifying pre-accident risk conditions: Time Integrated Time-to-collision (TIT) and Crash Potential Index (CPI). Hence, the KRIs are formulated as a hybrid of TIT and CPI, which hierarchically distinguishes various risk levels. TIT captures risk signals (when TIT > 0), while CPI further identifies the more severe ones, i.e. near-crash conditions (when CPI > 0). In addition, the thresholds of risk levels in KRIs are more straightforward to define. For rigorous validation, the results are examined against another independent real-world accident sample. Verified by real-world accidents, KRIs advance indicator-based risk assessment and reveal new insights about pre-crash risk exposures.
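As an illustration of the hybrid indicator idea above, the following minimal sketch computes TIT for a single car-following pair and maps a (TIT, CPI) pair to a coarse level. It is not the thesis's implementation: the TIT form used here (summing the shortfall of TTC below a threshold over time) is the commonly cited definition, and the function names, the 3-second threshold, the time step and the three-level mapping are all assumptions made for the example. CPI is taken as a precomputed input.

```python
def time_to_collision(gap_m, v_follower, v_leader):
    """Return TTC in seconds, or None when the follower is not closing in."""
    closing_speed = v_follower - v_leader
    if closing_speed <= 0:
        return None
    return gap_m / closing_speed


def time_integrated_ttc(ttc_series, ttc_threshold=3.0, dt=0.1):
    """Sum (threshold - TTC) * dt over the steps where 0 < TTC < threshold.

    ttc_threshold and dt are illustrative values, not taken from the thesis.
    """
    total = 0.0
    for ttc in ttc_series:
        if ttc is not None and 0 < ttc < ttc_threshold:
            total += (ttc_threshold - ttc) * dt
    return total


def kri_level(tit, cpi):
    """Hypothetical three-level reading of the TIT/CPI hierarchy.

    0 = no risk signal, 1 = risk signal (TIT > 0), 2 = severe (CPI > 0).
    """
    if cpi > 0:
        return 2
    if tit > 0:
        return 1
    return 0
```

A series of TTC values sampled along a trajectory would be passed to `time_integrated_ttc`, and the result combined with a separately computed CPI in `kri_level`.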
From another perspective, indicator-based risk assessment is extended to general traffic streams. Unsupervised vehicle-level risk rating is achieved by extracting risk indicator features and clustering. Risk grading is a distinctly imbalanced problem, with some intrinsic challenges. Based on the KRI findings, a total of 12 risk indicator features are designed, which represent vehicle risk exposures in temporal, kinematic and spatial terms. To obtain a reliable and robust partitioning of risk levels, an ensemble clustering model is built by majority voting over the risk labels produced by multiple clusterings. The clustering is conducted on a large group of vehicles within a road segment. Based on pattern similarity, vehicles are clustered into distinct groups with graded risk labels. Clustering is performed progressively to obtain a hierarchical partitioning of risk levels, which helps to identify the highest risk level. Moreover, label identification by classifiers is proposed to evaluate the clustering performance and determine the risk levels. Herein, vehicle trajectory data from the United States’ NGSIM Program are used as a case study, and a risk grading with six levels is established. The risk indicator features based on TIT and CPI are found to have higher importance, according to feature ranking by random forest. In addition, high-resolution risk mapping and positioning is demonstrated to delineate the risk potentials, including at-risk vehicles, locations and timestamps, as well as risk patterns (e.g. severity, frequency, trends). The proposed method is found to be effective in assessing the detailed risk potentials inherent in driving behaviour, as exhibited by general vehicle trajectories, and in generating unsupervised labels of risk levels. Furthermore, the linkages between risk levels and driving behaviours are explored, which enables behaviour-based risk prediction.
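The majority-voting step of the ensemble clustering described above could look roughly like the sketch below. It assumes that each base clustering has already produced one risk label per vehicle and that the labels are aligned across clusterings (the same integer means the same risk level), which in practice needs a separate matching step. The function name and the tie-breaking rule (prefer the higher risk level) are illustrative choices, not details from the thesis.

```python
from collections import Counter


def majority_vote(label_sets):
    """Combine aligned risk labels from several clusterings, one vote each.

    label_sets: list of equal-length label lists, one list per clustering.
    Ties are broken toward the higher risk level (an assumption made here
    to stay on the conservative side).
    """
    n_vehicles = len(label_sets[0])
    if any(len(labels) != n_vehicles for labels in label_sets):
        raise ValueError("all clusterings must label the same vehicles")

    voted = []
    for votes in zip(*label_sets):
        counts = Counter(votes)
        top = max(counts.values())
        voted.append(max(label for label, c in counts.items() if c == top))
    return voted
```

For example, three clusterings that label three vehicles as `[0, 1, 2]`, `[0, 1, 1]` and `[1, 1, 2]` would be combined vehicle by vehicle into a single consensus label list.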
An integrated feature learning framework is designed to assess vehicle driving and predict risk levels. The framework integrates learning-based feature selection, unsupervised risk rating, and imbalanced data resampling. For each vehicle, about 1,300 driving behaviour features are extracted from trajectory data, providing in-depth, multi-view measures of behaviour. To estimate the risk potentials of vehicles on the road, unsupervised risk rating is conducted using fuzzy C-means (FCM), and four risk levels are used for data labelling. In addition, the safe group is under-sampled to reduce the risk-safe class imbalance. Afterwards, the linkages between behaviour features and the corresponding risk levels are modelled using XGBoost, and key features are identified by feature importance ranking and recursive elimination. The risk levels of moving vehicles are then predicted from the selected key features. As a case study, NGSIM trajectory data are used, in which four risk levels are clustered, 64 key behaviour features are identified, and an overall accuracy of 91.66% is achieved for behaviour-based risk prediction. Findings show that this approach is effective and reliable in identifying important features for driving assessment and in achieving accurate prediction of risk levels. Finally, a domain-specific automated machine learning (AutoML) platform is built, which enables end-to-end learning from driving behaviour data to detailed risk levels and the corresponding key features. The AutoML assembles all necessary machine learning steps into an end-to-end pipeline and automates the search for the features, models, and hyperparameters that return the best performance as measured on validation sets. The AutoML platform has a self-learning and auto-optimisation mechanism, and can be easily updated by introducing the most advanced algorithms.
Bayesian optimisation guides the self-learning of AutoML by effectively auto-tuning the hyperparameters and exploring the pipeline space for better performance. The identification of key features not only helps to produce better results at lower computational cost, but also provides data-driven insights into system optimisation and sensing configuration. Application potentials are discussed: the AutoML can be used in risk decision-making and motion trajectory planning for autonomous vehicles (AVs) and advanced driver assistance systems (ADAS), pay-how-you-drive (PHYD) insurance, driving safety systems in a connected vehicle environment, and short-term, near-real-time crash prediction, among others. These studies contribute to traffic safety by providing a portfolio of techniques, ranging from vehicle-level crash risk prediction to personalised driving behaviour enhancement, which enables the development of effective measures and systems to reduce the likelihood of crashes.
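The under-sampling of the safe group mentioned in the description can be illustrated with a short sketch. The description does not state which under-sampling scheme was used; random under-sampling is assumed here purely for illustration, and the function name, the `ratio` parameter and the fixed seed are hypothetical.

```python
import random


def undersample_majority(samples, labels, majority_label, ratio=1.0, seed=0):
    """Randomly drop majority-class samples to reduce class imbalance.

    Keeps at most ratio * (number of non-majority samples) samples of the
    majority class and all samples of every other class. Original order is
    preserved among the kept samples.
    """
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    other_idx = [i for i, y in enumerate(labels) if y != majority_label]

    keep_n = min(len(majority_idx), int(round(ratio * len(other_idx))))
    kept_majority = rng.sample(majority_idx, keep_n)

    keep = sorted(kept_majority + other_idx)
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

With the safe group as `majority_label`, the returned samples and labels could then be passed to a classifier; `ratio` controls how many safe vehicles remain relative to the at-risk ones.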
first_indexed 2024-10-01T02:51:57Z
format Thesis-Doctor of Philosophy
id ntu-10356/136550
institution Nanyang Technological University
language English
last_indexed 2024-10-01T02:51:57Z
publishDate 2019
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/136550 2020-10-28T08:40:42Z Accident risk assessment and prediction using surrogate indicators and machine learning Shi, Xiupeng Wong Yiik Diew School of Civil and Environmental Engineering Centre for Infrastructure Systems cydwong@ntu.edu.sg Engineering::Civil engineering Doctor of Philosophy 2019-12-27T07:41:22Z 2019-12-27T07:41:22Z 2019 Thesis-Doctor of Philosophy Shi, X. (2019). Accident risk assessment and prediction using surrogate indicators and machine learning. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/136550 10.32657/10356/136550 en application/pdf Nanyang Technological University
spellingShingle Engineering::Civil engineering
Shi, Xiupeng
Accident risk assessment and prediction using surrogate indicators and machine learning
title Accident risk assessment and prediction using surrogate indicators and machine learning
title_full Accident risk assessment and prediction using surrogate indicators and machine learning
title_fullStr Accident risk assessment and prediction using surrogate indicators and machine learning
title_full_unstemmed Accident risk assessment and prediction using surrogate indicators and machine learning
title_short Accident risk assessment and prediction using surrogate indicators and machine learning
title_sort accident risk assessment and prediction using surrogate indicators and machine learning
topic Engineering::Civil engineering
url https://hdl.handle.net/10356/136550
work_keys_str_mv AT shixiupeng accidentriskassessmentandpredictionusingsurrogateindicatorsandmachinelearning