Prediction of high-speed train delay propagation based on causal text information

Bibliographic Details
Main Authors: Qianyi Liu, Shengjie Wang, Zhongcan Li, Li Li, Jun Zhang, Chao Wen
Format: Article
Language: English
Published: SpringerOpen 2022-09-01
Series: Railway Engineering Science
Online Access: https://doi.org/10.1007/s40534-022-00286-x
ISSN: 2662-4745, 2662-4753
Volume: 31, Issue: 1, Pages: 89–106
Affiliations: Qianyi Liu, Shengjie Wang, Zhongcan Li, Li Li, and Chao Wen, National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University; Jun Zhang, China Railway Chengdu Group Co., Ltd

Abstract: The delay-causing text data contain valuable information, such as the specific reason for a delay and the location and time of the disturbance, which can support train delay prediction and help improve the efficiency of train control. Based on the train operation data and delay-causing data of the Wuhan–Guangzhou high-speed railway, natural language processing algorithms are used to process the delay-causing text data, and the train operating-environment information is integrated with the delay-causing text information to develop a cause-based train delay propagation prediction model. First, the delay-causing text descriptions are segmented into words and vectorized with the Word2vec model. Then, either a mean model or a term frequency–inverse document frequency (TF-IDF)-weighted model is used to generate the delay-causing sentence vector from the word vectors.
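The abstract does not give the exact sentence-vector formulas, so the following is only a minimal Python sketch of the two options it names, under stated assumptions: the delay-cause texts are already segmented into tokens, the hypothetical `WORD_VECTORS` table stands in for a trained Word2vec model, and the TF-IDF weights use the smoothed IDF form ln((1 + N) / (1 + df)) + 1 familiar from scikit-learn, which may differ from the variant used in the paper. The function names are illustrative, not the authors' code.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical 3-dimensional toy vectors standing in for a trained Word2vec
# model (assumption); real vectors would come from CBOW or skip-gram training
# on the segmented delay-cause texts and would be much higher-dimensional.
WORD_VECTORS: Dict[str, List[float]] = {
    "signal": [0.9, 0.1, 0.0],
    "failure": [0.8, 0.2, 0.1],
    "heavy": [0.1, 0.7, 0.2],
    "rain": [0.0, 0.9, 0.3],
    "speed": [0.2, 0.4, 0.8],
    "restriction": [0.1, 0.3, 0.9],
}


def _dim(wv: Dict[str, List[float]]) -> int:
    """Dimension of the word vectors (all vectors assumed equally long)."""
    return len(next(iter(wv.values())))


def mean_sentence_vector(tokens: Sequence[str], wv: Dict[str, List[float]]) -> List[float]:
    """Mean model: average the vectors of in-vocabulary tokens; OOV tokens are skipped."""
    vecs = [wv[t] for t in tokens if t in wv]
    if not vecs:
        return [0.0] * _dim(wv)
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def smoothed_idf(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed IDF per token: ln((1 + N) / (1 + df)) + 1, always > 0 (assumed variant)."""
    n_docs = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_sentence_vector(
    tokens: Sequence[str], wv: Dict[str, List[float]], idf: Dict[str, float]
) -> List[float]:
    """TF-IDF-weighted model: weighted average of word vectors, weight = tf * idf."""
    dim = _dim(wv)
    usable = [t for t in tokens if t in wv and t in idf]
    if not usable:
        return [0.0] * dim
    counts = Counter(usable)
    # Term frequency relative to sentence length, multiplied by the token's IDF.
    weights = {t: (c / len(tokens)) * idf[t] for t, c in counts.items()}
    total = sum(weights.values())  # > 0 because every smoothed IDF is >= 1
    return [sum(w * wv[t][i] for t, w in weights.items()) / total for i in range(dim)]
```

When every token in a sentence has the same IDF, the TF-IDF-weighted vector reduces to the plain mean; rarer tokens otherwise pull the sentence vector toward their own direction.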
Afterward, the train operating-environment features and the delay-causing sentence vector are input into the extreme gradient boosting (XGBoost) regression algorithm to develop a delay propagation prediction model. In this work, four text feature processing methods and eight regression algorithms are considered. The results show that the XGBoost regression algorithm achieves the highest prediction accuracy with text features processed by the continuous bag-of-words (CBOW) and mean models. Compared with a prediction model that considers only the train operating-environment features, integrating the delay-causing features significantly improves prediction accuracy across multiple regression algorithms.
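The fusion step can be pictured as concatenating a fixed-order vector of operating-environment features with the delay-cause sentence vector to form one regression input row. The sketch below assumes hypothetical feature names (`ENV_FEATURES`) and a hypothetical target (the delay observed downstream, in minutes), since the abstract lists neither; the `include_text` switch builds the environment-only baseline used for comparison.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical operating-environment feature names, for illustration only;
# the abstract does not enumerate the features used in the study.
ENV_FEATURES = ["initial_delay_min", "scheduled_headway_min", "is_peak_hour"]

# One record: (environment features, delay-cause sentence vector, target delay).
Record = Tuple[Dict[str, float], Sequence[float], float]


def fuse_features(
    env: Dict[str, float], sentence_vec: Sequence[float], include_text: bool = True
) -> List[float]:
    """Concatenate environment features (fixed order) with the sentence vector."""
    row = [float(env[name]) for name in ENV_FEATURES]
    if include_text:
        row.extend(float(x) for x in sentence_vec)
    return row


def build_matrix(
    records: Iterable[Record], include_text: bool = True
) -> Tuple[List[List[float]], List[float]]:
    """Build a feature matrix X and target list y for a tabular regressor."""
    X: List[List[float]] = []
    y: List[float] = []
    for env, vec, target in records:
        row = fuse_features(env, vec, include_text)
        # Every row must have the same width, so sentence vectors must share a dimension.
        if X and len(row) != len(X[0]):
            raise ValueError("all sentence vectors must have the same dimension")
        X.append(row)
        y.append(float(target))
    return X, y
```

The resulting `X` and `y` could then be handed to a gradient-boosting regressor, for example `xgboost.XGBRegressor` through its scikit-learn-style `fit` and `predict` methods (typically after conversion to a NumPy array); this sketch stops before that step and makes no claim about the hyperparameters the study used.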

Keywords: High-speed rail; Delay propagation; Cause of delay; Word2vec; Natural language processing