Assessing the Performance of a Long Short-Term Memory Algorithm in the Dataset with Missing Values

Bibliographic Details
Main Authors: Hyun-Geoun Park, Sang-Ik Suh, Gyeong Cheol Jo, Jinuk Jang, Seo Jin Ki
Format: Article
Language: English
Published: Korean Society of Environmental Engineers, 2022-12-01
Series: 대한환경공학회지 (Journal of Korean Society of Environmental Engineering)
Online Access:http://www.jksee.or.kr/upload/pdf/KSEE-2022-44-12-636.pdf
Collection: DOAJ (Directory of Open Access Journals)

Abstract: This study assessed the performance of a long short-term memory (LSTM) algorithm, which is well suited to time series prediction, on a multivariate dataset with missing values. The full dataset for the LSTM model was prepared by running the widely used watershed model Hydrological Simulation Program-Fortran (HSPF) for the upper Nam River Basin on a daily time step for three years (2016 to 2018), excluding a one-year warm-up period. The prediction accuracy of the LSTM model was evaluated under different interpolation methods, as well as under changes in the number of missing values (in the dependent variables) and in the number of independent variables containing a fixed number of missing values (in either a single variable or multiple variables). The entire dataset was divided into training and test datasets at a ratio of 7:3. The results showed that the choice of interpolation method caused considerable variation in the performance of the LSTM model. Among the methods tested, StructTS and RPART were the best imputation methods for recovering missing values of discharge and total phosphorus, respectively. The prediction error of the LSTM model increased gradually as the number of missing values increased from 300 to 700. However, the LSTM model appeared to maintain its performance fairly well even in datasets with a large number of missing values, as long as an adequate interpolation method was adopted for each dependent variable. The performance of the LSTM model degraded further as the number of independent variables containing the fixed number of missing values increased from 1 to 7. We believe that the proposed methodology can be used not only to reconstruct missing values in real-time monitoring datasets with high accuracy but also to improve the prediction accuracy of time series deep learning models.

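The abstract describes blanking out between 300 and 700 values, imputing them, and splitting the data 7:3 into training and test sets. A minimal Python sketch of that data-preparation step follows, assuming NumPy and a synthetic single-variable daily series. It uses plain linear interpolation as a stand-in imputer; it is not the StructTS or RPART imputation the authors selected, and it does not include the LSTM itself. The function names (`inject_missing`, `linear_impute`, `chronological_split`, `rmse`), the uniformly random placement of gaps, and the unshuffled chronological split are all our assumptions, not details confirmed by the paper.

```python
import numpy as np


def inject_missing(series, n_missing, rng):
    """Hypothetical helper: copy `series` and set `n_missing` randomly chosen entries to NaN.

    Assumes uniformly random gap positions; the paper's gap pattern is not given here.
    """
    out = np.asarray(series, dtype=float).copy()
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out[idx] = np.nan
    return out


def linear_impute(series):
    """Hypothetical stand-in imputer: fill NaNs by linear interpolation over time.

    This is NOT StructTS or RPART. Gaps at either end take the nearest observed
    value (np.interp clamps outside the observed range). Needs at least one
    observed value.
    """
    out = np.asarray(series, dtype=float).copy()
    t = np.arange(out.size)
    gaps = np.isnan(out)
    out[gaps] = np.interp(t[gaps], t[~gaps], out[~gaps])
    return out


def chronological_split(data, train_frac=0.7):
    """Split time-ordered data into training and test parts (7:3 by default) without shuffling."""
    cut = int(round(len(data) * train_frac))
    return data[:cut], data[cut:]


def rmse(pred, truth):
    """Root-mean-square error between two equally shaped arrays."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n_days = 1096  # daily steps from 2016-01-01 to 2018-12-31 (2016 is a leap year)
    t = np.arange(n_days)
    # Synthetic seasonal series for illustration only; NOT the HSPF output used in the study.
    truth = 10 + 5 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 0.5, n_days)
    for n_missing in (300, 500, 700):
        gappy = inject_missing(truth, n_missing, rng)
        gaps = np.isnan(gappy)
        filled = linear_impute(gappy)
        train, test = chronological_split(filled)
        # Imputation error measured only at the positions that were blanked out
        print(n_missing, len(train), len(test), round(rmse(filled[gaps], truth[gaps]), 3))
```

Splitting chronologically rather than shuffling keeps later observations out of the training set, which matters for sequence models such as an LSTM; the abstract states only the 7:3 ratio, not how the split was drawn.
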
ISSN: 1225-5025, 2383-7810
Volume: 44, Issue: 12, Pages: 636-642
DOI: 10.4491/KSEE.2022.44.12.636
Affiliation (all authors): Department of Environmental Engineering, Gyeongsang National University, Republic of Korea
Keywords: deep learning, interpolation methods, long short-term memory, missing values, multivariate time series