Imputation Analysis of Time-Series Data Using a Random Forest Algorithm

Bibliographic Details
Main Authors: Nur Najmiyah, Jaafar, Muhammad Nur Ajmal, Rosdi, Khairur Rijal, Jamaludin, Faizir, Ramlie, Habibah, Abdul Talib
Format: Conference or Workshop Item
Language: English
Published: Springer Singapore 2024
Subjects: TS Manufactures
Online Access:http://umpir.ump.edu.my/id/eprint/41147/1/Imputation%20Analysis%20of%20Time-Series%20Data.pdf
http://umpir.ump.edu.my/id/eprint/41147/2/Imputation%20Analysis%20of%20Time-Series%20Data%20Using%20a%20Random%20Forest%20Algorithm.pdf

Missing data poses a significant challenge in large datasets, particularly those containing time-series information, because it can introduce inaccuracies into data analysis and machine-learning model development. To address this issue, this paper compared and evaluated four imputation methods utilizing the Random Forest algorithm: MissForest, MICE, Simplefill, and Softimpute. The study examines how the missing-data ratio and temporal variation affect the performance of each method. The results indicate that MissForest consistently outperformed the other methods, achieving the lowest root-mean-square error (RMSE) and a high coefficient of determination (R²), which reflect its accuracy and its ability to explain the variation in the data. Graphical analyses further showed that MissForest remained stable over time, whereas MICE and Simplefill were more sensitive to changes in date. Softimpute was relatively consistent but performed slightly worse than MissForest. Overall, the study identifies MissForest as the preferred imputation method for AVL time-series data.

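The record does not describe the evaluation protocol in detail. A common way to score imputers, assumed here rather than taken from the paper, is to hide a known fraction of observed points (the missing ratio), impute them, and compute RMSE and R² only on the hidden points. The sketch below uses only the Python standard library. The helpers `mask_values`, `mean_fill`, `linear_fill`, and `evaluate` are hypothetical names for illustration: `mean_fill` is a SimpleFill-style mean baseline and `linear_fill` is a simple time-aware interpolation baseline. Neither is MissForest, MICE, or SoftImpute, and the synthetic series is a stand-in, not the paper's AVL data.

```python
import math
import random


def mask_values(series, missing_ratio, seed=0):
    """Hide a fraction of points; return the masked copy and the hidden indices."""
    n_hide = round(missing_ratio * len(series))
    if n_hide < 2 or n_hide >= len(series):
        raise ValueError("need at least 2 hidden and 1 observed point")
    rng = random.Random(seed)
    hidden = sorted(rng.sample(range(len(series)), n_hide))
    masked = list(series)
    for i in hidden:
        masked[i] = None  # None marks a missing value
    return masked, hidden


def mean_fill(series):
    """Baseline: replace every gap with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def linear_fill(series):
    """Baseline: interpolate linearly between the nearest observed neighbours;
    gaps at either end copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def rmse(truth, pred):
    # RMSE = sqrt( (1/n) * sum_i (y_i - yhat_i)^2 )
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def r2(truth, pred):
    # R^2 = 1 - sum_i (y_i - yhat_i)^2 / sum_i (y_i - ybar)^2
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the hidden values are constant")
    return 1.0 - ss_res / ss_tot


def evaluate(series, imputer, missing_ratio, seed=0):
    """Mask, impute, and score only the positions that were hidden."""
    masked, hidden = mask_values(series, missing_ratio, seed)
    filled = imputer(masked)
    truth = [series[i] for i in hidden]
    pred = [filled[i] for i in hidden]
    return rmse(truth, pred), r2(truth, pred)


if __name__ == "__main__":
    # Synthetic stand-in signal: a slow trend plus a periodic component.
    series = [math.sin(t / 5) + 0.05 * t for t in range(200)]
    for ratio in (0.1, 0.3, 0.5):
        for name, fn in (("mean_fill", mean_fill), ("linear_fill", linear_fill)):
            err, fit = evaluate(series, fn, ratio)
            print(f"ratio={ratio:.1f} {name:<11} RMSE={err:.3f} R2={fit:.3f}")
```

Any imputer with the same list-in, list-out signature can be passed to `evaluate`. A random-forest imputer in the spirit of MissForest (for example scikit-learn's `IterativeImputer` with a `RandomForestRegressor` estimator) would need a small wrapper, because it expects multivariate input such as lagged copies of the series; that setup is an approximation and not necessarily the configuration the authors used.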
Institution: Universiti Malaysia Pahang
Record: http://umpir.ump.edu.my/id/eprint/41147/
Citation: Nur Najmiyah, Jaafar and Muhammad Nur Ajmal, Rosdi and Khairur Rijal, Jamaludin and Faizir, Ramlie and Habibah, Abdul Talib (2024) Imputation Analysis of Time-Series Data Using a Random Forest Algorithm. In: Intelligent Manufacturing and Mechatronics, Lecture Notes in Networks and Systems, 850. 4th International Conference on Innovative Manufacturing, Mechatronics and Materials Forum (iM3F2023), 07–08 August 2023, Pekan, Malaysia, pp. 51–60. Springer Singapore. ISSN 2367-3389, ISBN 978-981-99-8819-8. https://doi.org/10.1007/978-981-99-8819-8_4 (Published, peer reviewed)