Imputation Analysis of Time-Series Data Using a Random Forest Algorithm

Bibliographic Details
Main Authors: Nur Najmiyah, Jaafar; Muhammad Nur Ajmal, Rosdi; Khairur Rijal, Jamaludin; Faizir, Ramlie; Habibah, Abdul Talib
Format: Conference or Workshop Item
Language: English
Published: Springer Singapore 2024
Online Access: http://umpir.ump.edu.my/id/eprint/41147/1/Imputation%20Analysis%20of%20Time-Series%20Data.pdf
http://umpir.ump.edu.my/id/eprint/41147/2/Imputation%20Analysis%20of%20Time-Series%20Data%20Using%20a%20Random%20Forest%20Algorithm.pdf
Description
Summary: Missing data poses a significant challenge in large datasets, particularly those containing time-series information, and can lead to inaccuracies in data analysis and machine learning model development. To address this issue, the paper compared and evaluated four imputation methods: MissForest, MICE, Simplefill, and Softimpute, which utilized the Random Forest algorithm. The research examined the impact of missing ratios and temporal variations on the performance of the imputation methods. The results indicated that MissForest consistently outperformed the other methods, exhibiting the lowest RMSE values and a high coefficient of determination (R²), which reflects its accuracy and its ability to explain the variation in the data. Furthermore, graphical analyses demonstrated the stability of MissForest over time, while MICE and Simplefill showed higher sensitivity to date changes. Softimpute was relatively consistent but performed slightly worse than MissForest. Overall, the study highlights the effectiveness of MissForest as the preferred imputation method for AVL time-series data.
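
The evaluation described in the summary (hiding values at a chosen missing ratio, imputing them, then scoring with RMSE and R²) can be illustrated with a minimal sketch. This is not the authors' code: the synthetic series, the function names (`mask_values`, `mean_fill`, `evaluate`), the seed, and the choice to score only the hidden entries are all assumptions made for illustration, and the mean-fill imputer is a simple stand-in baseline, not MissForest or any of the packages named above.

```python
import math
import random


def mask_values(series, missing_ratio, seed=0):
    """Hide a fraction of entries (set to None); return the masked copy and hidden indices."""
    rng = random.Random(seed)
    n_missing = int(round(missing_ratio * len(series)))
    masked_idx = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    for i in masked_idx:
        masked[i] = None
    return masked, masked_idx


def mean_fill(masked):
    """Hypothetical baseline imputer: fill each gap with the mean of the observed values."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def rmse(truth, estimate):
    """Root mean squared error between true and imputed values."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def r_squared(truth, estimate):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - e) ** 2 for t, e in zip(truth, estimate))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def evaluate(series, missing_ratio, imputer, seed=0):
    """Score an imputer on the hidden entries only (an assumed protocol)."""
    masked, idx = mask_values(series, missing_ratio, seed)
    filled = imputer(masked)
    truth = [series[i] for i in idx]
    estimate = [filled[i] for i in idx]
    return rmse(truth, estimate), r_squared(truth, estimate)


# Synthetic stand-in for a time series (assumption; not the AVL data).
series = [math.sin(t / 5.0) + 0.01 * t for t in range(200)]
for ratio in (0.1, 0.3, 0.5):
    err, r2 = evaluate(series, ratio, mean_fill)
    print(f"missing={ratio:.0%}  RMSE={err:.3f}  R2={r2:.3f}")
```

Any imputer that maps a list containing `None` gaps to a fully filled list can be passed to `evaluate` in place of `mean_fill`. A constant fill cannot score above zero on R² when only the hidden entries are scored, which makes it a useful floor against which model-based imputers can be compared.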