Imputation Analysis of Time-Series Data Using a Random Forest Algorithm
Main Authors: | , , , , |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | Springer Singapore, 2024 |
Subjects: | |
Online Access: | http://umpir.ump.edu.my/id/eprint/41147/1/Imputation%20Analysis%20of%20Time-Series%20Data.pdf http://umpir.ump.edu.my/id/eprint/41147/2/Imputation%20Analysis%20of%20Time-Series%20Data%20Using%20a%20Random%20Forest%20Algorithm.pdf |
Summary: | Missing data poses a significant challenge in large datasets, particularly those containing time-series information, and can lead to inaccuracies in data analysis and machine learning model development. To address this issue, this paper compares and evaluates four imputation methods, MissForest, MICE, Simplefill, and Softimpute, which utilized the Random Forest algorithm. The research examines how the missing ratio and temporal variation affect the performance of each imputation method. The results indicate that MissForest consistently outperformed the other methods, with the lowest RMSE values and a high coefficient of determination (R²), reflecting its accuracy and its ability to explain the variation in the data. Graphical analyses further showed that MissForest remained stable over time, while MICE and Simplefill were more sensitive to date changes. Softimpute was relatively consistent but performed slightly worse than MissForest. Overall, the study identifies MissForest as the preferred imputation method for AVL time-series data. |
---|
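The evaluation described in the summary (hide a known fraction of values, impute them, then score the imputed values with RMSE and R²) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the synthetic sine series, the function names, and the mean-fill imputer are all hypothetical stand-ins, and mean fill is used purely as a placeholder where MissForest, MICE, or Softimpute would be plugged in.

```python
import math
import random


def mask_values(series, missing_ratio, seed=0):
    """Hide a fraction of the values; return the masked copy and the hidden indices."""
    rng = random.Random(seed)
    n_missing = int(round(missing_ratio * len(series)))
    hidden = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    for i in hidden:
        masked[i] = None  # None marks a missing observation
    return masked, hidden


def mean_fill(masked):
    """Placeholder imputer (assumption): fill each gap with the mean of observed values."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def rmse(truth, estimate):
    """Root mean squared error between true and imputed values."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def r2(truth, estimate):
    """Coefficient of determination; NaN when the true values have no variance."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - e) ** 2 for t, e in zip(truth, estimate))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return float("nan") if ss_tot == 0 else 1.0 - ss_res / ss_tot


def evaluate(series, imputer, missing_ratio, seed=0):
    """Score an imputer only on the entries that were deliberately hidden."""
    masked, hidden = mask_values(series, missing_ratio, seed)
    filled = imputer(masked)
    truth = [series[i] for i in hidden]
    estimate = [filled[i] for i in hidden]
    return rmse(truth, estimate), r2(truth, estimate)


if __name__ == "__main__":
    # Synthetic series standing in for real time-series data (assumption).
    series = [10 + 5 * math.sin(t / 4) for t in range(200)]
    for ratio in (0.1, 0.3, 0.5):
        score_rmse, score_r2 = evaluate(series, mean_fill, ratio)
        print(f"ratio={ratio:.1f} RMSE={score_rmse:.3f} R2={score_r2:.3f}")
```

Scoring only the hidden entries keeps the metrics from being inflated by values the imputer never had to guess. Any other imputer can be compared by passing a different function with the same list-in, list-out shape to `evaluate`.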