Cleansing of inconsistent sample in linear regression model based on rough sets theory

Bibliographic Details
Main Authors: Rasyidah, Rasyidah; Efend, Riswan; Mohd. Nawi, Nazri; Mat Derisf, Mustafa; S.M.Aqil Burney, S.M.Aqil Burney
Format: Article
Language: English
Published: Elsevier 2023
Online Access: http://eprints.uthm.edu.my/8784/1/J15770_be8201f8b61aa5ded65f238b957b8cf5.pdf
Description: The linear regression model is one of the most common and easiest algorithms used in machine learning for predictive analysis. However, this model performs well only under strict assumptions concerning the number of observations, the linearity of variables, multicollinearity, homoskedasticity, reliability of measurement, and normality. Moreover, the handling and cleansing of inconsistent samples in data sets has not been considered to date. These samples may significantly influence the performance of multiple linear regression with respect to these assumptions and to several aspects, such as adjusted R-square, intercepts and slopes, exogenous variables, and the accuracy of prediction. In this paper, the data reduction strategy of rough sets was employed to remove and clean these samples, boosting the performance of linear regression models. The strategy was evaluated by examining three effects: adjusted R-square, slopes and intercepts, and the mean square error of the regression model. Simulated data and simple modeling problems were used to determine the effects on these three aspects. Secondary data sets were collected from various domains to examine the proposed rough-regression model. The simulation results showed that the data reduction strategy is highly effective at boosting the performance of multiple linear regression in the three aspects above. In the implementation on secondary data, these aspects also improved relative to the results before data reduction. The results from both simulations and implementations demonstrate that the data reduction of rough sets is a viable strategy for cleansing inconsistent samples in linear regression models. Thus, the proposed rough-regression model is shown to support the data analysis of surveys or cross-sectional studies, especially when the stated aspects are not well fulfilled. Researchers therefore do not need to repeat or reconsider such surveys.
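The cleansing idea in the description can be illustrated with a minimal sketch. It is not the authors' procedure; it assumes the data are already discretized, treats a sample as inconsistent when it shares all condition (predictor) values with another sample but has a different decision (response) value, and drops every sample in such a group, which keeps only the rough-set positive region. The helper names `consistent_rows` and `fit_simple_ols` and the toy data are hypothetical, and a single predictor is used to keep the example short.

```python
from collections import defaultdict


def consistent_rows(rows, n_conditions):
    """Keep rows whose condition values map to exactly one decision value.

    Assumption: each row is a tuple of discretized values, with the first
    n_conditions entries as condition attributes and the last entry as the
    decision. Rows that share conditions but disagree on the decision are
    treated as inconsistent, and the whole group is dropped.
    """
    decisions = defaultdict(set)
    for row in rows:
        decisions[row[:n_conditions]].add(row[-1])
    return [row for row in rows if len(decisions[row[:n_conditions]]) == 1]


def fit_simple_ols(xs, ys):
    """Return (intercept, slope, adjusted R^2, MSE) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # n - p - 1 with p = 1 predictor
    mse = sse / n  # mean of squared residuals (no degrees-of-freedom correction)
    return intercept, slope, adj_r2, mse


# Hypothetical toy data: (x, y) pairs. x = 3 and x = 5 each appear with two
# different y values, so all four of those rows count as inconsistent.
data = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (3, 1), (5, 3)]
cleaned = consistent_rows(data, n_conditions=1)

before = fit_simple_ols([r[0] for r in data], [r[1] for r in data])
after = fit_simple_ols([r[0] for r in cleaned], [r[1] for r in cleaned])

print("adjusted R^2 before/after:", before[2], after[2])
print("MSE before/after:", before[3], after[3])
```

Dropping the whole inconsistent group is the most conservative choice; an alternative would be to keep the majority decision within each group, at the cost of deciding which sample to trust.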
Institution: Universiti Tun Hussein Onn Malaysia
Record: http://eprints.uthm.edu.my/8784/
Item Type: Article (peer reviewed)
Citation: Rasyidah, Rasyidah and Efend, Riswan and Mohd. Nawi, Nazri and Mat Derisf, Mustafa and S.M.Aqil Burney, S.M.Aqil Burney (2023) Cleansing of inconsistent sample in linear regression model based on rough sets theory. Systems and Soft Computing, 5. pp. 1-14. https://doi.org/10.1016/j.sasc.2022.200046
Subjects: T Technology (General)