A k-Nearest Neighbours Based Ensemble via Optimal Model Selection for Regression

Ensemble methods based on k-NN models minimise the effect of outliers in a training dataset by searching groups of the k closest data points to estimate the response of an unseen observation. However, traditional k-NN based ensemble methods use the arithmetic mean of the training points' responses for estimation, which has several weaknesses. Traditional k-NN based models are also adversely affected by the presence of non-informative features in the data. This paper suggests a novel ensemble procedure consisting of a class of base k-NN models, each constructed on a bootstrap sample drawn from the training dataset with a random subset of features. On the k nearest neighbours determined by each k-NN model, a stepwise regression is fitted to predict the test point. The final estimate of the target observation is then obtained by averaging the estimates from all the models in the ensemble.

Bibliographic Details
Main Authors: Amjad Ali, Muhammad Hamraz, Poom Kumam, Dost Muhammad Khan, Umair Khalil, Muhammad Sulaiman, Zardad Khan
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: k-NN, random k-NN, regression, stepwise model selection, ensemble learning, non-informative features
Online Access: https://ieeexplore.ieee.org/document/9143105/
_version_ 1818446792813969408
author Amjad Ali
Muhammad Hamraz
Poom Kumam
Dost Muhammad Khan
Umair Khalil
Muhammad Sulaiman
Zardad Khan
author_facet Amjad Ali
Muhammad Hamraz
Poom Kumam
Dost Muhammad Khan
Umair Khalil
Muhammad Sulaiman
Zardad Khan
author_sort Amjad Ali
collection DOAJ
description Ensemble methods based on k-NN models minimise the effect of outliers in a training dataset by searching groups of the k closest data points to estimate the response of an unseen observation. However, traditional k-NN based ensemble methods use the arithmetic mean of the training points' responses for estimation, which has several weaknesses. Traditional k-NN based models are also adversely affected by the presence of non-informative features in the data. This paper suggests a novel ensemble procedure consisting of a class of base k-NN models, each constructed on a bootstrap sample drawn from the training dataset with a random subset of features. On the k nearest neighbours determined by each k-NN model, a stepwise regression is fitted to predict the test point. The final estimate of the target observation is then obtained by averaging the estimates from all the models in the ensemble. The proposed method is compared with other state-of-the-art procedures on 16 benchmark datasets in terms of the coefficient of determination (R²), Pearson's product-moment correlation coefficient (r), mean square predicted error (MSPE), root mean squared error (RMSE) and mean absolute error (MAE) as performance metrics. Boxplots of the results are also constructed. The suggested ensemble procedure outperforms the other procedures on almost all the datasets. Its efficacy is further verified by repeating the comparison after adding non-informative features to the datasets considered; the results reveal that the proposed method is more robust to non-informative features in the data than the other methods.
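
To make the procedure in the description concrete, the following Python sketch outlines one way it could be implemented. It is a minimal illustration under stated assumptions, not the authors' code: the function names, the parameters n_models, k and feature_fraction, and the use of residual-sum-of-squares forward selection as the stepwise regression are illustrative choices, and the paper's stepwise model selection may use a different criterion.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LinearRegression


def _forward_stepwise_fit(X, y):
    # Greedy forward selection for an ordinary least-squares fit: add the
    # feature that most reduces the residual sum of squares, stop when no
    # feature improves it. (Illustrative stand-in for stepwise regression.)
    n, p = X.shape
    selected = []
    remaining = list(range(p))
    best_rss = np.sum((y - y.mean()) ** 2)
    while remaining:
        rss = {}
        for j in remaining:
            cols = selected + [j]
            fit = LinearRegression().fit(X[:, cols], y)
            rss[j] = np.sum((y - fit.predict(X[:, cols])) ** 2)
        j_best = min(rss, key=rss.get)
        if rss[j_best] >= best_rss:
            break
        best_rss = rss[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    model = LinearRegression().fit(X[:, selected], y) if selected else None
    return selected, model


def knn_stepwise_ensemble_predict(X_train, y_train, X_test,
                                  n_models=100, k=10,
                                  feature_fraction=0.5, random_state=0):
    # X_train, y_train, X_test are NumPy arrays. Each base model is built on
    # a bootstrap sample of the rows and a random subset of the features; the
    # k nearest neighbours of each test point are found in that subspace and a
    # stepwise regression on those neighbours predicts the point. The ensemble
    # prediction is the average over all base models.
    rng = np.random.default_rng(random_state)
    n, p = X_train.shape
    m = max(1, int(round(feature_fraction * p)))
    preds = np.zeros((n_models, len(X_test)))

    for b in range(n_models):
        rows = rng.integers(0, n, size=n)              # bootstrap sample
        feats = rng.choice(p, size=m, replace=False)   # random feature subset
        Xb, yb = X_train[np.ix_(rows, feats)], y_train[rows]
        nn = NearestNeighbors(n_neighbors=min(k, n)).fit(Xb)

        for i, x in enumerate(X_test[:, feats]):
            idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
            sel, model = _forward_stepwise_fit(Xb[idx], yb[idx])
            if model is None:
                preds[b, i] = yb[idx].mean()           # fall back to k-NN mean
            else:
                preds[b, i] = model.predict(x[sel].reshape(1, -1))[0]

    return preds.mean(axis=0)

Averaging over the base models plays the same role as in bagging, while the per-neighbourhood stepwise fit replaces the plain arithmetic mean of the neighbours' responses used by a traditional k-NN ensemble.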
first_indexed 2024-12-14T19:53:22Z
format Article
id doaj.art-f291950ca5fb4d6196b52d613cea4e95
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-14T19:53:22Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-f291950ca5fb4d6196b52d613cea4e95
2022-12-21T22:49:22Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
Volume 8, pages 132095-132105
DOI 10.1109/ACCESS.2020.3010099
Article number 9143105
A k-Nearest Neighbours Based Ensemble via Optimal Model Selection for Regression
Amjad Ali (Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan)
Muhammad Hamraz (Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan)
Poom Kumam (https://orcid.org/0000-0002-5463-4581) (Department of Mathematics, Faculty of Science, KMUTT Fixed Point Research Laboratory, King Mongkut's University of Technology Thonburi (KMUTT), Bangkok, Thailand)
Dost Muhammad Khan (https://orcid.org/0000-0002-3919-8136) (Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan)
Umair Khalil (Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan)
Muhammad Sulaiman (https://orcid.org/0000-0002-4040-6211) (Department of Mathematics, Abdul Wali Khan University Mardan, Mardan, Pakistan)
Zardad Khan (https://orcid.org/0000-0003-3933-9143) (Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan)
https://ieeexplore.ieee.org/document/9143105/
k-NN
random k-NN
regression
stepwise model selection
ensemble learning
non-informative features
spellingShingle Amjad Ali
Muhammad Hamraz
Poom Kumam
Dost Muhammad Khan
Umair Khalil
Muhammad Sulaiman
Zardad Khan
A k-Nearest Neighbours Based Ensemble via Optimal Model Selection for Regression
IEEE Access
k-NN
random k-NN
regression
stepwise model selection
ensemble learning
non-informative features
title A k-Nearest Neighbours Based Ensemble via Optimal Model Selection for Regression
title_full A k-Nearest Neighbours Based Ensemble via Optimal Model Selection for Regression
title_fullStr A k-Nearest Neighbours Based Ensemble via Optimal Model Selection for Regression
title_full_unstemmed A k-Nearest Neighbours Based Ensemble via Optimal Model Selection for Regression
title_short A k-Nearest Neighbours Based Ensemble via Optimal Model Selection for Regression
title_sort k nearest neighbours based ensemble via optimal model selection for regression
topic k-NN
random k-NN
regression
stepwise model selection
ensemble learning
non-informative features
url https://ieeexplore.ieee.org/document/9143105/
work_keys_str_mv AT amjadali aknearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT muhammadhamraz aknearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT poomkumam aknearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT dostmuhammadkhan aknearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT umairkhalil aknearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT muhammadsulaiman aknearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT zardadkhan aknearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT amjadali knearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT muhammadhamraz knearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT poomkumam knearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT dostmuhammadkhan knearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT umairkhalil knearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT muhammadsulaiman knearestneighboursbasedensembleviaoptimalmodelselectionforregression
AT zardadkhan knearestneighboursbasedensembleviaoptimalmodelselectionforregression