Study on the performance of Robust LASSO in determining important variables data with outliers
A variable selection method is required to deal with regression models with many variables, and LASSO has been the most widely used methodology. However, as several authors have noted, LASSO is sensitive to outliers in the data. For this reason, the Robust-LASSO approach was introduced by applying...
Main Authors: | ROCHYATI ROCHYATI, KUSMAN SADIK, BAGUS SARTONO, EVITA PURNANINGRUM |
---|---|
Format: | Article |
Language: | English |
Published: | Universitas Syiah Kuala, Faculty of Mathematics and Natural Science, 2023-03-01 |
Series: | Jurnal Natural |
Subjects: | high dimensional regression, huber, tukey, variable selection, welsch |
Online Access: | https://jurnal.usk.ac.id/natural/article/view/26279 |
_version_ | 1797226789019320320 |
---|---|
author | ROCHYATI ROCHYATI; KUSMAN SADIK; BAGUS SARTONO; EVITA PURNANINGRUM |
collection | DOAJ |
description | Regression models with many variables require a variable selection method, and LASSO has been the most widely used. However, as several authors have noted, LASSO is sensitive to outliers in the data. The Robust-LASSO approach was introduced to address this by assigning a weight to each sample in the data. This research presents a comparative study of three weighting schemes in Robust LASSO: Huber-LASSO, Tukey-LASSO, and Welsch-LASSO. The study ran an extensive simulation covering many scenarios that vary the covariance structure of the explanatory variables, the types of outliers, the number of outliers, the location of the active variables, and the number of variables. Tukey-LASSO outperformed Huber-LASSO and Welsch-LASSO in identifying significant variables. Robust LASSO performance generally decreased as the covariances among the explanatory variables increased and as the data dimension increased. Exploration of the sembung leaf extract data shows that it is high-dimensional and contains outliers in about 14.28% of the response variable and about 25.71% of the explanatory variables. On these data, Tukey-LASSO selected nine compounds, Huber-LASSO and Welsch-LASSO each selected eight compounds, and LASSO selected 13 compounds. The prediction accuracy of Tukey-LASSO was superior to that of the other three methods. |
first_indexed | 2024-04-24T14:30:29Z |
format | Article |
id | doaj.art-ddef5e278b2443a4a8714515d5156381 |
institution | Directory Open Access Journal |
issn | 1411-8513 2541-4062 |
language | English |
last_indexed | 2024-04-24T14:30:29Z |
publishDate | 2023-03-01 |
publisher | Universitas Syiah Kuala, Faculty of Mathematics and Natural Science |
record_format | Article |
series | Jurnal Natural |
spelling | doaj.art-ddef5e278b2443a4a8714515d5156381; 2024-04-03T03:15:56Z; eng; Universitas Syiah Kuala, Faculty of Mathematics and Natural Science; Jurnal Natural; ISSN 1411-8513, 2541-4062; 2023-03-01; vol. 23, no. 1, pp. 9-15; DOI 10.24815/jn.v23i1.26279; 15637; Study on the performance of Robust LASSO in determining important variables data with outliers; ROCHYATI ROCHYATI (Department of Statistics, IPB University, Bogor, Indonesia); KUSMAN SADIK (Department of Statistics, IPB University, Bogor, Indonesia); BAGUS SARTONO (Department of Statistics, IPB University, Bogor, Indonesia); EVITA PURNANINGRUM (Department of Management, PGRI Adi Buana University, Surabaya, Indonesia); https://jurnal.usk.ac.id/natural/article/view/26279; high dimensional regression, huber, tukey, variable selection, welsch |
title | Study on the performance of Robust LASSO in determining important variables data with outliers |
topic | high dimensional regression, huber, tukey, variable selection, welsch |
url | https://jurnal.usk.ac.id/natural/article/view/26279 |
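
The abstract in this record names three per-sample weighting schemes (Huber, Tukey, Welsch) but does not give their formulas. As a rough illustration only, the sketch below uses the textbook weight functions from robust regression. The tuning constants (1.345, 4.685, 2.985) are the conventional 95%-efficiency defaults and are an assumption here, as is the helper `sample_weights`; none of this is taken from the article itself.

```python
import math

# Conventional tuning constants (assumed; the record does not state
# which values the study used).
HUBER_C = 1.345
TUKEY_C = 4.685
WELSCH_C = 2.985


def huber_weight(r, c=HUBER_C):
    """Huber weight: 1 inside [-c, c], decaying as c/|r| outside."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def tukey_weight(r, c=TUKEY_C):
    """Tukey bisquare weight: smooth down-weighting, exactly 0 beyond c."""
    if abs(r) >= c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


def welsch_weight(r, c=WELSCH_C):
    """Welsch weight: Gaussian-shaped decay, never exactly 0."""
    return math.exp(-((r / c) ** 2))


def sample_weights(residuals, scale, weight_fn):
    """Hypothetical helper: weight each sample from its scaled residual."""
    return [weight_fn(r / scale) for r in residuals]


if __name__ == "__main__":
    residuals = [0.1, -0.5, 2.0, 12.0]  # made-up residuals, last one an outlier
    for fn in (huber_weight, tukey_weight, welsch_weight):
        print(fn.__name__, sample_weights(residuals, 1.0, fn))
```

In a weighted LASSO fit, such weights would typically be recomputed from the current residuals and passed back to the solver as per-sample weights. Tukey's weight is the only one of the three that drops large residuals to exactly zero.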