Study on the performance of Robust LASSO in determining important variables data with outliers

Bibliographic Details
Main Authors: ROCHYATI ROCHYATI, KUSMAN SADIK, BAGUS SARTONO, EVITA PURNANINGRUM
Format: Article
Language: English
Published: Universitas Syiah Kuala, Faculty of Mathematics and Natural Science 2023-03-01
Series: Jurnal Natural
Subjects: high dimensional regression, huber, tukey, variable selection, welsch
Online Access: https://jurnal.usk.ac.id/natural/article/view/26279
author ROCHYATI ROCHYATI
KUSMAN SADIK
BAGUS SARTONO
EVITA PURNANINGRUM
collection DOAJ
description Regression models with many variables require a variable selection method, and LASSO is the most widely used. However, as several authors have noted, LASSO is sensitive to outliers in the data. The Robust LASSO approach addresses this by applying a weighting scheme to each sample in the data. This research presents a comparative study of three weighting schemes in Robust LASSO: Huber-LASSO, Tukey-LASSO, and Welsch-LASSO. The study ran an extensive simulation covering many scenarios that varied the covariance structure of the explanatory variables, the type of outliers, the number of outliers, the location of the active variables, and the number of variables. Tukey-LASSO outperformed Huber-LASSO and Welsch-LASSO in identifying significant variables. Robust LASSO performance generally decreased as the covariances among explanatory variables increased and as the data dimension increased. Exploration of sembung leaf extract data showed that the data are high-dimensional and contain outliers of about 14.28% on the response variable and about 25.71% on the explanatory variables. For these data, Tukey-LASSO selected nine compounds, Huber-LASSO and Welsch-LASSO each selected eight compounds, and LASSO selected 13 compounds. The prediction accuracy of Tukey-LASSO was superior to that of the other three methods.
first_indexed 2024-04-24T14:30:29Z
format Article
id doaj.art-ddef5e278b2443a4a8714515d5156381
institution Directory Open Access Journal
issn 1411-8513
2541-4062
language English
last_indexed 2024-04-24T14:30:29Z
publishDate 2023-03-01
publisher Universitas Syiah Kuala, Faculty of Mathematics and Natural Science
record_format Article
series Jurnal Natural
spelling doaj.art-ddef5e278b2443a4a8714515d5156381
2024-04-03T03:15:56Z
eng
Universitas Syiah Kuala, Faculty of Mathematics and Natural Science
Jurnal Natural
1411-8513
2541-4062
2023-03-01
volume 23, issue 1, pages 9-15
doi 10.24815/jn.v23i1.26279
15637
Study on the performance of Robust LASSO in determining important variables data with outliers
ROCHYATI ROCHYATI (Department of Statistics, IPB University, Bogor, Indonesia)
KUSMAN SADIK (Department of Statistics, IPB University, Bogor, Indonesia)
BAGUS SARTONO (Department of Statistics, IPB University, Bogor, Indonesia)
EVITA PURNANINGRUM (Department of Management, PGRI Adi Buana University, Surabaya, Indonesia)
https://jurnal.usk.ac.id/natural/article/view/26279
high dimensional regression, huber, tukey, variable selection, welsch
title Study on the performance of Robust LASSO in determining important variables data with outliers
topic high dimensional regression, huber, tukey, variable selection, welsch
url https://jurnal.usk.ac.id/natural/article/view/26279