Sparse sliced inverse regression for high dimensional data analysis

Abstract. Background: Dimension reduction and variable selection play a critical role in the analysis of contemporary high-dimensional data. The semi-parametric multi-index model often serves as a reasonable model for analysis of such high-dimensional data. The sliced inverse regression (SIR) method,...

Bibliographic Details
Main Authors: Haileab Hilafu, Sandra E. Safo
Format: Article
Language: English
Published: BMC 2022-05-01
Series: BMC Bioinformatics
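Subjects: Semiparametric model; Generalized eigenvalue decomposition; Sliced inverse regression; Linear discriminant analysis; High-dimensional data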
Online Access: https://doi.org/10.1186/s12859-022-04700-3
author Haileab Hilafu
Sandra E. Safo
collection DOAJ
description Abstract. Background: Dimension reduction and variable selection play a critical role in the analysis of contemporary high-dimensional data. The semi-parametric multi-index model often serves as a reasonable model for analysis of such high-dimensional data. The sliced inverse regression (SIR) method, which can be formulated as a generalized eigenvalue decomposition problem, offers a model-free estimation approach for the indices in the semi-parametric multi-index model. Obtaining sparse estimates of the eigenvectors that constitute the basis matrix used to construct the indices is desirable to facilitate variable selection, which in turn facilitates interpretability and model parsimony. Results: To this end, we propose a group-Dantzig selector type formulation that induces row-sparsity in the sliced inverse regression dimension reduction vectors. Extensive simulation studies are carried out to assess the performance of the proposed method and to compare it with other state-of-the-art methods in the literature. Conclusion: The proposed method is shown to yield competitive estimation, prediction, and variable selection performance. Three real data applications, including a metabolomics depression study, are presented to demonstrate the method’s effectiveness in practice.
first_indexed 2024-04-13T08:42:13Z
format Article
id doaj.art-7035e9cc87b548debfa61916f3dc2a9b
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T08:42:13Z
publishDate 2022-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-7035e9cc87b548debfa61916f3dc2a9b | 2022-12-22T02:53:52Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2022-05-01 | 231119 | 10.1186/s12859-022-04700-3 | Sparse sliced inverse regression for high dimensional data analysis | Haileab Hilafu 0 | Sandra E. Safo 1 | Department of Business Analytics and Statistics, University of Tennessee | Division of Biostatistics, University of Minnesota | https://doi.org/10.1186/s12859-022-04700-3 | Semiparametric model | Generalized eigenvalue decomposition | Sliced inverse regression | Linear discriminant analysis | High-dimensional data
title Sparse sliced inverse regression for high dimensional data analysis
topic Semiparametric model
Generalized eigenvalue decomposition
Sliced inverse regression
Linear discriminant analysis
High-dimensional data
url https://doi.org/10.1186/s12859-022-04700-3
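
The description field above states that SIR can be formulated as a generalized eigenvalue decomposition problem for the indices of a semi-parametric multi-index model. As a rough sketch of what that formulation usually looks like, the block below writes it out in the standard notation of the SIR literature. The symbols (f, beta_k, Sigma, Gamma, H, p_h) are assumptions chosen for illustration and are not taken from this record; the article's own notation and its group-Dantzig formulation may differ.

```latex
% Notation is assumed (standard SIR notation), not taken from the record.
% Semi-parametric multi-index model: y depends on x only through d indices,
% with f an unknown link function and \varepsilon a noise term.
y = f\left(\beta_1^{\top} x, \ldots, \beta_d^{\top} x, \varepsilon\right),
\qquad x \in \mathbb{R}^{p}, \quad d \ll p .

% SIR as a generalized eigenvalue problem: the leading d generalized
% eigenvectors estimate a basis for the span of \beta_1, \ldots, \beta_d.
\Gamma \beta_k = \lambda_k \Sigma \beta_k, \quad k = 1, \ldots, d,
\qquad \Sigma = \operatorname{Cov}(x), \quad
\Gamma = \operatorname{Cov}\!\left(\mathbb{E}[x \mid y]\right).

% Slice-based sample estimate of \Gamma with H slices of the response:
% \hat{p}_h is the proportion of observations in slice h,
% \bar{x}_h the slice mean of x, and \bar{x} the overall mean.
\hat{\Gamma} = \sum_{h=1}^{H} \hat{p}_h
  \left(\bar{x}_h - \bar{x}\right)\left(\bar{x}_h - \bar{x}\right)^{\top} .
```

In this notation, row-sparsity refers to the p-by-d basis matrix B = [beta_1, ..., beta_d]: when an entire row of B is zero, the corresponding variable drops out of every index at once, which is how sparse estimation doubles as variable selection.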