eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models

Abstract Background Regularized generalized linear models (GLMs) are popular regression methods in bioinformatics, particularly useful in scenarios with fewer observations than parameters/features or when many of the features are correlated. In both ridge and lasso regularization, feature shrinkage...

Bibliographic Details
Main Authors: Julián Candia, John S Tsang
Format: Article
Language: English
Published: BMC 2019-04-01
Series: BMC Bioinformatics
Subjects: Software, R package, Generalized linear models, Regression, Classification, Regularization
Online Access: http://link.springer.com/article/10.1186/s12859-019-2778-5
author Julián Candia
John S Tsang
collection DOAJ
description Abstract
Background Regularized generalized linear models (GLMs) are popular regression methods in bioinformatics, particularly useful in scenarios with fewer observations than parameters/features or when many of the features are correlated. In both ridge and lasso regularization, feature shrinkage is controlled by a penalty parameter λ. The elastic net introduces a mixing parameter α to tune the shrinkage continuously from ridge to lasso. Selecting α objectively and determining which features contributed significantly to prediction after model fitting remain a practical challenge given the paucity of available software to evaluate performance and statistical significance.
Results eNetXplorer builds on top of glmnet to address the above issues for linear (Gaussian), binomial (logistic), and multinomial GLMs. It provides new functionalities to empower practical applications by using a cross validation framework that assesses the predictive performance and statistical significance of a family of elastic net models (as α is varied) and of the corresponding features that contribute to prediction. The user can select which quality metrics to use to quantify the concordance between predicted and observed values, with defaults provided for each GLM. Statistical significance for each model (as defined by α) is determined based on comparison to a set of null models generated by random permutations of the response; the same permutation-based approach is used to evaluate the significance of individual features. In the analysis of large and complex biological datasets, such as transcriptomic and proteomic data, eNetXplorer provides summary statistics, output tables, and visualizations to help assess which subset(s) of features have predictive value for a set of response measurements, and to what extent those subset(s) of features can be expanded or reduced via regularization.
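For reference, the roles of the penalty parameter λ and the mixing parameter α described above can be written out explicitly. The block below gives the elastic net objective for the Gaussian case in the parameterization used by glmnet; the symbols N, x_i, y_i, β_0, and β are notation introduced here and do not appear in the abstract.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^{2}
\;+\;\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
\;+\;\alpha\,\lVert\beta\rVert_1\right],
\qquad 0\le\alpha\le 1,\;\; \lambda\ge 0
```

Setting α = 0 leaves only the squared (ridge) penalty, and α = 1 leaves only the absolute-value (lasso) penalty; intermediate values interpolate between the two.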
Conclusions This package presents a framework and software for exploratory data analysis and visualization. By making regularized GLMs more accessible and interpretable, eNetXplorer guides the process to generate hypotheses based on features significantly associated with biological phenotypes of interest, e.g. to identify biomarkers for therapeutic responsiveness. eNetXplorer is also generally applicable to any research area that may benefit from predictive modeling and feature identification using regularized GLMs. The package is available under GPL-3 license at the CRAN repository, https://CRAN.R-project.org/package=eNetXplorer.
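The permutation-based significance test described in the abstract can be illustrated with a minimal sketch. This is not the eNetXplorer implementation: it is written in Python rather than R, it replaces the elastic net fit with a one-feature least-squares line, it uses leave-one-out cross-validation with Pearson correlation as the quality metric, and the function names and the (hits + 1) / (n_perm + 1) p-value convention are all assumptions made for illustration.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    denom = (saa * sbb) ** 0.5
    return sab / denom if denom > 0 else 0.0


def fit_line(x, y):
    """Ordinary least squares for one feature (stand-in for an elastic net fit)."""
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    slope = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return slope, my - slope * mx


def cv_score(x, y):
    """Leave-one-out CV: correlate out-of-fold predictions with the response."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(intercept + slope * x[i])
    return pearson(preds, y)


def permutation_p_value(x, y, n_perm=200, seed=0):
    """Compare the CV score to null scores obtained by refitting on shuffled y."""
    rng = random.Random(seed)
    observed = cv_score(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the link between features and response
        if cv_score(x, y_perm) >= observed:
            hits += 1
    # Add-one convention (an assumption here) keeps the estimate above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic data with a strong linear signal plus unit-variance noise.
noise = random.Random(1)
x = [float(i) for i in range(20)]
y = [2.0 * xi + noise.gauss(0.0, 1.0) for xi in x]
score, p_value = permutation_p_value(x, y)
print(f"CV correlation = {score:.3f}, permutation p-value = {p_value:.4f}")
```

The key design point is that each null model is refit from scratch on the permuted response, so the null distribution reflects the whole fitting and cross-validation procedure rather than a single fixed set of predictions.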
first_indexed 2024-04-12T03:49:41Z
format Article
id doaj.art-55c0b1f0bb1e4071ab454d748f8002fd
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-12T03:49:41Z
publishDate 2019-04-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-55c0b1f0bb1e4071ab454d748f8002fd
2022-12-22T03:49:01Z
eng
BMC
BMC Bioinformatics
1471-2105
2019-04-01
201111
10.1186/s12859-019-2778-5
eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models
Julián Candia, Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health
John S Tsang, Trans-NIH Center for Human Immunology (CHI), National Institute of Allergy and Infectious Diseases, National Institutes of Health
title eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models
topic Software
R package
Generalized linear models
Regression
Classification
Regularization
url http://link.springer.com/article/10.1186/s12859-019-2778-5