High dimensional model representation of log-likelihood ratio: binary classification with expression data

Abstract: Background: Binary classification rules based on a small sample of high-dimensional data (for instance, gene expression data) are ubiquitous in modern bioinformatics. Constructing such classifiers is challenging due to (a) the complex nature of underlying biological traits, such as gene inte...

Bibliographic Details
Main Authors: Ali Foroughi pour, Maciej Pietrzak, Lori A Dalton, Grzegorz A. Rempała
Format: Article
Language: English
Published: BMC 2020-04-01
Series: BMC Bioinformatics
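Subjects: High dimensional model representation; Classification; Disease prediction; Log-likelihood ratio; Expression analysis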
Online Access: http://link.springer.com/article/10.1186/s12859-020-3486-x
author Ali Foroughi pour
Maciej Pietrzak
Lori A Dalton
Grzegorz A. Rempała
collection DOAJ
description Abstract
Background: Binary classification rules based on a small sample of high-dimensional data (for instance, gene expression data) are ubiquitous in modern bioinformatics. Constructing such classifiers is challenging due to (a) the complex nature of underlying biological traits, such as gene interactions, and (b) the need for highly interpretable glass-box models. We use the theory of high dimensional model representation (HDMR) to build interpretable low-dimensional approximations of the log-likelihood ratio that account for the effect of each individual gene as well as gene-gene interactions. We propose two algorithms approximating the second-order HDMR expansion, and a hypothesis test based on the HDMR formulation to identify significantly dysregulated pairwise interactions. The theory is seen as flexible, requiring only a mild set of assumptions.
Results: We apply our approach to gene expression data from both synthetic and real (breast and lung cancer) datasets and compare it against several popular state-of-the-art methods. The analyses suggest that the proposed algorithms can be used to obtain interpretable prediction rules with high prediction accuracy and to successfully extract significantly dysregulated gene-gene interactions from the data. They also compare favorably against their competitors across multiple synthetic data scenarios.
Conclusion: The proposed HDMR-based approach appears to produce a reliable classifier that additionally allows one to describe how individual genes or gene-gene interactions affect classification decisions. Both real and synthetic data analyses suggest that our methods can be used to identify gene networks with dysregulated pairwise interactions, and are therefore appropriate for differential network analysis.
id doaj.art-1ced15e011974644812d310744bd54f6
institution Directory Open Access Journal
issn 1471-2105
spelling doaj.art-1ced15e011974644812d310744bd54f6 | 2022-12-21T18:42:33Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2020-04-01 | vol. 21, no. 1, pp. 1-27 | 10.1186/s12859-020-3486-x
High dimensional model representation of log-likelihood ratio: binary classification with expression data
Ali Foroughi pour (Department of Electrical and Computer Engineering, The Ohio State University)
Maciej Pietrzak (Department of Biomedical Informatics, The Ohio State University)
Lori A Dalton (Department of Electrical and Computer Engineering, The Ohio State University)
Grzegorz A. Rempała (Department of Mathematics, The Ohio State University)
http://link.springer.com/article/10.1186/s12859-020-3486-x
High dimensional model representation; Classification; Disease prediction; Log-likelihood ratio; Expression analysis
title High dimensional model representation of log-likelihood ratio: binary classification with expression data
topic High dimensional model representation
Classification
Disease prediction
Log-likelihood ratio
Expression analysis