Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had a remarkable impact on identifying susceptibility biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining indivi...

Bibliographic Details
Main Authors: Dongdong Lin, Jigang Zhang, Jingyao Li, Hao He, Hong-Wen Deng, Yu-Ping Wang
Format: Article
Language: English
Published: Frontiers Media S.A. 2014-10-01
Series: Frontiers in Cell and Developmental Biology
Subjects: Osteoporosis, Group Lasso, Sparse regression, multitask learning, significance test
Online Access: http://journal.frontiersin.org/Journal/10.3389/fcell.2014.00062/full
author Dongdong Lin
Jigang Zhang
Jingyao Li
Hao He
Hong-Wen Deng
Yu-Ping Wang
author_sort Dongdong Lin
collection DOAJ
description A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had a remarkable impact on identifying susceptibility biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies across different genetic levels and platforms promises to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes and factors of complex diseases. The method combines multitask learning with sparse group regularization in order to: 1) treat biomarker identification in each individual study as a task and combine the tasks by multitask learning; 2) group variables from all studies to identify significant genes; and 3) enforce a sparsity constraint on groups of variables to overcome the ‘small sample size, large number of variables’ problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, into our multitask model and provide an effective algorithm for each. In addition, we propose a significance test for identifying potential risk genes. Two simulation studies evaluate the performance of our integrative method against a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant by our method and are found to have significant effects in three other independent validation studies. The most significant gene, SOD2, was identified in our previous osteoporosis study involving the same expression dataset. Several other genes, such as TREML2, HTR1E, and GLO1, are shown to be novel susceptibility genes for osteoporosis, as confirmed by other studies.
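The abstract names a sparse group lasso penalty but this record does not give its formula or the authors' algorithm. As a rough illustration only, the sketch below uses the textbook form of that penalty (a group L2 term plus an elementwise L1 term) and its standard proximal step, assuming each gene's coefficients from all studies are pooled into one group. The function names, the gene labels `GENE_A` and `GENE_B`, the coefficient values, and the tuning parameters `lam_group` and `lam_l1` are all hypothetical.

```python
import math


def soft_threshold(x, t):
    """Shrink a single value toward zero by t (the L1 proximal step)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_group_lasso_penalty(groups, lam_group, lam_l1):
    """Textbook sparse group lasso penalty.

    groups maps a gene label to the list of its coefficients pooled
    across studies (an assumed grouping, not taken from the paper).
    """
    total = 0.0
    for coefs in groups.values():
        total += lam_group * math.sqrt(sum(c * c for c in coefs))
        total += lam_l1 * sum(abs(c) for c in coefs)
    return total


def prox_sparse_group_lasso(coefs, lam_group, lam_l1):
    """Proximal step for one group: L1 shrinkage, then group shrinkage."""
    shrunk = [soft_threshold(c, lam_l1) for c in coefs]
    norm = math.sqrt(sum(c * c for c in shrunk))
    if norm <= lam_group:
        # The whole group is dropped: the gene is deselected in every study.
        return [0.0] * len(coefs)
    scale = 1.0 - lam_group / norm
    return [scale * c for c in shrunk]


# Hypothetical coefficients for two genes across three studies.
groups = {
    "GENE_A": [3.0, -4.0, 0.1],
    "GENE_B": [0.2, -0.1, 0.05],
}
updated = {
    gene: prox_sparse_group_lasso(coefs, lam_group=1.0, lam_l1=0.5)
    for gene, coefs in groups.items()
}
```

In this toy input the weak gene is zeroed out as a whole group, while the strong gene keeps its two large coefficients and loses only its small one. That is the combined group-level and within-group sparsity such a penalty is designed to produce.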
first_indexed 2024-12-14T00:57:12Z
format Article
id doaj.art-31082346db13453881fa98555eafb71b
institution Directory of Open Access Journals
issn 2296-634X
language English
last_indexed 2024-12-14T00:57:12Z
publishDate 2014-10-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Cell and Developmental Biology
spelling doaj.art-31082346db13453881fa98555eafb71b 2022-12-21T23:23:29Z eng Frontiers Media S.A. Frontiers in Cell and Developmental Biology 2296-634X 2014-10-01 2 10.3389/fcell.2014.00062 105323 Integrative analysis of multiple diverse omics datasets by sparse group multitask regression Dongdong Lin, Jigang Zhang, Jingyao Li, Hao He, Hong-Wen Deng, Yu-Ping Wang (Tulane University) http://journal.frontiersin.org/Journal/10.3389/fcell.2014.00062/full Osteoporosis; Group Lasso; Sparse regression; multitask learning; significance test
title Integrative analysis of multiple diverse omics datasets by sparse group multitask regression
topic Osteoporosis
Group Lasso
Sparse regression
multitask learning
significance test
url http://journal.frontiersin.org/Journal/10.3389/fcell.2014.00062/full