Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge

A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytopht...

Bibliographic Details
Main Authors: Loh, Po-Ru, Tucker, George Jay, Berger, Bonnie
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Public Library of Science 2012
Online Access: http://hdl.handle.net/1721.1/69039
https://orcid.org/0000-0002-2724-7228
author Loh, Po-Ru
Tucker, George Jay
Berger, Bonnie
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
collection MIT
description A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories, one requiring prediction based on only genotype data, another on only gene expression data, and the third on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets.
first_indexed 2024-09-23T10:04:22Z
format Article
id mit-1721.1/69039
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T10:04:22Z
publishDate 2012
publisher Public Library of Science
record_format dspace
spelling mit-1721.1/69039 2022-09-26T15:32:25Z
Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge
Loh, Po-Ru; Tucker, George Jay; Berger, Bonnie
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology. Department of Mathematics
Berger Leighton, Bonnie; Loh, Po-Ru; Tucker, George Jay; Berger, Bonnie
A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories, one requiring prediction based on only genotype data, another on only gene expression data, and the third on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets.
National Science Foundation (U.S.). Graduate Research Fellowship Program
National Defense Science and Engineering Graduate Fellowship
2012-02-08T17:16:30Z 2012-02-08T17:16:30Z 2011-12 2011-04
Article
http://purl.org/eprint/type/JournalArticle
1932-6203
http://hdl.handle.net/1721.1/69039
Loh, Po-Ru, George Tucker, and Bonnie Berger. “Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge.” Ed. Mark Isalan. PLoS ONE 6.12 (2011): e29095. Web. 8 Feb. 2012.
https://orcid.org/0000-0002-2724-7228
en_US
http://dx.doi.org/10.1371/journal.pone.0029095
PLoS ONE
Creative Commons Attribution
http://creativecommons.org/licenses/by/2.5/
application/pdf
Public Library of Science
PLoS
title Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge
url http://hdl.handle.net/1721.1/69039
https://orcid.org/0000-0002-2724-7228