A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species

Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, including basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we can confirm results from other studies that there is no superiority of more complex neural network-based architectures for phenotype prediction compared to well-established methods. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.

Bibliographic Details
Main Authors:	Maura John, Florian Haselbeck, Rupashree Dass, Christoph Malisi, Patrizia Ricca, Christian Dreischer, Sebastian J. Schultheiss, Dominik G. Grimm
Format:	Article
Language:	English
Published:	Frontiers Media S.A., 2022-11-01
Series:	Frontiers in Plant Science
Volume:	13 (article 932512)
ISSN:	1664-462X
DOI:	10.3389/fpls.2022.932512
Subjects:	phenotype prediction; genomic selection; plant phenotyping; machine learning; Arabidopsis thaliana
Affiliations:	Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany (Maura John, Florian Haselbeck, Dominik G. Grimm)
	Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany (Maura John, Florian Haselbeck, Dominik G. Grimm)
	Computomics GmbH, Tübingen, Germany (Rupashree Dass, Christoph Malisi, Patrizia Ricca, Christian Dreischer, Sebastian J. Schultheiss)
	Technical University of Munich, Department of Informatics, Garching, Germany (Dominik G. Grimm)
Collection:	DOAJ (Directory of Open Access Journals)
Online Access:	https://www.frontiersin.org/articles/10.3389/fpls.2022.932512/full
