Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species

Genomic prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome-wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly due to limited genome resources an...

Bibliographic Details
Main Authors: Laura M. Zingaretti, Salvador Alejandro Gezan, Luis Felipe V. Ferrão, Luis F. Osorio, Amparo Monfort, Patricio R. Muñoz, Vance M. Whitaker, Miguel Pérez-Enciso
Format: Article
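Language: English
Published: Frontiers Media S.A. 2020-02-01
Series: Frontiers in Plant Science
Subjects: genomic prediction, genomic selection, polyploid species, deep learning, epistasis, complex traits
Online Access: https://www.frontiersin.org/article/10.3389/fpls.2020.00025/full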
author Laura M. Zingaretti
Salvador Alejandro Gezan
Luis Felipe V. Ferrão
Luis F. Osorio
Amparo Monfort
Patricio R. Muñoz
Vance M. Whitaker
Miguel Pérez-Enciso
author_sort Laura M. Zingaretti
collection DOAJ
description Genomic prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome-wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly due to limited genome resources and the complexity of this system. Deep learning (DL) techniques comprise a heterogeneous collection of machine learning algorithms that have excelled at many prediction tasks. A potential advantage of DL for GP over standard linear model methods is that DL can take into account all genetic interactions, including dominance and epistasis, which are expected to be of special relevance in most polyploids. In this study, we evaluated the predictive accuracy of linear and DL techniques in two important small fruits or berries: strawberry and blueberry. The two datasets contained a total of 1,358 allopolyploid strawberry (2n=8x=112) and 1,802 autopolyploid blueberry (2n=4x=48) individuals, genotyped for 9,908 and 73,045 single nucleotide polymorphism (SNP) markers, respectively, and phenotyped for five agronomic traits each. DL depends on numerous parameters that influence performance, and optimizing hyperparameter values can be a critical step. Here we show that interactions between hyperparameter combinations should be expected and that the number of convolutional filters and regularization in the first layers can have an important effect on model performance. In terms of genomic prediction, we did not find an advantage of DL over linear model methods, except when the epistasis component was important. Linear Bayesian models were better than convolutional neural networks for the full additive architecture, whereas the opposite was observed under strong epistasis. However, by using a parameterization capable of taking into account these non-linear effects, Bayesian linear models can match or exceed the predictive accuracy of DL.
A semiautomatic implementation of the DL pipeline is available at https://github.com/lauzingaretti/deepGP/.
first_indexed 2024-12-20T02:21:42Z
format Article
id doaj.art-fe73c8137ccd476a8a6c0ab2bf8d1805
institution Directory Open Access Journal
issn 1664-462X
language English
last_indexed 2024-12-20T02:21:42Z
publishDate 2020-02-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Plant Science
spelling doaj.art-fe73c8137ccd476a8a6c0ab2bf8d1805
2022-12-21T19:56:48Z
eng
Frontiers Media S.A.
Frontiers in Plant Science
1664-462X
2020-02-01
11
10.3389/fpls.2020.00025
506702
Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species
Laura M. Zingaretti [0]: Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Barcelona, Spain
Salvador Alejandro Gezan [1]: School of Forest Resources and Conservation, University of Florida, Gainesville, FL, United States
Luis Felipe V. Ferrão [2]: Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
Luis F. Osorio [3]: IFAS Gulf Coast Research and Education Center, University of Florida, Wimauma, FL, United States
Amparo Monfort [4]: Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Barcelona, Spain
Amparo Monfort [5]: Institut de Recerca i Tecnologia Agroalimentàries (IRTA), Barcelona, Spain
Patricio R. Muñoz [6]: Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
Vance M. Whitaker [7]: IFAS Gulf Coast Research and Education Center, University of Florida, Wimauma, FL, United States
Miguel Pérez-Enciso [8]: Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Barcelona, Spain
Miguel Pérez-Enciso [9]: ICREA, Passeig de Lluís Companys 23, Barcelona, Spain
https://www.frontiersin.org/article/10.3389/fpls.2020.00025/full
genomic prediction
genomic selection
polyploid species
deep learning
epistasis
complex traits
title Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species
topic genomic prediction
genomic selection
polyploid species
deep learning
epistasis
complex traits
url https://www.frontiersin.org/article/10.3389/fpls.2020.00025/full