Stacked ensembles on basis of parentage information can predict hybrid performance with an accuracy comparable to marker-based GBLUP

Testcross factorials in newly established hybrid breeding programs are often highly unbalanced, incomplete, and characterized by a predominance of specific combining ability (SCA) over general combining ability (GCA). This results in a low efficiency of GCA-based selection. Machine learning algorithms...


Bibliographic Details
Main Authors: Philipp Georg Heilmann, Matthias Frisch, Amine Abbadi, Tobias Kox, Eva Herzog
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-07-01
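Series: Frontiers in Plant Science
Subjects: machine learning; stacked ensembles; gradient boosting; genomic prediction; general combining ability; specific combining ability
Online Access: https://www.frontiersin.org/articles/10.3389/fpls.2023.1178902/full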
_version_ 1797774998133276672
author Philipp Georg Heilmann
Matthias Frisch
Amine Abbadi
Tobias Kox
Eva Herzog
collection DOAJ
description Testcross factorials in newly established hybrid breeding programs are often highly unbalanced and incomplete, and are characterized by a predominance of specific combining ability (SCA) over general combining ability (GCA). This results in a low efficiency of GCA-based selection. Machine learning algorithms might improve prediction of hybrid performance in such testcross factorials, as they have been successfully applied to find complex underlying patterns in sparse data. Our objective was to compare the prediction accuracy of machine learning algorithms with that of GCA-based prediction and genomic best linear unbiased prediction (GBLUP) in six unbalanced incomplete factorials from hybrid breeding programs of rapeseed, wheat, and corn. We investigated a range of machine learning algorithms with three different types of predictor variables: (a) information on the parentage of hybrids, (b) additionally, the hybrid performance of crosses of the parental lines with other crossing partners, and (c) genotypic marker data. In two highly incomplete and unbalanced factorials from rapeseed, in which the SCA variance contributed considerably to the genetic variance, stacked ensembles of gradient boosting machines based on parentage information outperformed GCA prediction. Compared with GCA prediction, the stacked ensembles increased prediction accuracy from 0.39 to 0.45 and from 0.48 to 0.54. The prediction accuracy of stacked ensembles without marker data was comparable to that of GBLUP, which requires marker data. We conclude that hybrid prediction with stacked ensembles of gradient boosting machines based on parentage information is a promising approach that merits further investigation with other data sets in which SCA variance is high.
first_indexed 2024-03-12T22:29:16Z
format Article
id doaj.art-139ab7f2df304487a5705e3b68dc8e02
institution Directory of Open Access Journals
issn 1664-462X
language English
last_indexed 2024-03-12T22:29:16Z
publishDate 2023-07-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Plant Science
spelling doaj.art-139ab7f2df304487a5705e3b68dc8e02
2023-07-21T15:38:58Z
eng
Frontiers Media S.A.
Frontiers in Plant Science
1664-462X
2023-07-01
14
10.3389/fpls.2023.1178902
1178902
Stacked ensembles on basis of parentage information can predict hybrid performance with an accuracy comparable to marker-based GBLUP
Philipp Georg Heilmann (Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany)
Matthias Frisch (Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany)
Amine Abbadi (NPZ Innovation GmbH, Holtsee, Germany)
Tobias Kox (NPZ Innovation GmbH, Holtsee, Germany)
Eva Herzog (Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany)
https://www.frontiersin.org/articles/10.3389/fpls.2023.1178902/full
machine learning
stacked ensembles
gradient boosting
genomic prediction
general combining ability
specific combining ability
title Stacked ensembles on basis of parentage information can predict hybrid performance with an accuracy comparable to marker-based GBLUP
title_sort stacked ensembles on basis of parentage information can predict hybrid performance with an accuracy comparable to marker based gblup
topic machine learning
stacked ensembles
gradient boosting
genomic prediction
general combining ability
specific combining ability
url https://www.frontiersin.org/articles/10.3389/fpls.2023.1178902/full