Yield prediction in a peanut breeding program using remote sensing data and machine learning algorithms

Bibliographic Details
Main Authors: N. Ace Pugh, Andrew Young, Manisha Ojha, Yves Emendack, Jacobo Sanchez, Zhanguo Xin, Naveen Puppala
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-02-01
Series: Frontiers in Plant Science
Online Access: https://www.frontiersin.org/articles/10.3389/fpls.2024.1339864/full
Description: Peanut is a critical food crop worldwide, and the development of high-throughput phenotyping techniques is essential for enhancing the crop’s genetic gain rate. Given the obvious challenges of directly estimating peanut yields through remote sensing, an approach that utilizes above-ground phenotypes to estimate underground yield is necessary. To that end, this study leveraged unmanned aerial vehicles (UAVs) for high-throughput phenotyping of surface traits in peanut. Using a diverse set of peanut germplasm planted in 2021 and 2022, UAV flight missions were repeatedly conducted to capture image data that were used to construct high-resolution multitemporal sigmoidal growth curves based on apparent characteristics, such as canopy cover and canopy height. Latent phenotypes extracted from these growth curves and their first derivatives informed the development of advanced machine learning models, specifically random forest and eXtreme Gradient Boosting (XGBoost), to estimate yield in the peanut plots. The random forest model exhibited exceptional predictive accuracy (R² = 0.93), while XGBoost was also reasonably effective (R² = 0.88). When using confusion matrices to evaluate the classification abilities of each model, the two models proved valuable in a breeding pipeline, particularly for filtering out underperforming genotypes. In addition, the random forest model excelled in identifying top-performing material while minimizing Type I and Type II errors. Overall, these findings underscore the potential of machine learning models, especially random forests and XGBoost, in predicting peanut yield and improving the efficiency of peanut breeding programs.
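As a rough illustration of two ideas named in the description, the sketch below shows how latent phenotypes might be read off a sigmoidal growth curve and its first derivative, and how a confusion matrix might score a model's ability to pick top-yielding plots. It is not the authors' pipeline. The three-parameter logistic form, the function names, the parameter values, and the 25% selection cutoff are all assumptions made for the example; this record does not state which curve form or class thresholds the study used.

```python
import math


def logistic(t, K, r, t0):
    """Assumed sigmoidal growth curve: trait value (e.g. canopy cover) on day t."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def logistic_rate(t, K, r, t0):
    """First derivative of the logistic curve with respect to time."""
    y = logistic(t, K, r, t0)
    return r * y * (1.0 - y / K)


def latent_features(K, r, t0):
    """Hypothetical curve summaries that could be fed to a yield model."""
    return {
        "asymptote": K,            # plateau of the trait
        "inflection_day": t0,      # day of fastest growth
        "max_rate": K * r / 4.0,   # derivative evaluated at the inflection
    }


def selection_confusion(observed, predicted, top_fraction=0.25):
    """2x2 counts for 'top performer' calls, observed versus predicted yield."""
    n = len(observed)
    k = max(1, round(n * top_fraction))

    def top(values):
        order = sorted(range(n), key=lambda i: values[i], reverse=True)
        return set(order[:k])

    obs_top, pred_top = top(observed), top(predicted)
    tp = len(obs_top & pred_top)
    fp = len(pred_top - obs_top)   # Type I: selected but not truly top
    fn = len(obs_top - pred_top)   # Type II: truly top but discarded
    tn = n - tp - fp - fn
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Made-up parameters and yields, for illustration only.
features = latent_features(K=0.9, r=0.12, t0=45.0)
counts = selection_confusion(
    observed=[10, 20, 30, 40, 50, 60, 70, 80],
    predicted=[12, 18, 33, 39, 52, 75, 58, 81],
)
```

In a real workflow the curve parameters would come from fitting each plot's UAV time series, and the predicted yields from a trained regressor such as a random forest or XGBoost model; both steps are outside this sketch.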
Author Affiliations:
N. Ace Pugh, Andrew Young, Yves Emendack, Jacobo Sanchez, Zhanguo Xin: United States Department of Agriculture, Crop Stress Research Laboratory, Lubbock, TX, United States
Manisha Ojha, Naveen Puppala: Agricultural Science Center at Clovis, New Mexico State University, Clovis, NM, United States
ISSN: 1664-462X
Volume: 15
DOI: 10.3389/fpls.2024.1339864
Article Number: 1339864
Record ID: doaj.art-b496b5e132574d47851fb14d52014ab2
Subjects: artificial intelligence, crop yield, growth curves, machine learning, peanut, plant breeding