Using Machine Learning and Feature Selection for Alfalfa Yield Prediction

Predicting alfalfa biomass and crop yield for livestock feed is important to the daily lives of virtually everyone, and many features of data from this domain combined with corresponding weather data can be used to train machine learning models for yield prediction. In this work, we used yield data...

Bibliographic Details
Main Authors: Christopher D. Whitmire, Jonathan M. Vance, Hend K. Rasheed, Ali Missaoui, Khaled M. Rasheed, Frederick W. Maier
Format: Article
Language: English
Published: MDPI AG 2021-02-01
Series: AI
Subjects: alfalfa, cross validation, feature selection, machine learning, regression, yield prediction
Online Access: https://www.mdpi.com/2673-2688/2/1/6
author Christopher D. Whitmire
Jonathan M. Vance
Hend K. Rasheed
Ali Missaoui
Khaled M. Rasheed
Frederick W. Maier
collection DOAJ
description Predicting alfalfa biomass and crop yield for livestock feed is important to the daily lives of virtually everyone, and many features of data from this domain combined with corresponding weather data can be used to train machine learning models for yield prediction. In this work, we used yield data of different alfalfa varieties from multiple years in Kentucky and Georgia, and we compared the impact of different feature selection methods on machine learning (ML) models trained to predict alfalfa yield. Linear regression, regression trees, support vector machines, neural networks, Bayesian regression, and nearest neighbors were all developed with cross validation. The features used included weather data, historical yield data, and the sown date. The feature selection methods that were compared included a correlation-based method, the ReliefF method, and a wrapper method. We found that the best method was the correlation-based method, and the feature set it found consisted of the Julian day of the harvest, the number of days between the sown and harvest dates, cumulative solar radiation since the previous harvest, and cumulative rainfall since the previous harvest. Using these features, the k-nearest neighbor and random forest methods achieved an average R value over 0.95, and average mean absolute error less than 200 lbs./acre. Our top R² of 0.90 beats a previous work’s best R² of 0.87. Our primary contribution is the demonstration that ML, with feature selection, shows promise in predicting crop yields even on simple datasets with a handful of features, and that reporting accuracies in R and R² offers an intuitive way to compare results among various crops.
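The metrics named in the description (R, R², and mean absolute error) and the idea of scoring features by their correlation with yield can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature names, the toy numbers, and the helper functions are hypothetical and are not taken from the article, and the simple univariate ranking shown here is not necessarily the correlation-based selection method the authors used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rank_features(features, target):
    """Return feature names sorted by |R| with the target, strongest first."""
    scores = {name: abs(pearson_r(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data (not from the article).
features = {
    "days_since_sown": [30, 60, 90, 120, 150, 180],
    "cumulative_rainfall": [2.0, 3.5, 5.0, 6.0, 8.5, 9.0],
    "noise": [5, 1, 4, 2, 5, 1],
}
yield_lbs_per_acre = [900, 1500, 2100, 2600, 3300, 3700]
predicted = [1000, 1400, 2200, 2500, 3400, 3600]  # made-up model output

ranking = rank_features(features, yield_lbs_per_acre)
r = pearson_r(yield_lbs_per_acre, predicted)
mae = mean_absolute_error(yield_lbs_per_acre, predicted)

print("features by |R| with yield:", ranking)
print("R:", round(r, 3), "R^2:", round(r ** 2, 3), "MAE (lbs./acre):", mae)
```

Because R² here is the square of R, an R of 0.95 corresponds to an R² of roughly 0.90, which is how the two figures quoted in the description relate to each other.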
first_indexed 2024-03-09T00:52:56Z
format Article
id doaj.art-c783f4f52b2943849e33934d3fa1706a
institution Directory Open Access Journal
issn 2673-2688
language English
last_indexed 2024-03-09T00:52:56Z
publishDate 2021-02-01
publisher MDPI AG
record_format Article
series AI
spelling doaj.art-c783f4f52b2943849e33934d3fa1706a
Timestamp: 2023-12-11T17:04:24Z
Language: eng
Publisher: MDPI AG
Journal: AI (ISSN 2673-2688)
Published: 2021-02-01, volume 2, issue 1, pages 71-88
DOI: 10.3390/ai2010006
Title: Using Machine Learning and Feature Selection for Alfalfa Yield Prediction
Authors: Christopher D. Whitmire (0), Jonathan M. Vance (1), Hend K. Rasheed (2), Ali Missaoui (3), Khaled M. Rasheed (4), Frederick W. Maier (5)
Affiliations:
(0) Institute for Artificial Intelligence, University of Georgia, 515 Boyd Graduate Studies, 200 D. W. Brooks Drive, Athens, GA 30602, USA
(1) Department of Computer Science, University of Georgia, 415 Boyd Graduate Studies, 200 D. W. Brooks Drive, Athens, GA 30602, USA
(2) Institute for Artificial Intelligence, University of Georgia, 515 Boyd Graduate Studies, 200 D. W. Brooks Drive, Athens, GA 30602, USA
(3) Department of Crop and Soil Sciences, Institute of Plant Breeding Genetics and Genomics, University of Georgia, 4317 Miller Plant Science, Athens, GA 30602, USA
(4) Institute for Artificial Intelligence, University of Georgia, 515 Boyd Graduate Studies, 200 D. W. Brooks Drive, Athens, GA 30602, USA
(5) Institute for Artificial Intelligence, University of Georgia, 515 Boyd Graduate Studies, 200 D. W. Brooks Drive, Athens, GA 30602, USA
URL: https://www.mdpi.com/2673-2688/2/1/6
Keywords: alfalfa, cross validation, feature selection, machine learning, regression, yield prediction
title Using Machine Learning and Feature Selection for Alfalfa Yield Prediction
topic alfalfa
cross validation
feature selection
machine learning
regression
yield prediction
url https://www.mdpi.com/2673-2688/2/1/6