Determination of the quantitative content of chlorophylls in leaves by reflection spectra using the random forest algorithm

Determining the quantitative content of chlorophylls in plant leaves by their reflection spectra is an important task both in monitoring the state of natural and industrial phytocenoses, and in laboratory studies of normal and pathological processes during plant growth. The use of machine learning m...

Bibliographic Details
Main Authors: E. A. Urbanovich, D. A. Afonnikov, S. V. Nikolaev
Format: Article
Language: English
Published: Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders 2021-03-01
Series: Вавиловский журнал генетики и селекции
Subjects: random forest; remote methods; leaf optics; pigments
Online Access: https://vavilov.elpub.ru/jour/article/view/2917
author E. A. Urbanovich
D. A. Afonnikov
S. V. Nikolaev
collection DOAJ
description Determining the quantitative content of chlorophylls in plant leaves by their reflection spectra is an important task both in monitoring the state of natural and industrial phytocenoses and in laboratory studies of normal and pathological processes during plant growth. The use of machine learning methods for these purposes is promising, since these methods allow inferring the relationships between input and output variables (a prediction model), and, to improve the quality of the prediction, a researcher may modify the predictors and select a set of method parameters. Here, we present the results of the implementation and evaluation of the random forest algorithm for predicting the total concentration of chlorophylls a and b from the reflection spectra of plant leaves in the visible and infrared wavelengths. We used the reflection spectra of 276 leaf samples from 39 plant species obtained from open sources; 181 of the samples were from the sycamore maple (Acer pseudoplatanus L.). Each reflection spectrum covered wavelengths from 400 to 2500 nm with a step of 1 nm. The training set consisted of 85 % of the A. pseudoplatanus L. samples, and the performance was evaluated on the remaining 15 % of the samples of this species (the validation sample). Six models based on the random forest algorithm with different predictors were evaluated. The control parameters were selected by cross-validation on five partitions. For the first model, the intensity of the reflection spectra without any transformation was used. Based on the analysis of this model, the optimal wavelength ranges for the remaining five models were selected. The best results were obtained by models that used a two-point estimate of the derivative of the reflection spectrum in the visible wavelength range as input data. We compared one of these models (the two-point estimate of the derivative of the reflection spectrum in the range of 400–800 nm with a step of 1 nm) with a model by other authors (which is based on a functional dependence between two unknown parameters, selected by the least squares method, and two reflection coefficients, the choice of which is described in the article). The predictions of the random forest model and of the other authors' model were compared both on the maple validation sample and on the sample from other plant species. In the first case, the predictions of the random forest method had a lower estimate of the standard deviation. In the second case, the predictions of this method had a large error for small values of chlorophyll, while the other authors' model gave acceptable predictions. The article provides an analysis of the results, as well as recommendations for using this machine learning method to assess the quantitative content of chlorophylls in leaves.
first_indexed 2024-03-07T16:05:06Z
format Article
id doaj.art-748bac66f1014652b202645d992c78ae
institution Directory Open Access Journal
issn 2500-3259
language English
last_indexed 2024-04-24T11:07:24Z
publishDate 2021-03-01
publisher Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders
record_format Article
series Вавиловский журнал генетики и селекции
spelling doaj.art-748bac66f1014652b202645d992c78ae
2024-04-11T15:31:03Z
eng
Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders
Вавиловский журнал генетики и селекции
2500-3259
2021-03-01
25
1
64
70
10.18699/VJ21.008
1133
Determination of the quantitative content of chlorophylls in leaves by reflection spectra using the random forest algorithm
E. A. Urbanovich (Novosibirsk State Technical University)
D. A. Afonnikov (Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University)
S. V. Nikolaev (Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences; Moscow State Academy of Veterinary Medicine and Biotechnology – MVA named after K.I. Skryabin)
https://vavilov.elpub.ru/jour/article/view/2917
random forest
remote methods
leaf optics
pigments
title Determination of the quantitative content of chlorophylls in leaves by reflection spectra using the random forest algorithm
topic random forest
remote methods
leaf optics
pigments
url https://vavilov.elpub.ru/jour/article/view/2917
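
Illustrative sketch (not part of the catalog record): the description field mentions two preprocessing steps, a two-point estimate of the derivative of the reflection spectrum over 400–800 nm with a 1 nm step, and an 85 % / 15 % split into training and validation samples. The Python below shows one way these steps might look. It assumes the two-point estimate is a forward difference, that the spectrum is stored as a list starting at 400 nm, and that the split is a seeded random shuffle; the function names, the rounding rule, and the synthetic spectrum are hypothetical and are not taken from the article.

```python
import random


def two_point_derivative(reflectance, start_nm=400, lo_nm=400, hi_nm=800, step_nm=1):
    """Forward-difference estimate of dR/d(lambda) for wavelengths in [lo_nm, hi_nm).

    Assumption: reflectance[i] is the reflectance at start_nm + i * step_nm.
    """
    first = (lo_nm - start_nm) // step_nm
    last = (hi_nm - start_nm) // step_nm
    return [
        (reflectance[i + 1] - reflectance[i]) / step_nm
        for i in range(first, last)
    ]


def split_train_validation(samples, train_fraction=0.85, seed=0):
    """Shuffle the samples reproducibly and cut them into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Synthetic, linearly increasing "spectrum" covering 400..2500 nm at 1 nm steps.
# It is made up purely to exercise the functions; it is not real leaf data.
spectrum = [0.001 * i for i in range(2101)]
features = two_point_derivative(spectrum)

# Indices standing in for the 181 maple samples mentioned in the description.
train_ids, validation_ids = split_train_validation(range(181))
```

Feature vectors built this way could then be passed to any random forest regressor, for example scikit-learn's RandomForestRegressor with its parameters chosen by five-fold cross-validation. The record does not say which implementation the authors used.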