Data mining applied to feature selection methods for aboveground carbon stock modelling
Abstract The objective of this work was to apply the random forest (RF) algorithm to the modelling of the aboveground carbon (AGC) stock of a tropical forest by testing three feature selection procedures – recursive removal and the uniobjective and multiobjective genetic algorithms (GAs). The used d...
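Main Authors: | Mônica Canaan Carvalho, Lucas Rezende Gomide, José Roberto Soares Scolforo, Kalill José Viana da Páscoa, Laís Almeida Araújo, Isáira Leite e Lopes |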
---|---|
Format: | Article |
Language: | English |
Published: | Embrapa Informação Tecnológica, 2022-12-01 |
Series: | Pesquisa Agropecuária Brasileira |
Subjects: | forest management; genetic algorithm; random forest |
Online Access: | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0100-204X2022000103800&tlng=en |
_version_ | 1811212502337323008 |
---|---|
author | Mônica Canaan Carvalho; Lucas Rezende Gomide; José Roberto Soares Scolforo; Kalill José Viana da Páscoa; Laís Almeida Araújo; Isáira Leite e Lopes |
author_facet | Mônica Canaan Carvalho; Lucas Rezende Gomide; José Roberto Soares Scolforo; Kalill José Viana da Páscoa; Laís Almeida Araújo; Isáira Leite e Lopes |
author_sort | Mônica Canaan Carvalho |
collection | DOAJ |
description | Abstract The objective of this work was to apply the random forest (RF) algorithm to the modelling of the aboveground carbon (AGC) stock of a tropical forest by testing three feature selection procedures: recursive removal and the uniobjective and multiobjective genetic algorithms (GAs). The database used covered 1,007 plots sampled in the Rio Grande watershed, in the state of Minas Gerais, Brazil, and 114 environmental variables (climatic, edaphic, geographic, terrain, and spectral). The best feature selection strategy, RF with multiobjective GA, reached the lowest root-mean-square error, 17.75 Mg ha-1, with only four spectral variables (normalized difference moisture index, normalized burn ratio 2 correlation texture, tree cover, and latent heat flux), which represents a reduction of 96.5% in the size of the database. Feature selection strategies help RF perform better by improving accuracy and reducing the volume of data. Although recursive removal and the multiobjective GA performed similarly as feature selection strategies, the latter yielded the smallest subset of variables with the highest accuracy. The findings of this study highlight the importance of using near-infrared, short wavelengths, and derived vegetation indices for the remote-sensing-based estimation of AGC. The MODIS products show a significant relationship with the AGC stock and should be further explored by the scientific community for the modelling of this stock. |
first_indexed | 2024-04-12T05:30:26Z |
format | Article |
id | doaj.art-bbcab40f992640808a3682a077b53a77 |
institution | Directory Open Access Journal |
issn | 1678-3921 |
language | English |
last_indexed | 2024-04-12T05:30:26Z |
publishDate | 2022-12-01 |
publisher | Embrapa Informação Tecnológica |
record_format | Article |
series | Pesquisa Agropecuária Brasileira |
spelling | doaj.art-bbcab40f992640808a3682a077b53a77; 2022-12-22T03:46:07Z; eng; Embrapa Informação Tecnológica; Pesquisa Agropecuária Brasileira; 1678-3921; 2022-12-01; 57; 10.1590/s1678-3921.pab2022.v57.03015; Data mining applied to feature selection methods for aboveground carbon stock modelling; Mônica Canaan Carvalho (https://orcid.org/0000-0002-2335-3998); Lucas Rezende Gomide (https://orcid.org/0000-0002-4781-0428); José Roberto Soares Scolforo (https://orcid.org/0000-0002-5888-6751); Kalill José Viana da Páscoa (https://orcid.org/0000-0002-5786-1501); Laís Almeida Araújo (https://orcid.org/0000-0001-5510-2862); Isáira Leite e Lopes (https://orcid.org/0000-0001-7428-5553); http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0100-204X2022000103800&tlng=en; forest management; genetic algorithm; random forest |
spellingShingle | Mônica Canaan Carvalho; Lucas Rezende Gomide; José Roberto Soares Scolforo; Kalill José Viana da Páscoa; Laís Almeida Araújo; Isáira Leite e Lopes; Data mining applied to feature selection methods for aboveground carbon stock modelling; Pesquisa Agropecuária Brasileira; forest management; genetic algorithm; random forest |
title | Data mining applied to feature selection methods for aboveground carbon stock modelling |
title_full | Data mining applied to feature selection methods for aboveground carbon stock modelling |
title_fullStr | Data mining applied to feature selection methods for aboveground carbon stock modelling |
title_full_unstemmed | Data mining applied to feature selection methods for aboveground carbon stock modelling |
title_short | Data mining applied to feature selection methods for aboveground carbon stock modelling |
title_sort | data mining applied to feature selection methods for aboveground carbon stock modelling |
topic | forest management; genetic algorithm; random forest |
url | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0100-204X2022000103800&tlng=en |
work_keys_str_mv | AT monicacanaancarvalho dataminingappliedtofeatureselectionmethodsforabovegroundcarbonstockmodelling AT lucasrezendegomide dataminingappliedtofeatureselectionmethodsforabovegroundcarbonstockmodelling AT joserobertosoaresscolforo dataminingappliedtofeatureselectionmethodsforabovegroundcarbonstockmodelling AT kalilljosevianadapascoa dataminingappliedtofeatureselectionmethodsforabovegroundcarbonstockmodelling AT laisalmeidaaraujo dataminingappliedtofeatureselectionmethodsforabovegroundcarbonstockmodelling AT isairaleiteelopes dataminingappliedtofeatureselectionmethodsforabovegroundcarbonstockmodelling |
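
The two figures quoted in the abstract (the error metric and the 96.5% reduction in database size) can be illustrated with a short sketch. It assumes the error reported in the abstract is the usual root-mean-square error; the function names `rmse` and `reduction_percent` are hypothetical helpers for illustration and are not taken from the article.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def reduction_percent(total_variables, kept_variables):
    """Share of variables discarded by feature selection, in percent."""
    return 100.0 * (total_variables - kept_variables) / total_variables


# The abstract reports 114 candidate variables and 4 selected ones.
print(round(reduction_percent(114, 4), 1))  # 96.5
```

Applied to per-plot AGC values in Mg ha-1, `rmse` returns an error in the same unit, which is how a figure such as 17.75 Mg ha-1 would be read.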