Data mining applied to feature selection methods for aboveground carbon stock modelling

Bibliographic Details
Main Authors: Mônica Canaan Carvalho, Lucas Rezende Gomide, José Roberto Soares Scolforo, Kalill José Viana da Páscoa, Laís Almeida Araújo, Isáira Leite e Lopes
Format: Article
Language: English
Published: Embrapa Informação Tecnológica, 2022-12-01
Series: Pesquisa Agropecuária Brasileira
Subjects: forest management; genetic algorithm; random forest
Online Access: http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0100-204X2022000103800&tlng=en
author Mônica Canaan Carvalho
Lucas Rezende Gomide
José Roberto Soares Scolforo
Kalill José Viana da Páscoa
Laís Almeida Araújo
Isáira Leite e Lopes
collection DOAJ
description Abstract The objective of this work was to apply the random forest (RF) algorithm to model the aboveground carbon (AGC) stock of a tropical forest, testing three feature selection procedures: recursive removal and the single-objective and multiobjective genetic algorithms (GAs). The database covered 1,007 plots sampled in the Rio Grande watershed, in the state of Minas Gerais, Brazil, and 114 environmental variables (climatic, edaphic, geographic, terrain, and spectral). The best feature selection strategy, RF with the multiobjective GA, reached the lowest root mean square error (17.75 Mg ha⁻¹) with only four spectral variables (normalized difference moisture index, normalized burn ratio 2 correlation texture, tree cover, and latent heat flux), a 96.5% reduction in the number of variables (from 114 to 4). Feature selection improved RF performance, increasing accuracy while reducing the volume of data. Although recursive removal and the multiobjective GA performed similarly as feature selection strategies, the latter selected the smallest subset of variables with the highest accuracy. The findings of this study highlight the importance of near-infrared and shortwave infrared wavelengths, and of the vegetation indices derived from them, for the remote-sensing-based estimation of AGC. MODIS products showed a significant relationship with the AGC stock and should be further explored by the scientific community for modelling this stock.
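The multiobjective GA strategy summarised in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each chromosome is a 0/1 mask over the candidate variables, the two objectives minimised are a validation error and the number of variables kept, and Pareto dominance decides which subsets survive. In the study the error would come from the random forest's root mean square error; here `error_fn` is any caller-supplied callable, and `toy_error`, its informative indices, the one-point crossover, the bit-flip mutation, and all parameter values are hypothetical choices made only for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Mask = Tuple[int, ...]          # 1 = variable kept, 0 = variable dropped
Objectives = Tuple[float, int]  # (validation error, number of variables), both minimised


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: Dict[Mask, Objectives], mask: Mask, obj: Objectives) -> None:
    """Keep only non-dominated subsets: reject dominated newcomers, evict what they dominate."""
    if mask in archive or any(dominates(o, obj) for o in archive.values()):
        return
    for m in [m for m, o in archive.items() if dominates(obj, o)]:
        del archive[m]
    archive[mask] = obj


def multiobjective_ga(n_features: int, error_fn: Callable[[Mask], float],
                      pop_size: int = 30, generations: int = 60,
                      p_mut: float = 0.05, seed: int = 0) -> List[Tuple[Mask, Objectives]]:
    """Evolve variable subsets and return the Pareto front, sorted by subset size."""
    rng = random.Random(seed)
    cache: Dict[Mask, float] = {}

    def evaluate(mask: Mask) -> Objectives:
        # Fitting an RF is expensive, so each distinct subset is scored only once.
        if mask not in cache:
            cache[mask] = error_fn(mask)
        return cache[mask], sum(mask)

    def repair(bits: List[int]) -> Mask:
        # An empty subset cannot be modelled: switch one random variable back on.
        if not any(bits):
            bits[rng.randrange(n_features)] = 1
        return tuple(bits)

    population = [repair([rng.randint(0, 1) for _ in range(n_features)])
                  for _ in range(pop_size)]
    archive: Dict[Mask, Objectives] = {}
    for _ in range(generations):
        for mask in population:
            update_archive(archive, mask, evaluate(mask))
        parents = list(archive)  # elitism: breed only from the current Pareto front
        population = []
        for _ in range(pop_size):
            a, b = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(n_features):          # bit-flip mutation
                if rng.random() < p_mut:
                    child[i] ^= 1
            population.append(repair(child))
    for mask in population:
        update_archive(archive, mask, evaluate(mask))
    return sorted(archive.items(), key=lambda item: item[1][1])


# Hypothetical stand-in for a cross-validated RF RMSE (Mg ha-1): variables 0, 3 and 7
# carry signal; every other selected variable adds a small overfitting penalty.
INFORMATIVE = {0, 3, 7}


def toy_error(mask: Mask) -> float:
    missing = sum(1 for i in INFORMATIVE if not mask[i])
    irrelevant = sum(mask) - (len(INFORMATIVE) - missing)
    return 20.0 + 15.0 * missing + 0.5 * irrelevant


if __name__ == "__main__":
    for mask, (err, size) in multiobjective_ga(10, toy_error):
        kept = [i for i, bit in enumerate(mask) if bit]
        print(f"{size} variables, error {err:.2f}: {kept}")
```

Running the module prints each surviving subset with its size and error. Because `update_archive` retains only non-dominated subsets, the returned list is the trade-off curve between accuracy and parsimony, from which a compact subset such as the four-variable one reported in the abstract would be chosen.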
first_indexed 2024-04-12T05:30:26Z
format Article
id doaj.art-bbcab40f992640808a3682a077b53a77
institution Directory of Open Access Journals
issn 1678-3921
language English
last_indexed 2024-04-12T05:30:26Z
publishDate 2022-12-01
publisher Embrapa Informação Tecnológica
record_format Article
series Pesquisa Agropecuária Brasileira
spelling doaj.art-bbcab40f992640808a3682a077b53a77; 2022-12-22T03:46:07Z; eng; Embrapa Informação Tecnológica; Pesquisa Agropecuária Brasileira; ISSN 1678-3921; 2022-12-01; vol. 57; DOI 10.1590/s1678-3921.pab2022.v57.03015; Mônica Canaan Carvalho (https://orcid.org/0000-0002-2335-3998), Lucas Rezende Gomide (https://orcid.org/0000-0002-4781-0428), José Roberto Soares Scolforo (https://orcid.org/0000-0002-5888-6751), Kalill José Viana da Páscoa (https://orcid.org/0000-0002-5786-1501), Laís Almeida Araújo (https://orcid.org/0000-0001-5510-2862), Isáira Leite e Lopes (https://orcid.org/0000-0001-7428-5553)
title Data mining applied to feature selection methods for aboveground carbon stock modelling
topic forest management
genetic algorithm
random forest
url http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0100-204X2022000103800&tlng=en