Exact Conditioning of Regression Random Forest for Spatial Prediction

Regression random forest is becoming a widely used machine learning technique for spatial prediction, with competitive prediction performance in various geoscience fields. Like other popular machine learning methods for spatial prediction, regression random forest does not exactly honor the res...

Bibliographic Details
Main Author: Francky Fouedjio
Format: Article
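Language: English
Published: KeAi Communications Co. Ltd. 2020-12-01
Series: Artificial Intelligence in Geosciences
Subjects: Exact conditioning; Monte Carlo sampling; Multi-Gaussian; Spatial prediction; Principal component analysis; Random forest
Online Access: http://www.sciencedirect.com/science/article/pii/S2666544121000010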
_version_ 1811155486537416704
author Francky Fouedjio
collection DOAJ
description Regression random forest is becoming a widely used machine learning technique for spatial prediction, with competitive prediction performance in various geoscience fields. Like other popular machine learning methods for spatial prediction, regression random forest does not exactly honor the response variable’s measured values at sampled locations. By contrast, competitor methods such as regression-kriging fit the response variable’s observed values at sampled locations perfectly by construction. Exactly matching the measured values at sampled locations is desirable in many geoscience applications. This paper presents a new approach ensuring that regression random forest perfectly matches the response variable’s observed values at sampled locations. The main idea is to use principal component analysis to create an orthogonal representation of the ensemble of regression tree predictors produced by the traditional regression random forest. The exact conditioning problem is then reformulated as a Bayes-linear-Gauss problem on the principal component scores. This problem has an analytical solution, which makes it easy to perform Monte Carlo sampling of new principal component scores and then reconstruct regression tree predictors that perfectly match the response variable’s observed values at sampled locations. By construction, the average of the reconstructed regression tree predictors also matches the measured values at sampled locations exactly. The proposed method’s effectiveness is illustrated on a synthetic dataset, where the ground truth is available everywhere within the study region, and on a real dataset comprising geochemical concentration data from southwest England. It is compared with regression-kriging and the traditional regression random forest. The results indicate that the proposed method can perfectly fit the response variable’s measured values at sampled locations while achieving good out-of-sample predictive performance compared with regression-kriging and the traditional regression random forest.
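The description above outlines three steps: principal component analysis of the tree ensemble, Gaussian conditioning of the principal component scores on the sampled values, and Monte Carlo reconstruction of the predictors. The following is a minimal NumPy sketch of that idea under stated assumptions, not the paper's implementation. Assumptions not taken from this record: the tree ensemble is a synthetic stand-in (a sine signal plus independent noise per tree) rather than a fitted random forest; the scores are treated as zero-mean Gaussian with the PCA variances; conditional draws are obtained by correcting unconditional draws with a linear gain (the standard update for noise-free linear observations of a Gaussian vector); and the function name `condition_ensemble` and all of its parameters are hypothetical.

```python
import numpy as np


def condition_ensemble(tree_preds, sampled_idx, y_obs, n_draws=200, seed=0):
    """Draw reconstructed predictors that honor y_obs at sampled_idx.

    tree_preds: (B, n) per-tree predictions at n locations (hypothetical input).
    Assumes at least as many retained components as sampled locations,
    so that the matrix S below is invertible.
    """
    rng = np.random.default_rng(seed)
    mean = tree_preds.mean(axis=0)            # ensemble mean, shape (n,)
    centered = tree_preds - mean              # shape (B, n)

    # PCA via SVD: rows of Vt are orthonormal loadings over locations.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    keep = s > 1e-10 * s[0]                   # drop numerically null components
    s, Vt = s[keep], Vt[keep]
    lam = s**2 / (tree_preds.shape[0] - 1)    # score variances, shape (k,)

    V_s = Vt[:, sampled_idx].T                # loadings at sampled locations, (n_s, k)
    resid = y_obs - mean[sampled_idx]         # what the scores must explain, (n_s,)

    # Gain K = Lam V_s^T (V_s Lam V_s^T)^{-1}, with Lam = diag(lam).
    S = (V_s * lam) @ V_s.T                   # (n_s, n_s), symmetric
    K = np.linalg.solve(S, V_s * lam).T       # (k, n_s)

    # Unconditional score draws, then a correction so that V_s z = resid.
    z0 = rng.standard_normal((n_draws, lam.size)) * np.sqrt(lam)
    z = z0 + (resid - z0 @ V_s.T) @ K.T

    # Reconstruct predictors at all n locations, shape (n_draws, n).
    return mean + z @ Vt


# Synthetic stand-in for a tree ensemble: 100 "trees" over 50 locations.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 2.0 * np.pi, 50)
tree_preds = np.sin(x) + 0.3 * rng.standard_normal((100, x.size))
sampled_idx = np.arange(0, 50, 5)             # 10 sampled locations
y_obs = np.sin(x[sampled_idx])                # values to be honored exactly

cond = condition_ensemble(tree_preds, sampled_idx, y_obs)
max_err = np.abs(cond[:, sampled_idx] - y_obs).max()
print(f"max misfit at sampled locations: {max_err:.2e}")
```

In this sketch every reconstructed predictor matches `y_obs` at the sampled locations up to floating-point error, so their average does as well, while values at unsampled locations still vary from draw to draw.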
first_indexed 2024-04-10T04:35:07Z
format Article
id doaj.art-5b3e1487e8424e5ba61bf2b7352e41f7
institution Directory Open Access Journal
issn 2666-5441
language English
last_indexed 2024-04-10T04:35:07Z
publishDate 2020-12-01
publisher KeAi Communications Co. Ltd.
record_format Article
series Artificial Intelligence in Geosciences
spelling doaj.art-5b3e1487e8424e5ba61bf2b7352e41f7; 2023-03-10T04:36:21Z; eng; KeAi Communications Co. Ltd.; Artificial Intelligence in Geosciences; 2666-5441; 2020-12-01; 11123; Exact Conditioning of Regression Random Forest for Spatial Prediction; Francky Fouedjio (AngloGold Ashanti Australia Ltd., Growth and Exploration, 140 St. Georges Terrace, Perth, WA, 6000, Australia); http://www.sciencedirect.com/science/article/pii/S2666544121000010
title Exact Conditioning of Regression Random Forest for Spatial Prediction
topic Exact conditioning
Monte Carlo sampling
Multi-Gaussian
Spatial prediction
Principal component analysis
Random forest
url http://www.sciencedirect.com/science/article/pii/S2666544121000010