Fundamental error in tree-based machine learning model selection for reservoir characterisation

Over the past two decades, machine learning techniques have been extensively used in predicting reservoir properties. While this approach has significantly contributed to the industry, selecting an appropriate model is still challenging for most researchers. Relying solely on statistical metrics to...

Bibliographic Details
Main Author: Daniel Asante Otchere
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2024-04-01
Series: Energy Geoscience
Subjects: Data visualisation; Permeability; Machine learning; Statistical metrics
Online Access: http://www.sciencedirect.com/science/article/pii/S2666759223000756
author Daniel Asante Otchere
collection DOAJ
description Over the past two decades, machine learning techniques have been extensively used in predicting reservoir properties. While this approach has significantly contributed to the industry, selecting an appropriate model is still challenging for most researchers. Relying solely on statistical metrics to select the best model for a particular problem may not always be the most effective approach. This study encourages researchers to incorporate data visualisation in their analysis and model selection process. To evaluate the suitability of different models in predicting horizontal permeability in the Volve field, wireline logs were used to train Extra-Trees, Ridge, Bagging, and XGBoost models. The Random Forest feature selection technique was applied to select the relevant logs as inputs for the models. Based on statistical metrics, the Extra-Trees model achieved the highest test accuracy of 0.996, RMSE of 19.54 mD, and MAE of 3.18 mD, with XGBoost coming in second. However, when the results were visualised, it was discovered that the XGBoost model was more suitable for the problem being tackled. The XGBoost model was a better predictor within the sandstone interval, while the Extra-Trees model was more appropriate in non-sandstone intervals. Since this study aims to predict permeability in the reservoir interval, the XGBoost model is the most suitable. These contrasting results demonstrate the importance of incorporating data visualisation techniques as an evaluation metric. Given the heterogeneity of the subsurface, relying solely on statistical metrics may not be sufficient to determine which model is best suited for a particular problem.
first_indexed 2024-04-24T07:37:25Z
format Article
id doaj.art-746a9d335e1a45a7bddb46382a93f457
institution Directory Open Access Journal
issn 2666-7592
language English
last_indexed 2024-04-24T07:37:25Z
publishDate 2024-04-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Energy Geoscience
spelling doaj.art-746a9d335e1a45a7bddb46382a93f457; 2024-04-20T04:17:54Z; eng; KeAi Communications Co., Ltd.; Energy Geoscience; 2666-7592; 2024-04-01; 5; 2; 100229; Fundamental error in tree-based machine learning model selection for reservoir characterisation; Daniel Asante Otchere (Centre of Excellence in Subsurface Seismic Imaging and Hydrocarbon Prediction, Universiti Teknologi PETRONAS, 32610, Seri Iskandar, Perak Darul Ridzuan, Malaysia); http://www.sciencedirect.com/science/article/pii/S2666759223000756; Data visualisation; Permeability; Machine learning; Statistical metrics
title Fundamental error in tree-based machine learning model selection for reservoir characterisation
topic Data visualisation
Permeability
Machine learning
Statistical metrics
url http://www.sciencedirect.com/science/article/pii/S2666759223000756
work_keys_str_mv AT danielasanteotchere fundamentalerrorintreebasedmachinelearningmodelselectionforreservoircharacterisation
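
The abstract's central point, that the model ranked best on whole-well statistics may be the weaker choice inside the reservoir interval, can be illustrated with a minimal Python sketch. It is not the author's code: the function names, the two model labels, the interval labels, and every number below are hypothetical values invented for illustration, and the metrics are computed per interval here as a simple numerical stand-in for the visual inspection the study recommends.

```python
import math
from collections import defaultdict


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when the true values have zero variance.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


def metrics_by_interval(y_true, y_pred, intervals):
    """Compute metrics for the whole set (key 'all') and for each interval label."""
    groups = defaultdict(lambda: ([], []))
    for t, p, label in zip(y_true, y_pred, intervals):
        groups[label][0].append(t)
        groups[label][1].append(p)
    report = {"all": regression_metrics(y_true, y_pred)}
    for label, (t, p) in groups.items():
        report[label] = regression_metrics(t, p)
    return report


def best_model(reports, interval, metric="rmse"):
    """Name of the model with the lowest error metric (rmse or mae) in an interval."""
    return min(reports, key=lambda name: reports[name][interval][metric])


# Hypothetical toy data (permeability in mD); NOT values from the study.
intervals = ["sandstone", "sandstone", "shale", "shale"]
y_true = [100.0, 200.0, 1.0, 2.0]
predictions = {
    "model_a": [110.0, 190.0, 1.0, 2.0],   # exact in shale, looser in sandstone
    "model_b": [104.0, 196.0, 13.0, 14.0],  # tighter in sandstone, poor in shale
}

reports = {
    name: metrics_by_interval(y_true, y_pred, intervals)
    for name, y_pred in predictions.items()
}

for name, report in reports.items():
    for label, m in report.items():
        print(f"{name:8s} {label:10s} RMSE={m['rmse']:.2f} MAE={m['mae']:.2f}")

print("best overall:     ", best_model(reports, "all"))
print("best in sandstone:", best_model(reports, "sandstone"))
```

In this toy case the overall ranking and the sandstone-only ranking disagree, which is the kind of contrast the record describes between Extra-Trees and XGBoost. In practice the per-interval numbers would be read alongside a depth plot of measured and predicted permeability rather than in place of it.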