Prediction of TOC Content in Organic-Rich Shale Using Machine Learning Algorithms: Comparative Study of Random Forest, Support Vector Machine, and XGBoost

The total organic carbon (TOC) content of organic-rich shale is a key parameter in screening for potential source rocks and sweet spots of shale oil/gas. Traditional methods of determining the TOC content, such as the geochemical experiments and the empirical mathematical regression method, are eith...


Bibliographic Details
Main Authors: Jiangtao Sun, Wei Dang, Fengqin Wang, Haikuan Nie, Xiaoliang Wei, Pei Li, Shaohua Zhang, Yubo Feng, Fei Li
Format: Article
Language: English
Published: MDPI AG 2023-05-01
Series: Energies
Subjects:
Online Access: https://www.mdpi.com/1996-1073/16/10/4159
collection DOAJ
description The total organic carbon (TOC) content of organic-rich shale is a key parameter in screening for potential source rocks and sweet spots of shale oil/gas. Traditional methods of determining TOC content, such as geochemical experiments and empirical mathematical regression, are either costly and inefficient, or inaccurate and not universally applicable. In this study, we propose three machine learning models, random forest (RF), support vector regression (SVR), and XGBoost, to predict TOC content from well logs, and the performance of each model is compared with that of the traditional empirical methods. First, the decision tree algorithm is used to identify the optimal set of well logs from a total of 15. Then, 816 data points of well logs and TOC content collected from five different shale formations are used to train and test the three models. Finally, the accuracy of the three models is validated by predicting the unknown TOC content of a shale oil well. The results show that the RF model provides the best prediction of TOC content, with R² = 0.915, MSE = 0.108, and MAE = 0.252, followed by XGBoost, while SVR gives the lowest predictive accuracy. Nevertheless, all three machine learning models outperform traditional empirical methods such as the Schmoker gamma-ray log method, multiple linear regression, and the ΔlgR method. Overall, the proposed machine learning models are powerful tools for predicting the TOC content of shale and for improving oil/gas exploration efficiency across different formations and basins.
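The three scores quoted in the description (R², MSE, and MAE) follow standard definitions. The block below is a minimal sketch of how they could be computed from paired measured and predicted TOC values. The function name `regression_scores` and the sample values are hypothetical illustrations, not code or data from the article.

```python
from statistics import mean


def regression_scores(measured, predicted):
    """Return (R^2, MSE, MAE) for paired measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [m - p for m, p in zip(measured, predicted)]

    # Mean squared error and mean absolute error of the residuals.
    mse = mean([r * r for r in residuals])
    mae = mean([abs(r) for r in residuals])

    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    avg = mean(measured)
    total_ss = sum((m - avg) ** 2 for m in measured)
    if total_ss == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    resid_ss = sum(r * r for r in residuals)
    r2 = 1.0 - resid_ss / total_ss

    return r2, mse, mae


# Hypothetical TOC values in wt%; NOT data from the article.
measured_toc = [1.0, 2.0, 3.0, 4.0]
predicted_toc = [1.5, 2.0, 2.5, 4.0]

r2, mse, mae = regression_scores(measured_toc, predicted_toc)
print(f"R2={r2:.3f} MSE={mse:.3f} MAE={mae:.3f}")
```

A higher R² together with lower MSE and MAE indicates a closer match between predicted and measured TOC, which is the sense in which the description ranks RF ahead of XGBoost and SVR.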
first_indexed 2024-03-11T03:45:32Z
format Article
id doaj.art-30c71ecb0c7e4707b3a260d017400686
institution Directory Open Access Journal
issn 1996-1073
language English
last_indexed 2024-03-11T03:45:32Z
publishDate 2023-05-01
publisher MDPI AG
record_format Article
series Energies
spelling doaj.art-30c71ecb0c7e4707b3a260d017400686
2023-11-18T01:13:43Z
eng
MDPI AG
Energies, ISSN 1996-1073
2023-05-01, volume 16, issue 10, article 4159
DOI 10.3390/en16104159
Prediction of TOC Content in Organic-Rich Shale Using Machine Learning Algorithms: Comparative Study of Random Forest, Support Vector Machine, and XGBoost
Jiangtao Sun (0), School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Wei Dang (1), School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Fengqin Wang (2), School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Haikuan Nie (3), Petroleum Exploration and Production Research Institute, SINOPEC, Beijing 100083, China
Xiaoliang Wei (4), Exploration and Development Institute of Shengli Oilfield Company, SINOPEC, Dongying 257000, China
Pei Li (5), Petroleum Exploration and Production Research Institute, SINOPEC, Beijing 100083, China
Shaohua Zhang (6), School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Yubo Feng (7), School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Fei Li (8), School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
https://www.mdpi.com/1996-1073/16/10/4159
TOC content; random forest; support vector machine; XGBoost; organic-rich shale
title Prediction of TOC Content in Organic-Rich Shale Using Machine Learning Algorithms: Comparative Study of Random Forest, Support Vector Machine, and XGBoost
topic TOC content
random forest
support vector machine
XGBoost
organic-rich shale
url https://www.mdpi.com/1996-1073/16/10/4159