Prediction of TOC Content in Organic-Rich Shale Using Machine Learning Algorithms: Comparative Study of Random Forest, Support Vector Machine, and XGBoost
The total organic carbon (TOC) content of organic-rich shale is a key parameter in screening for potential source rocks and sweet spots of shale oil/gas. Traditional methods of determining the TOC content, such as geochemical experiments and empirical mathematical regression, are eith...
Main Authors: | Jiangtao Sun, Wei Dang, Fengqin Wang, Haikuan Nie, Xiaoliang Wei, Pei Li, Shaohua Zhang, Yubo Feng, Fei Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-05-01 |
Series: | Energies |
Subjects: | TOC content; random forest; support vector machine; XGBoost; organic-rich shale |
Online Access: | https://www.mdpi.com/1996-1073/16/10/4159 |
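The abstract reports model accuracy as R², MSE, and MAE. As a minimal sketch of how these three scores are conventionally computed from measured and predicted TOC values, the plain-Python function below may help. The function name `regression_metrics` and the sample numbers are illustrative assumptions and are not taken from the article.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MSE, and MAE for paired measured/predicted values.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and mean absolute error of the residuals.
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan

    return {"r2": r2, "mse": mse, "mae": mae}


# Made-up TOC values (wt%), used only to exercise the function.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
scores = regression_metrics(measured, predicted)
```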
_version_ | 1797600229710626816 |
---|---|
author | Jiangtao Sun, Wei Dang, Fengqin Wang, Haikuan Nie, Xiaoliang Wei, Pei Li, Shaohua Zhang, Yubo Feng, Fei Li |
author_sort | Jiangtao Sun |
collection | DOAJ |
description | The total organic carbon (TOC) content of organic-rich shale is a key parameter in screening for potential source rocks and sweet spots of shale oil/gas. Traditional methods of determining the TOC content, such as geochemical experiments and empirical mathematical regression, are either costly and inefficient or inaccurate and not universally applicable. In this study, we propose three machine learning models, random forest (RF), support vector regression (SVR), and XGBoost, to predict the TOC content using well logs, and the performance of each model is compared with that of the traditional empirical methods. First, the decision tree algorithm is used to identify the optimal set of well logs from a total of 15. Then, 816 data points of well logs and TOC content data collected from five different shale formations are used to train and test the three models. Finally, the accuracy of the three models is validated by predicting the unknown TOC content data from a shale oil well. The results show that the RF model provides the best prediction of the TOC content, with R² = 0.915, MSE = 0.108, and MAE = 0.252, followed by XGBoost, while SVR gives the lowest predictive accuracy. Nevertheless, all three machine learning models outperform traditional empirical methods such as the Schmoker gamma-ray log method, the multiple linear regression method, and the ΔlgR method. Overall, the proposed machine learning models are powerful tools for predicting the TOC content of shale and improving oil/gas exploration efficiency in a different formation or a different basin. |
first_indexed | 2024-03-11T03:45:32Z |
format | Article |
id | doaj.art-30c71ecb0c7e4707b3a260d017400686 |
institution | Directory Open Access Journal |
issn | 1996-1073 |
language | English |
last_indexed | 2024-03-11T03:45:32Z |
publishDate | 2023-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Energies |
spelling | doaj.art-30c71ecb0c7e4707b3a260d017400686; 2023-11-18T01:13:43Z; eng; MDPI AG; Energies; 1996-1073; 2023-05-01; 16; 10; 4159; 10.3390/en16104159; Prediction of TOC Content in Organic-Rich Shale Using Machine Learning Algorithms: Comparative Study of Random Forest, Support Vector Machine, and XGBoost; Jiangtao Sun (School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China); Wei Dang (School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China); Fengqin Wang (School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China); Haikuan Nie (Petroleum Exploration and Production Research Institute, SINOPEC, Beijing 100083, China); Xiaoliang Wei (Exploration and Development Institute of Shengli Oilfield Company, SINOPEC, Dongying 257000, China); Pei Li (Petroleum Exploration and Production Research Institute, SINOPEC, Beijing 100083, China); Shaohua Zhang (School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China); Yubo Feng (School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China); Fei Li (School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China); The total organic carbon (TOC) content of organic-rich shale is a key parameter in screening for potential source rocks and sweet spots of shale oil/gas. Traditional methods of determining the TOC content, such as geochemical experiments and empirical mathematical regression, are either costly and inefficient or inaccurate and not universally applicable. In this study, we propose three machine learning models, random forest (RF), support vector regression (SVR), and XGBoost, to predict the TOC content using well logs, and the performance of each model is compared with that of the traditional empirical methods. First, the decision tree algorithm is used to identify the optimal set of well logs from a total of 15. Then, 816 data points of well logs and TOC content data collected from five different shale formations are used to train and test the three models. Finally, the accuracy of the three models is validated by predicting the unknown TOC content data from a shale oil well. The results show that the RF model provides the best prediction of the TOC content, with R² = 0.915, MSE = 0.108, and MAE = 0.252, followed by XGBoost, while SVR gives the lowest predictive accuracy. Nevertheless, all three machine learning models outperform traditional empirical methods such as the Schmoker gamma-ray log method, the multiple linear regression method, and the ΔlgR method. Overall, the proposed machine learning models are powerful tools for predicting the TOC content of shale and improving oil/gas exploration efficiency in a different formation or a different basin.; https://www.mdpi.com/1996-1073/16/10/4159; TOC content; random forest; support vector machine; XGBoost; organic-rich shale |
title | Prediction of TOC Content in Organic-Rich Shale Using Machine Learning Algorithms: Comparative Study of Random Forest, Support Vector Machine, and XGBoost |
topic | TOC content; random forest; support vector machine; XGBoost; organic-rich shale |
url | https://www.mdpi.com/1996-1073/16/10/4159 |
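One of the empirical baselines named in the abstract is the ΔlgR method. A rough sketch of its commonly cited form (Passey et al., 1990) follows, assuming resistivity in ohm·m, sonic transit time in µs/ft, and a level of organic metamorphism (LOM) as the maturity input. The function names, baseline values, and sample inputs are hypothetical, and the article may use a different calibration.

```python
import math


def delta_log_r(resistivity, sonic, resistivity_base, sonic_base):
    """Separation between resistivity and sonic curves (dimensionless).

    Assumed form: log10(R / R_base) + 0.02 * (dt - dt_base),
    with R in ohm·m and dt in µs/ft. Hypothetical helper.
    """
    if resistivity <= 0 or resistivity_base <= 0:
        raise ValueError("resistivity values must be positive")
    return math.log10(resistivity / resistivity_base) + 0.02 * (sonic - sonic_base)


def toc_from_delta_log_r(dlr, lom):
    """Convert the curve separation to TOC (wt%) for a given LOM.

    Assumed form: TOC = dlr * 10 ** (2.297 - 0.1688 * LOM).
    """
    return dlr * 10 ** (2.297 - 0.1688 * lom)


# Made-up log readings, chosen only to show the call sequence.
dlr = delta_log_r(resistivity=10.0, sonic=110.0,
                  resistivity_base=1.0, sonic_base=100.0)
toc = toc_from_delta_log_r(dlr, lom=10.0)
```

Because the baselines and LOM must be picked per well or per formation, a method of this kind does not transfer directly between basins, which is the limitation the abstract attributes to empirical approaches.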