SPECTROSCOPY DATA CALIBRATION USING STACKED ENSEMBLE MACHINE LEARNING

Near infrared spectroscopy (NIRS) is a widely used analytical technique for non-destructive analysis of various materials including food fraud detection. However, the accurate calibration of NIRS data can be challenging due to the complexity of the underlying relationships between the spectral data...


Bibliographic Details
Main Authors: Mahmud Iwan Solihin, Chan Jin Yuan, Wan Siu Hong, Liew Phing Pui, Ang Chun Kit, Wafa Hossain, Affiani Machmudah
Format: Article
Language: English
Published: IIUM Press, International Islamic University Malaysia 2024-01-01
Series: International Islamic University Malaysia Engineering Journal
Subjects: chemometrics calibration; ensemble machine learning; near infrared spectroscopy
Online Access: https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/2796
_version_ 1827394210669527040
author Mahmud Iwan Solihin
Chan Jin Yuan
Wan Siu Hong
Liew Phing Pui
Ang Chun Kit
Wafa Hossain
Affiani Machmudah
author_sort Mahmud Iwan Solihin
collection DOAJ
description Near-infrared spectroscopy (NIRS) is a widely used analytical technique for non-destructive analysis of various materials, with applications including food fraud detection. However, accurate calibration of NIRS data can be challenging because of the complexity of the underlying relationships between the spectral data and the target variables of interest. Ensemble learning, which combines multiple models to make predictions, has been shown to improve the accuracy and robustness of predictive models in various domains. This paper proposes stacking ensemble machine learning (SEML) for calibration of NIRS data, with two levels of learning involved. Eight spectroscopy datasets from a public repository and from the authors' previously published work are used as case studies. The model generalized well in the regression tasks, with R² of at least 0.8 on the test samples, and in the classification tasks, with classification accuracy (CA) of at least 0.8. In addition, the proposed SEML improves on, or at least matches, the accuracy of the individual base learners on both training and test samples for all regression and classification datasets. It shows superior performance on the test samples for both regression and classification datasets, with R² ranging from 0.86 to nearly 1 and CA ranging from 0.89 to 1.
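The two-level stacking idea named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic five-wavelength "spectra", the two base learners (a line fitted at an assumed peak wavelength and a k-nearest-neighbour regressor), the single blending weight used as a deliberately simple meta-learner, and all function names. Level one produces out-of-fold predictions from each base learner; level two fits the combiner on those predictions.

```python
import random
from statistics import mean

PEAK = 2                              # hypothetical index of the peak wavelength
SHAPE = [0.2, 0.6, 1.0, 0.6, 0.2]     # hypothetical absorbance profile
rng = random.Random(0)

def make_samples(n):
    """Synthetic data: spectrum = concentration * SHAPE + small noise."""
    y = [rng.random() for _ in range(n)]
    X = [[c * s + rng.gauss(0, 0.02) for s in SHAPE] for c in y]
    return X, y

def fit_peak_line(X, y):
    """Base learner 1: least-squares line on the peak absorbance only."""
    xs = [row[PEAK] for row in X]
    mx, my = mean(xs), mean(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx
    return lambda row: my + slope * (row[PEAK] - mx)

def fit_knn(X, y, k=3):
    """Base learner 2: mean target of the k nearest training spectra."""
    def predict(row):
        dist = lambda p: sum((a - b) ** 2 for a, b in zip(p[0], row))
        nearest = sorted(zip(X, y), key=dist)[:k]
        return mean([t for _, t in nearest])
    return predict

def out_of_fold(fit, X, y, n_folds=5):
    """Level 1: predict each sample with a model that never saw it."""
    preds = [0.0] * len(X)
    for f in range(n_folds):
        held = set(range(f, len(X), n_folds))
        keep = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in keep], [y[i] for i in keep])
        for i in held:
            preds[i] = model(X[i])
    return preds

def fit_blend_weight(p1, p2, y):
    """Level 2: weight w minimising sum (y - (w*p1 + (1-w)*p2))^2, clipped to [0, 1]."""
    num = sum((t - b) * (a - b) for a, b, t in zip(p1, p2, y))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    w = num / den if den > 0 else 0.5
    return min(1.0, max(0.0, w))

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

X_train, y_train = make_samples(60)
X_test, y_test = make_samples(20)

base_fits = [fit_peak_line, fit_knn]
oof = [out_of_fold(fit, X_train, y_train) for fit in base_fits]
w = fit_blend_weight(oof[0], oof[1], y_train)

# Refit the base learners on all training data, then combine on the test set.
models = [fit(X_train, y_train) for fit in base_fits]
base_test = [[m(row) for row in X_test] for m in models]
stack_test = [w * a + (1 - w) * b for a, b in zip(*base_test)]

r2_base = [r2_score(y_test, p) for p in base_test]
r2_stack = r2_score(y_test, stack_test)
print("blend weight:", round(w, 3))
print("test R2 (base learners):", [round(r, 3) for r in r2_base])
print("test R2 (stacked):", round(r2_stack, 3))
```

Fitting the combiner on out-of-fold predictions rather than on in-sample predictions keeps the second level from simply rewarding whichever base learner overfits the training data most. A practical calibration would more likely use an established library and richer base learners; this sketch only shows the data flow between the two levels.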
first_indexed 2024-03-08T18:08:17Z
format Article
id doaj.art-1fb93a9c14134066b3fdc87615a0dba2
institution Directory Open Access Journal
issn 1511-788X
2289-7860
language English
last_indexed 2024-03-08T18:08:17Z
publishDate 2024-01-01
publisher IIUM Press, International Islamic University Malaysia
record_format Article
series International Islamic University Malaysia Engineering Journal
spelling doaj.art-1fb93a9c14134066b3fdc87615a0dba2 2024-01-01T11:24:05Z eng
vol. 25, no. 1, doi:10.31436/iiumej.v25i1.2796
Mahmud Iwan Solihin, UCSI University, https://orcid.org/0000-0002-5293-7466
Chan Jin Yuan, UCSI University, https://orcid.org/0000-0002-4262-1628
Wan Siu Hong, UCSI University, https://orcid.org/0000-0002-8254-4962
Liew Phing Pui, UCSI University
Ang Chun Kit, UCSI University, https://orcid.org/0000-0002-1215-909X
Wafa Hossain, UCSI University
Affiani Machmudah, Airlangga University
title SPECTROSCOPY DATA CALIBRATION USING STACKED ENSEMBLE MACHINE LEARNING
topic chemometrics calibration
ensemble machine learning
near infrared spectroscopy
url https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/2796