Development of robust procedures for partial least square regression with application to near infrared spectral data
The Partial Least Square Regression (PLSR) is a multivariate method commonly used to build a predictive model of Near Infrared (NIR) spectral data. Based on our experience, several weaknesses of the PLSR have been identified with respect to its robustness issues in the pre-processing and inproces...
Main Author: | Silalahi, Divo Dharma |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2021 |
Subjects: | Regression analysis; Least squares |
Online Access: | http://psasir.upm.edu.my/id/eprint/98710/1/IPM%202021%208%20-%20IR.pdf |
_version_ | 1796983655390773248 |
---|---|
author | Silalahi, Divo Dharma |
collection | UPM |
description | Partial Least Square Regression (PLSR) is a multivariate method
commonly used to build predictive models from Near Infrared (NIR) spectral data.
Based on our experience, the PLSR has several weaknesses
in robustness, in both the pre-processing and in-processing stages,
when outliers and High Leverage Points (HLP) exist in the dataset.
To address these problems, several robust procedures for PLSR are
developed.
In the pre-processing stage, a pretreatment procedure is needed to remove both
additive and multiplicative baseline effects and to distinguish the scattering
effect in the raw spectra. The existing methods are not very successful in
removing those effects. Hence, a new robust Generalized Multiplicative Scatter
Correction (GMSC) algorithm is proposed to correct the additive and/or
multiplicative baseline effects when pre-processing spectra. The results
indicate that the proposed method outperforms the existing methods considered
in this study.
In the in-processing stage, the PLSR model is very sensitive to the number of
PLS components used in the model fitting process. Several procedures for
selecting the optimal number of PLS components have been developed in
this regard. However, each procedure yields a different result, and to date
none has been shown to be superior. Hence, a Robust
Reliable Weighted Average (RRWA-PLS) method, which does not require the selection
of an optimal number of PLS components, is developed by employing a weighted average
strategy over multiple PLSR models fitted with different numbers of
PLS components. The PLSR model also has no variable selection procedure
that is able to remove irrelevant wavelengths. To fill this gap in the literature,
a new robust wavelength selection procedure based on an input
scaling method is developed using a Filter-Wrapper method. Furthermore, the PLSR fails to
discover nonlinear structure in the original input space, so the use of
the classical PLSR might not be appropriate. In addition, contamination by
outliers and HLP in the dataset might damage the whole data processing
procedure. To address these problems, robust nonlinear solutions of PLSR
are developed through kernel-based learning, by nonlinearly projecting the
original input data matrix into a high-dimensional feature space corresponding
to the kernel. Nonlinear solutions coupled with some improved
robust methods, such as the Diagnostic Robust Generalized Potential (DRGP)
method and the GM6-Estimator, are also introduced.
Several statistical measures, such as Root Mean Squared Error (RMSE),
Coefficient of Determination (R²), Ratio of Performance to Deviation (RPD), and
Standard Error (SE), are used to evaluate the proposed
methods. The results of the simulation study and of two NIR spectral data sets,
namely the NIR spectra of oil palm (Elaeis guineensis Jacq.) fresh and dried
ground fruit mesocarp, show that all the proposed methods are superior
to the existing methods considered in this study. |
first_indexed | 2024-03-06T11:09:26Z |
format | Thesis |
id | upm.eprints-98710 |
institution | Universiti Putra Malaysia |
language | English |
last_indexed | 2024-03-06T11:09:26Z |
publishDate | 2021 |
record_format | dspace |
spelling | upm.eprints-98710 2022-09-19T23:47:35Z http://psasir.upm.edu.my/id/eprint/98710/ 2021-01 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/98710/1/IPM%202021%208%20-%20IR.pdf Silalahi, Divo Dharma (2021) Development of robust procedures for partial least square regression with application to near infrared spectral data. Doctoral thesis, Universiti Putra Malaysia. Regression analysis Least squares |
spellingShingle | Regression analysis Least squares Silalahi, Divo Dharma Development of robust procedures for partial least square regression with application to near infrared spectral data |
title | Development of robust procedures for partial least square regression with application to near infrared spectral data |
topic | Regression analysis Least squares |
url | http://psasir.upm.edu.my/id/eprint/98710/1/IPM%202021%208%20-%20IR.pdf |
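
The abstract refers to removing additive and multiplicative baseline effects from raw spectra. As a point of reference only, the sketch below shows classical Multiplicative Scatter Correction (MSC), not the robust GMSC algorithm proposed in the thesis, whose details are not given in this record. It assumes each spectrum is a plain list of absorbance values on a shared wavelength grid and uses the mean spectrum as the reference; the function names `fit_line` and `msc` are illustrative.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def msc(spectra):
    """Classical MSC: regress each spectrum on the mean spectrum,
    then subtract the fitted intercept (additive effect) and divide
    by the fitted slope (multiplicative effect)."""
    n = len(spectra)
    reference = [sum(column) / n for column in zip(*spectra)]
    corrected = []
    for spectrum in spectra:
        intercept, slope = fit_line(reference, spectrum)
        corrected.append([(v - intercept) / slope for v in spectrum])
    return reference, corrected


if __name__ == "__main__":
    base = [1.0, 2.0, 4.0, 3.0]
    shifted = [0.5 + 2.0 * b for b in base]  # same shape, new offset and gain
    reference, corrected = msc([base, shifted])
    print(reference)
    print(corrected)
```

Because the least-squares fit and the mean reference are both sensitive to outlying spectra, this classical form illustrates the kind of weakness that motivates a robust alternative.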
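
The evaluation measures named in the abstract (RMSE, R², RPD, SE) can be sketched as follows. The record does not state the exact formulas used in the thesis, so the definitions here are assumptions based on common chemometrics usage: RPD is taken as the sample standard deviation of the reference values divided by RMSE, and SE as the bias-corrected standard deviation of the residuals. The function name `evaluation_metrics` is illustrative.

```python
import math
from statistics import mean, stdev


def evaluation_metrics(y_true, y_pred):
    """Return RMSE, R^2, RPD and SE for paired reference/predicted values.

    Assumed definitions (not confirmed by the source record):
      RPD = sample SD of y_true / RMSE
      SE  = sample SD of the residuals (bias-corrected)
    """
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total sum of squares
    rmse = math.sqrt(ss_res / n)
    return {
        "rmse": rmse,
        "r2": 1.0 - ss_res / ss_tot,
        "rpd": stdev(y_true) / rmse,
        "se": stdev(residuals),
    }


if __name__ == "__main__":
    reference_values = [1, 2, 3, 4, 5]
    predictions = [1.1, 1.9, 3.2, 3.8, 5.0]
    for name, value in evaluation_metrics(reference_values, predictions).items():
        print(f"{name}: {value:.4f}")
```

Lower RMSE and SE and higher R² and RPD indicate better predictive performance, which is how such measures are typically used to compare a proposed method with existing ones.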