Comparison between Variable-Selection Algorithms in PLS Regression with Near-Infrared Spectroscopy to Predict Selected Metals in Soil
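Main Authors: | Giovanna Abrantes, Valber Almeida, Angelo Jamil Maia, Rennan Nascimento, Clistenes Nascimento, Ygor Silva, Yuri Silva, Germano Veras |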
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-10-01 |
Series: | Molecules |
Subjects: | |
Online Access: | https://www.mdpi.com/1420-3049/28/19/6959 |
_version_ | 1797575438937096192 |
---|---|
author | Giovanna Abrantes, Valber Almeida, Angelo Jamil Maia, Rennan Nascimento, Clistenes Nascimento, Ygor Silva, Yuri Silva, Germano Veras |
collection | DOAJ |
description | Soil is one of the Earth’s most important natural resources, and metals present in excessive amounts can degrade environmental quality. Analyzing soil metal contents can be costly and time-consuming, but near-infrared (NIR) spectroscopy coupled with chemometric tools offers an alternative. Partial least-squares (PLS) regression is the most important multivariate calibration method in chemometrics for predicting concentrations or physical, chemical, or physicochemical properties. However, a large number of irrelevant variables can reduce the accuracy of predictive chemometric models. Stochastic variable-selection techniques, such as the Firefly algorithm by intervals in PLS (FFiPLS), can therefore provide better solutions for specific problems. This study aimed to evaluate the performance of FFiPLS against deterministic PLS algorithms for predicting metal contents in river basin soils. Spectra of the samples were collected in the 1000–2500 nm region. Predictive models were then built from the spectral data using PLS, interval PLS (iPLS), the successive projections algorithm for interval selection in PLS (iSPA-PLS), and FFiPLS. Models were built with raw data and with data preprocessed by methods such as multiplicative scatter correction (MSC), standard normal variate (SNV), mean centering, baseline adjustment, and Savitzky–Golay smoothing. The elliptical joint confidence region (EJCR) test indicated an adequate fit for each chemometric model. FFiPLS models for iron and titanium achieved a residual prediction deviation (RPD) above 2. Models for aluminum achieved an RPD above 2 with raw data and with data preprocessed by SNV, MSC, and baseline correction (offset + linear). For Be, Gd, and Y, no adequate models were obtained in terms of RPD; these results are associated with the low concentrations of these metals in the samples. Considering the complexity of the samples, relative errors of prediction (REP) between 10% and 25% were obtained, values that are adequate for this type of sample. The root mean square errors of calibration and prediction (RMSEC and RMSEP, respectively) followed the same trend as the other quality parameters. The FFiPLS algorithm outperformed the deterministic algorithms in building models to estimate Al, Be, Gd, and Y contents. This study produced chemometric models with variable selection capable of determining metals in soils of the Ipojuca River watershed by reflectance-mode NIR spectrometry. |
first_indexed | 2024-03-10T21:38:35Z |
id | doaj.art-48f83c96e61440e2bed4a94920006292 |
institution | Directory of Open Access Journals |
issn | 1420-3049 |
last_indexed | 2024-03-10T21:38:35Z |
doi | 10.3390/molecules28196959 |
citation | Molecules 2023, 28(19), 6959 |
affiliation | Giovanna Abrantes, Valber Almeida, Germano Veras: Departamento de Química, Centro de Ciência e Tecnologia, Universidade Estadual da Paraíba, Campina Grande 58429-500, Brazil; Angelo Jamil Maia, Rennan Nascimento, Clistenes Nascimento, Ygor Silva: Agronomy Department, Federal Rural University of Pernambuco, Recife 52171-900, Brazil; Yuri Silva: Agronomy Department, Federal University of Piauí, Bom Jesus 64900-000, Brazil |
topic | metal content; vibrational spectroscopy; chemometrics; FFiPLS; multivariate calibration |
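The record lists MSC and SNV among the preprocessing methods without giving their form. Below is a minimal NumPy sketch of the common textbook definitions, assuming spectra are stored as rows of a samples × wavelengths array. It is not the authors' code; the function names `snv` and `msc` and the choice of the mean spectrum as the MSC reference are illustrative assumptions. Savitzky–Golay smoothing, also mentioned, is available separately as `scipy.signal.savgol_filter`.

```python
from typing import Optional

import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: center and scale each spectrum (row) by its own mean and SD."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Multiplicative scatter correction against a reference spectrum.

    Assumption: the reference defaults to the mean spectrum of the set. For new
    samples, the calibration-set reference would normally be passed in explicitly.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty(spectra.shape, dtype=float)
    for i, row in enumerate(spectra):
        # Least-squares fit row ~ offset + slope * ref (np.polyfit returns highest power first).
        slope, offset = np.polyfit(ref, row, deg=1)
        # Remove the additive offset, then the multiplicative scatter factor.
        corrected[i] = (row - offset) / slope
    return corrected


if __name__ == "__main__":
    # Three synthetic spectra that differ only by an additive offset and a scale factor.
    base = np.sin(np.linspace(0.0, 3.0, 50))
    X = np.vstack([1.5 * base + 0.2, 0.8 * base - 0.1, base])
    # Both corrections collapse the rows onto a single shape.
    print(np.allclose(snv(X)[0], snv(X)[1]), np.allclose(msc(X)[0], msc(X)[1]))  # True True
```

SNV needs no reference and corrects each spectrum independently, whereas MSC depends on the reference spectrum, which is why the reference is exposed as a parameter here.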
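The record reports RMSEC, RMSEP, RPD, and REP without formulas. The sketch below uses common textbook definitions: RMSE as the root of the mean squared residual, RPD as the sample standard deviation of the reference values divided by RMSEP, and REP as RMSEP expressed as a percentage of the mean reference value. These definitions are assumptions; some authors compute RPD with the standard error of prediction (SEP) or with the calibration-set standard deviation, and the record does not say which variant the study used. The numbers in the usage example are invented for illustration.

```python
import numpy as np


def rmse(y_ref, y_pred) -> float:
    """Root mean square error: RMSEC on the calibration set, RMSEP on the prediction set."""
    residuals = np.asarray(y_ref, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))


def rpd(y_ref, y_pred) -> float:
    """Residual prediction deviation: sample SD of the reference values divided by RMSEP."""
    return float(np.std(np.asarray(y_ref, dtype=float), ddof=1) / rmse(y_ref, y_pred))


def rep_percent(y_ref, y_pred) -> float:
    """Relative error of prediction (%): RMSEP as a percentage of the mean reference value."""
    return float(100.0 * rmse(y_ref, y_pred) / np.mean(np.asarray(y_ref, dtype=float)))


if __name__ == "__main__":
    # Hypothetical reference and predicted contents for a four-sample prediction set.
    y_ref = [10.0, 20.0, 30.0, 40.0]
    y_pred = [12.0, 18.0, 32.0, 38.0]
    print(rmse(y_ref, y_pred))                   # 2.0
    print(round(rep_percent(y_ref, y_pred), 2))  # 8.0
    print(round(rpd(y_ref, y_pred), 2))          # 6.45
```

Under these definitions, an RPD above 2 (the level reported for the iron, titanium, and aluminum models) means the RMSEP is less than half the standard deviation of the reference values.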
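iPLS, one of the deterministic algorithms compared in the study, divides the spectrum into contiguous intervals, fits a PLS model on each, and keeps the interval with the lowest cross-validation error. The sketch below shows only that basic single-interval search, using a NIPALS PLS1 written in NumPy and venetian-blinds cross-validation. The interval count, number of latent variables, fold scheme, synthetic data, and all function names are illustrative assumptions. It does not reproduce iSPA-PLS or FFiPLS, which, as its name indicates, uses the stochastic firefly algorithm to select intervals.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS; returns (x_mean, y_mean, regression coefficients)."""
    # Mean centering, one of the preprocessing steps named in the record, is applied here.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)      # weight vector
        t = Xk @ w                  # scores
        tt = t @ t
        p = Xk.T @ t / tt           # X loadings
        c = (yk @ t) / tt           # y loading
        Xk = Xk - np.outer(t, p)    # deflate X
        yk = yk - c * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Coefficients for the centered variables: B = W (P^T W)^-1 q.
    B = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, B


def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return y_mean + (X - x_mean) @ B


def rmsecv(X, y, n_components, n_folds=5):
    """Venetian-blinds cross-validation: sample i is left out in fold i % n_folds."""
    n = len(y)
    sq_err = 0.0
    for k in range(n_folds):
        test = np.arange(k, n, n_folds)
        train = np.setdiff1d(np.arange(n), test)
        model = pls1_fit(X[train], y[train], n_components)
        sq_err += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
    return float(np.sqrt(sq_err / n))


def ipls(X, y, n_intervals=10, n_components=2):
    """Score each contiguous interval on its own; return (best column indices, all RMSECVs)."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    scores = [rmsecv(X[:, idx], y, min(n_components, len(idx))) for idx in intervals]
    return intervals[int(np.argmin(scores))], scores


if __name__ == "__main__":
    # Synthetic data: 60 "spectra" of 100 variables; only variables 40-49 carry signal.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 100))
    y = X[:, 40:50].sum(axis=1) + 0.1 * rng.normal(size=60)
    best, scores = ipls(X, y)
    print(best.min(), best.max(), [round(s, 2) for s in scores])
```

On this synthetic setup the search should favor the interval containing the signal-bearing columns; real NIR spectra have strongly correlated neighboring wavelengths, so interval width and the number of latent variables would need to be tuned rather than fixed as here.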