Combining Variable Selection and Multiple Linear Regression for Soil Organic Matter and Total Nitrogen Estimation by DRIFT-MIR Spectroscopy

The successful estimation of soil organic matter (SOM) and soil total nitrogen (TN) contents with mid-infrared (MIR) reflectance spectroscopy depends on selecting appropriate variable selection techniques and multivariate methods for regression analysis. This study aimed to explore the potential of...

Bibliographic Details
Main Authors: Hong Li, Junwei Wang, Jixiong Zhang, Tongqing Liu, Gifty E. Acquah, Huimin Yuan
Format: Article
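Language: English
Published: MDPI AG 2022-03-01
Series: Agronomy
Subjects: precision agriculture; mid-infrared soil spectroscopy; spectral variable selection; multiple linear regression
Online Access: https://www.mdpi.com/2073-4395/12/3/638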
collection DOAJ
description The successful estimation of soil organic matter (SOM) and soil total nitrogen (TN) contents with mid-infrared (MIR) reflectance spectroscopy depends on selecting appropriate variable selection techniques and multivariate methods for regression analysis. This study aimed to explore the potential of combining a multivariate method and spectral variable selection for soil SOM and TN estimation using MIR spectroscopy. Five hundred and ten topsoil samples were collected from Quzhou County, Hebei Province, China, and their SOM and TN contents and reflectance spectra were measured using DRIFT-MIR spectroscopy (diffuse reflectance infrared Fourier transform in the mid-infrared range, MIR, wavenumber: 4000–400 cm⁻¹; wavelength: 2500–25,000 nm). Two multivariate methods (partial least-squares regression, PLSR; multiple linear regression, MLR) combined with two variable selection techniques (stability competitive adaptive reweighted sampling, sCARS; bootstrapping soft shrinkage approach, BOSS) were used for model calibration. The MLR model combined with the sCARS method yielded the most accurate estimation result for both SOM (Rₚ² = 0.72 and RPD = 1.89) and TN (Rₚ² = 0.84 and RPD = 2.50). Out of the 2382 wavenumbers in a full spectrum, sCARS determined that only 31 variables were important for SOM estimation (accounting for 1.30% of all variables) and 27 variables were important for TN estimation (accounting for 1.13% of all variables). The results demonstrated that sCARS was a highly efficient approach for extracting information on wavenumbers and mitigating redundant wavenumbers. In addition, the current study indicated that MLR, which is simpler than PLSR, when combined with spectral variable selection, can achieve high-precision prediction of SOM and TN content. As such, DRIFT-MIR spectroscopy coupled with MLR and sCARS is a good alternative for estimating the SOM and TN of soils.
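The quantities quoted in the description can be checked with a short Python sketch. This is an illustration only, not the authors' code: the function names are hypothetical, the record does not state how R² and RPD were computed (the common definitions are assumed here, with RPD taken as the sample standard deviation of the reference values divided by the RMSE of prediction), and the MLR fit is plain ordinary least squares on whatever columns a selection step such as sCARS would return.

```python
import numpy as np

def wavenumber_to_wavelength_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    return 1e7 / wavenumber_cm1

def selected_share_percent(n_selected, n_total):
    """Percentage of spectral variables retained by a selection step."""
    return 100.0 * n_selected / n_total

def fit_mlr(X, y):
    """Ordinary least-squares MLR; returns [intercept, slope_1, ..., slope_k]."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply fitted MLR coefficients to new spectra."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def r2_and_rpd(y_true, y_pred):
    """R^2 and RPD under the assumed definitions (RPD = SD of reference / RMSE)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    rmse = np.sqrt(np.mean(resid ** 2))
    return 1.0 - ss_res / ss_tot, np.std(y_true, ddof=1) / rmse

if __name__ == "__main__":
    # Spectral range and variable counts taken from the description above.
    print(wavenumber_to_wavelength_nm(4000), wavenumber_to_wavelength_nm(400))
    print(round(selected_share_percent(31, 2382), 2),
          round(selected_share_percent(27, 2382), 2))

    # Synthetic stand-in data (NOT the study's soil spectra).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))            # 60 samples, 5 "selected" variables
    y = 2.0 + X @ np.array([1.0, -0.5, 0.3, 0.0, 0.8]) + rng.normal(scale=0.2, size=60)
    coef = fit_mlr(X[:40], y[:40])          # calibrate on the first 40 samples
    print(r2_and_rpd(y[40:], predict_mlr(coef, X[40:])))  # evaluate on the rest
```

MLR becomes feasible here only because the selection step leaves far fewer variables (31 or 27) than samples (510); fitting it to all 2382 wavenumbers would be underdetermined.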
first_indexed 2024-03-09T13:55:31Z
id doaj.art-e1fbc8f0e7784dbb957f0db6a0870eed
institution Directory Open Access Journal
issn 2073-4395
last_indexed 2024-03-09T13:55:31Z
spelling doaj.art-e1fbc8f0e7784dbb957f0db6a0870eed; 2023-11-30T20:44:25Z; eng; MDPI AG; Agronomy; ISSN 2073-4395; 2022-03-01; volume 12, issue 3, article 638; DOI 10.3390/agronomy12030638
Author affiliations:
Hong Li, Junwei Wang, Jixiong Zhang, Tongqing Liu, Huimin Yuan: College of Resources and Environmental Sciences, National Academy of Agriculture Green Development, Key Laboratory of Plant-Soil Interactions, Ministry of Education, China Agricultural University, Beijing 100193, China
Gifty E. Acquah: Department of Sustainable Agriculture Sciences, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
topic precision agriculture
mid-infrared soil spectroscopy
spectral variable selection
multiple linear regression