Comparison of Soil Total Nitrogen Content Prediction Models Based on Vis-NIR Spectroscopy

Visible-near-infrared spectrum (Vis-NIR) spectroscopy technology is one of the most important methods for non-destructive and rapid detection of soil total nitrogen (STN) content. In order to find a practical way to build STN content prediction model, three conventional machine learning methods and...

Full description

Bibliographic Details
Main Authors: Yueting Wang, Minzan Li, Ronghua Ji, Minjuan Wang, Lihua Zheng
Format: Article
Language:English
Published: MDPI AG 2020-12-01
Series:Sensors
Subjects:
Online Access:https://www.mdpi.com/1424-8220/20/24/7078
Description
Summary:Visible-near-infrared spectrum (Vis-NIR) spectroscopy technology is one of the most important methods for non-destructive and rapid detection of soil total nitrogen (STN) content. In order to find a practical way to build STN content prediction model, three conventional machine learning methods and one deep learning approach are investigated and their predictive performances are compared and analyzed by using a public dataset called LUCAS Soil (19,019 samples). The three conventional machine learning methods include ordinary least square estimation (OLSE), random forest (RF), and extreme learning machine (ELM), while for the deep learning method, three different structures of convolutional neural network (CNN) incorporated Inception module are constructed and investigated. In order to clarify effectiveness of different pre-treatments on predicting STN content, the three conventional machine learning methods are combined with four pre-processing approaches (including baseline correction, smoothing, dimensional reduction, and feature selection) are investigated, compared, and analyzed. The results indicate that the baseline-corrected and smoothed ELM model reaches practical precision (coefficient of determination (R<sup>2</sup>) = 0.89, root mean square error of prediction (RMSEP) = 1.60 g/kg, and residual prediction deviation (RPD) = 2.34). While among three different structured CNN models, the one with more 1 × 1 convolutions preforms better (R<sup>2</sup> = 0.93; RMSEP = 0.95 g/kg; and RPD = 3.85 in optimal case). In addition, in order to evaluate the influence of data set characteristics on the model, the LUCAS data set was divided into different data subsets according to dataset size, organic carbon (OC) content and countries, and the results show that the deep learning method is more effective and practical than conventional machine learning methods and, on the premise of enough data samples, it can be used to build a robust STN content prediction model with high accuracy for the same type of soil with similar agricultural treatment.
ISSN:1424-8220