Interval-based sparse ensemble multi-class classification algorithm for terahertz data

Terahertz time-domain spectroscopy (THz-TDS) has been widely used for food and drug identification. The class-discriminating information in a THz spectrum is usually not spread across the whole spectral band but concentrated in one or a few small intervals. Therefore, feature selection is indispensable in T...


Bibliographic Details
Main Authors: Chengyong Zheng, Xiaowen Zha, Shengjie Cai, Jing Cui, Qian Li, Zhijing Ye
Format: Article
Language: English
Published: Elsevier 2024-03-01
Series: Heliyon
Subjects: Terahertz spectrum; Classification; Sparse ensemble; Interval; Cross entropy
Online Access: http://www.sciencedirect.com/science/article/pii/S2405844024037745
_version_ 1797224201474539520
author Chengyong Zheng
Xiaowen Zha
Shengjie Cai
Jing Cui
Qian Li
Zhijing Ye
collection DOAJ
description Terahertz time-domain spectroscopy (THz-TDS) has been widely used for food and drug identification. The class-discriminating information in a THz spectrum is usually not spread across the whole spectral band but concentrated in one or a few small intervals. Therefore, feature selection is indispensable in THz-based substance identification. However, most THz-based identification methods simply select the low-frequency band of the THz absorption coefficients for analysis on empirical grounds. To find the important intervals of THz spectra adaptively, an interval-based sparse ensemble multi-class classifier (ISEMCC) for THz spectral data classification is proposed. In ISEMCC, the THz spectra are first divided into several small intervals by a sliding window. The training data in each interval are then extracted to train base classifiers. Finally, a robust classifier is obtained as a nonnegative sparse combination of the trained base classifiers. Using the ℓ1-norm, two objective functions are established, one based on Mean Square Error (MSE) and one on Cross Entropy (CE). For these two objective functions, iterative algorithms based on the Alternating Direction Method of Multipliers (ADMM) and Gradient Descent (GD), respectively, are developed. ISEMCC thus transforms the problem of interval feature selection and decision-level fusion into a nonnegative sparse optimization problem, and the sparsity constraint ensures that only a few important spectral segments are selected. To verify the performance of the proposed algorithm, comparative experiments on identifying the origin of Bupleurum and the harvesting year of Tangerine peel are carried out. The base classifiers used by ISEMCC are Support Vector Machine (SVM) and Decision Tree (DT). The experimental results demonstrate that the proposed algorithm outperforms six typical classifiers, namely Random Forest (RF), AdaBoost, RUSBoost, ExtraTree, and the two base classifiers, in terms of classification accuracy.
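The pipeline described in the abstract (sliding-window intervals, one base classifier per interval, then a nonnegative sparse combination) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the data are synthetic, a nearest-centroid scorer stands in for the SVM and DT base classifiers, the MSE objective is minimised by proximal gradient descent rather than ADMM, and all function names and parameter values (`sliding_intervals`, `centroid_scores`, `fit_sparse_weights`, `lam`) are hypothetical.

```python
import numpy as np

def sliding_intervals(n_points, width, step):
    """Return (start, stop) index pairs of windows slid across the spectrum."""
    return [(s, s + width) for s in range(0, n_points - width + 1, step)]

def centroid_scores(X_train, y_train, X, n_classes):
    """Stand-in base classifier (assumption, not SVM/DT): softmax over
    negative squared distances to the class centroids of the training data."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in range(n_classes)])
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    z = -d2
    z -= z.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def fit_sparse_weights(P, Y, lam=0.1, n_iter=3000):
    """Minimise (0.5/n) * ||sum_k w_k P[k] - Y||_F^2 + lam * sum(w) with w >= 0.
    P has shape (K, n, C); Y is one-hot with shape (n, C). For nonnegative w
    the l1 norm equals sum(w), so each proximal step is a shifted clip at zero.
    Proximal gradient is used here in place of the ADMM solver named above."""
    K, n, _ = P.shape
    A = P.reshape(K, -1).T                     # (n*C, K) design matrix
    b = Y.reshape(-1)
    L = np.linalg.norm(A, 2) ** 2 / n          # Lipschitz constant of the smooth part
    w = np.zeros(K)
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b) / n
        w = np.maximum(0.0, w - (grad + lam) / L)
    return w

def make_spectra(rng, n_per_class, n_points, band, n_classes=3):
    """Synthetic spectra: unit Gaussian noise everywhere, plus a
    class-dependent offset only inside `band`."""
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = rng.normal(size=(y.size, n_points))
    X[:, band[0]:band[1]] += 3.0 * y[:, None]
    return X, y

rng = np.random.default_rng(0)
band = (20, 30)                                # the only informative interval
X_tr, y_tr = make_spectra(rng, 30, 60, band)
X_te, y_te = make_spectra(rng, 30, 60, band)
intervals = sliding_intervals(60, width=10, step=5)

# One base classifier per interval; stack their class scores as (K, n, C).
P_tr = np.stack([centroid_scores(X_tr[:, a:b], y_tr, X_tr[:, a:b], 3) for a, b in intervals])
P_te = np.stack([centroid_scores(X_tr[:, a:b], y_tr, X_te[:, a:b], 3) for a, b in intervals])

w = fit_sparse_weights(P_tr, np.eye(3)[y_tr])  # nonnegative sparse ensemble weights
y_hat = np.tensordot(w, P_te, axes=1).argmax(axis=1)
accuracy = (y_hat == y_te).mean()
selected = [iv for iv, wk in zip(intervals, w) if wk > 1e-8]
print("selected intervals:", selected, "test accuracy:", accuracy)
```

In this toy setting the nonzero weights are expected to fall on windows overlapping the informative band, which mirrors the claim that the sparsity constraint selects only a few spectral segments. The penalty `lam` controls how many intervals survive and would need tuning on real spectra.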
first_indexed 2024-04-24T13:49:21Z
format Article
id doaj.art-6c86b3059f234cd0baf3f467951336df
institution Directory Open Access Journal
issn 2405-8440
language English
last_indexed 2024-04-24T13:49:21Z
publishDate 2024-03-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj.art-6c86b3059f234cd0baf3f467951336df
2024-04-04T05:05:46Z
eng
Elsevier
Heliyon
2405-8440
2024-03-01
Volume 10, Issue 6, e27743
Interval-based sparse ensemble multi-class classification algorithm for terahertz data
Chengyong Zheng (School of Mathematics and Computational Science, Wuyi University, Jiangmen, 529000, China)
Xiaowen Zha (School of Mathematics and Computational Science, Wuyi University, Jiangmen, 529000, China)
Shengjie Cai (Shenzhen Kangguan Technology Co., LTD, Shenzhen, 518129, China)
Jing Cui (Guangdong Jiangmen Chinese Traditional Medicine College, Jiangmen, 529020, China; corresponding author)
Qian Li (Terahertz Technology Application (Guangdong) Co., Ltd, Guangzhou, 510700, China)
Zhijing Ye (Faculty of Innovation Engineering, Macau University of Science and Technology, Taipa, Macau; corresponding author)
http://www.sciencedirect.com/science/article/pii/S2405844024037745
Terahertz spectrum
Classification
Sparse ensemble
Interval
Cross entropy
title Interval-based sparse ensemble multi-class classification algorithm for terahertz data
topic Terahertz spectrum
Classification
Sparse ensemble
Interval
Cross entropy
url http://www.sciencedirect.com/science/article/pii/S2405844024037745