Interval-based sparse ensemble multi-class classification algorithm for terahertz data
Main Authors: | Chengyong Zheng, Xiaowen Zha, Shengjie Cai, Jing Cui, Qian Li, Zhijing Ye |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2024-03-01 |
Series: | Heliyon |
Subjects: | Terahertz spectrum; Classification; Sparse ensemble; Interval; Cross entropy |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2405844024037745 |
_version_ | 1797224201474539520 |
---|---|
author | Chengyong Zheng Xiaowen Zha Shengjie Cai Jing Cui Qian Li Zhijing Ye |
collection | DOAJ |
description | Terahertz time-domain spectroscopy (THz-TDS) has been widely used for food and drug identification. The classification information in a THz spectrum is usually not spread across the whole spectral band but is concentrated in one or several small intervals. Feature selection is therefore indispensable in THz-based substance identification. However, most THz-based identification methods empirically truncate the THz absorption coefficients to the low-frequency band for analysis. To find the important intervals of THz spectra adaptively, an interval-based sparse ensemble multi-class classifier (ISEMCC) for THz spectral data classification is proposed. In ISEMCC, the THz spectra are first divided into several small intervals by a sliding window. The training data in each interval are then used to train base classifiers. Finally, a robust final classifier is obtained through a nonnegative sparse combination of these trained base classifiers. Two ℓ1-norm-regularized objective functions, based on Mean Square Error (MSE) and Cross Entropy (CE), are established, and for them two iterative algorithms are built, based on the Alternating Direction Method of Multipliers (ADMM) and Gradient Descent (GD), respectively. ISEMCC thus transforms interval feature selection and decision-level fusion into a nonnegative sparse optimization problem, in which the sparsity constraint ensures that only a few important spectral segments are selected. To verify the performance of the proposed algorithm, comparative experiments are carried out on identifying the origin of Bupleurum and the harvest year of tangerine peel, with Support Vector Machine (SVM) and Decision Tree (DT) as the base classifiers of ISEMCC. The experimental results demonstrate that the proposed algorithm outperforms six typical classifiers in classification accuracy: Random Forest (RF), AdaBoost, RUSBoost, ExtraTree, and the two base classifiers. |
first_indexed | 2024-04-24T13:49:21Z |
format | Article |
id | doaj.art-6c86b3059f234cd0baf3f467951336df |
institution | Directory of Open Access Journals |
issn | 2405-8440 |
language | English |
last_indexed | 2024-04-24T13:49:21Z |
publishDate | 2024-03-01 |
publisher | Elsevier |
record_format | Article |
series | Heliyon |
citation | Heliyon, vol. 10, no. 6, article e27743 |
affiliations | Chengyong Zheng, Xiaowen Zha: School of Mathematics and Computational Science, Wuyi University, Jiangmen, 529000, China; Shengjie Cai: Shenzhen Kangguan Technology Co., LTD, Shenzhen, 518129, China; Jing Cui (corresponding author): Guangdong Jiangmen Chinese Traditional Medicine College, Jiangmen, 529020, China; Qian Li: Terahertz Technology Application (Guangdong) Co., Ltd, Guangzhou, 510700, China; Zhijing Ye (corresponding author): Faculty of Innovation Engineering, Macau University of Science and Technology, Taipa, Macau |
title | Interval-based sparse ensemble multi-class classification algorithm for terahertz data |
topic | Terahertz spectrum Classification Sparse ensemble Interval Cross entropy |
url | http://www.sciencedirect.com/science/article/pii/S2405844024037745 |
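The description field outlines ISEMCC only at a high level. The Python sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation. It assumes NumPy and scikit-learn; that every base classifier outputs class probabilities; that the fused score is a weighted sum of those probabilities fitted to one-hot labels; and that the MSE variant takes the form min over w of 0.5·‖Aw - b‖² + λ·Σ_k w_k subject to w ≥ 0, where column k of A holds the flattened out-of-fold probabilities of the classifier trained on interval k and b is the flattened one-hot label matrix. That nonnegative ℓ1 problem is solved here with scaled-form ADMM. The names `sliding_intervals`, `nonneg_l1_admm`, and `IntervalSparseEnsemble`, the parameters `width`, `step`, `lam`, `rho`, and `cv`, the decision-tree base learner, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def sliding_intervals(n_features, width, step):
    """(start, stop) column ranges of a window slid across the spectrum (hypothetical helper)."""
    return [(s, s + width) for s in range(0, n_features - width + 1, step)]


def nonneg_l1_admm(A, b, lam, rho=1.0, n_iter=500):
    """Scaled-form ADMM for min 0.5*||A w - b||^2 + lam*sum(w) subject to w >= 0."""
    k = A.shape[1]
    chol = np.linalg.cholesky(A.T @ A + rho * np.eye(k))  # factor once, reuse every iteration
    Atb = A.T @ b
    z = np.zeros(k)
    u = np.zeros(k)
    for _ in range(n_iter):
        # w-step: solve (A^T A + rho I) w = A^T b + rho (z - u) via the Cholesky factor
        w = np.linalg.solve(chol.T, np.linalg.solve(chol, Atb + rho * (z - u)))
        # z-step: prox of lam*sum(z) plus the indicator of z >= 0 is a shifted ReLU
        z = np.maximum(0.0, w + u - lam / rho)
        # scaled dual update
        u = u + w - z
    return z  # z is nonnegative by construction


class IntervalSparseEnsemble:
    """Hypothetical ISEMCC-style sketch: one base classifier per interval, MSE + l1 fusion."""

    def __init__(self, base, width, step, lam=0.1, cv=5):
        self.base, self.width, self.step, self.lam, self.cv = base, width, step, lam, cv

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        Y = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets, n x C
        self.intervals_ = sliding_intervals(X.shape[1], self.width, self.step)
        cols = []
        for a, b in self.intervals_:
            # out-of-fold class probabilities (n x C) of the model trained on this interval
            P = cross_val_predict(clone(self.base), X[:, a:b], y, cv=self.cv, method="predict_proba")
            cols.append(P.ravel())
        A = np.column_stack(cols)  # (n*C) x K, one column per interval
        self.weights_ = nonneg_l1_admm(A, Y.ravel(), self.lam)
        # refit every base model on all training data for use at prediction time
        self.models_ = [clone(self.base).fit(X[:, a:b], y) for a, b in self.intervals_]
        return self

    def fused_scores(self, X):
        """Weighted sum of base-model probabilities (unnormalized fused scores)."""
        scores = np.zeros((X.shape[0], len(self.classes_)))
        for w, m, (a, b) in zip(self.weights_, self.models_, self.intervals_):
            if w > 0:  # zero-weight intervals are effectively discarded
                scores += w * m.predict_proba(X[:, a:b])
        return scores

    def predict(self, X):
        return self.classes_[np.argmax(self.fused_scores(X), axis=1)]


# Synthetic stand-in for spectra: only columns 60-79 carry class information.
rng = np.random.default_rng(0)
n, d = 120, 200
y = np.repeat(np.arange(3), n // 3)
X = rng.normal(size=(n, d))
X[:, 60:80] += 2.0 * y[:, None]

model = IntervalSparseEnsemble(
    DecisionTreeClassifier(max_depth=3, random_state=0), width=20, step=10, lam=1.0
).fit(X, y)
print("selected intervals:", [model.intervals_[i] for i in np.flatnonzero(model.weights_)])
```

Fitting the fusion weights on out-of-fold probabilities is a design choice of this sketch: weights fitted on in-sample predictions would tend to favor intervals whose trees merely memorize the training labels. The abstract's second variant (a cross-entropy objective solved by gradient descent) is not shown; one plausible analogue would replace `nonneg_l1_admm` with projected gradient steps on a log-loss of the fused scores that keep the weights nonnegative.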