A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration

Variable (wavelength) selection is essential in the multivariate analysis of near-infrared spectra to improve model performance and provide a more straightforward interpretation. This paper proposed a new variable selection method named binning-normalized mutual information (B-NMI) based on informat...


Bibliographic Details
Main Authors: Liang Zhong, Ruiqi Huang, Lele Gao, Jianan Yue, Bing Zhao, Lei Nie, Lian Li, Aoli Wu, Kefan Zhang, Zhaoqing Meng, Guiyun Cao, Hui Zhang, Hengchang Zang
Format: Article
Language: English
Published: MDPI AG 2023-07-01
Series: Molecules
Subjects: variable selection; near-infrared spectroscopy; data binning; normalized mutual information
Online Access: https://www.mdpi.com/1420-3049/28/15/5672
collection DOAJ
description Variable (wavelength) selection is essential in the multivariate analysis of near-infrared spectra to improve model performance and provide a more straightforward interpretation. This paper proposed a new variable selection method named binning-normalized mutual information (B-NMI) based on information entropy theory. “Data binning” was applied to reduce the effects of minor measurement errors and increase the features of near-infrared spectra. “Normalized mutual information” was employed to calculate the correlation between each wavelength and the reference values. The performance of B-NMI was evaluated by two experimental datasets (ideal ternary solvent mixture dataset, fluidized bed granulation dataset) and two public datasets (gasoline octane dataset, corn protein dataset). Compared with classic methods of backward and interval PLS (BIPLS), variable importance projection (VIP), correlation coefficient (CC), uninformative variables elimination (UVE), and competitive adaptive reweighted sampling (CARS), B-NMI not only selected the most featured wavelengths from the spectra of complex real-world samples but also improved the stability and robustness of variable selection results.
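The two steps named in the description (binning, then normalized mutual information between each wavelength and the reference values) can be illustrated with a minimal Python sketch. This record does not state the binning scheme, the number of bins, or the normalization used in the paper, so equal-width bins and the normalization I(X;Y) / sqrt(H(X) H(Y)) are assumptions here, and the function names `equal_width_bins`, `normalized_mutual_information`, and `nmi_scores` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in nats) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def equal_width_bins(values, n_bins):
    """Map continuous values to bin indices 0..n_bins-1 (assumed equal-width bins)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if lo == hi:
        # A constant variable carries no information: put everything in one bin.
        return np.zeros(values.shape, dtype=int)
    edges = np.linspace(lo, hi, n_bins + 1)
    # Only interior edges are used, so the maximum value lands in the last bin.
    return np.digitize(values, edges[1:-1])


def normalized_mutual_information(x_binned, y_binned):
    """NMI = I(X;Y) / sqrt(H(X) * H(Y)); this normalization is an assumption."""
    hx, hy = entropy(x_binned), entropy(y_binned)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    # Encode each (x, y) pair as a single label to obtain the joint entropy.
    joint = x_binned * (y_binned.max() + 1) + y_binned
    mutual_info = hx + hy - entropy(joint)
    return mutual_info / np.sqrt(hx * hy)


def nmi_scores(X, y, n_bins=10):
    """Score every column (wavelength) of X against the reference values y."""
    X = np.asarray(X, dtype=float)
    y_binned = equal_width_bins(y, n_bins)
    return np.array([
        normalized_mutual_information(equal_width_bins(X[:, j], n_bins), y_binned)
        for j in range(X.shape[1])
    ])
```

In such a sketch, wavelengths would be ranked by their score and the highest-scoring ones retained for the calibration model; how the paper sets the selection threshold is not stated in this record.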
first_indexed 2024-03-11T00:21:30Z
format Article
id doaj.art-599c669c36e8426f99f5bacf336b8ee5
institution Directory of Open Access Journals
issn 1420-3049
language English
last_indexed 2024-03-11T00:21:30Z
publishDate 2023-07-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-599c669c36e8426f99f5bacf336b8ee5; 2023-11-18T23:17:21Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2023-07-01; vol. 28, no. 15, art. 5672; DOI 10.3390/molecules28155672
A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration
Liang Zhong (0), Ruiqi Huang (1), Lele Gao (2), Jianan Yue (3), Bing Zhao (4), Lei Nie (5), Lian Li (6), Aoli Wu (7), Kefan Zhang (8), Zhaoqing Meng (9), Guiyun Cao (10), Hui Zhang (11), Hengchang Zang (12)
Authors 0-8, 11, 12: NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
Authors 9, 10: Shandong Hongjitang Pharmaceutical Group Co. Ltd., Jinan 250103, China
https://www.mdpi.com/1420-3049/28/15/5672
variable selection; near-infrared spectroscopy; data binning; normalized mutual information
topic variable selection
near-infrared spectroscopy
data binning
normalized mutual information
url https://www.mdpi.com/1420-3049/28/15/5672