A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration
Variable (wavelength) selection is essential in the multivariate analysis of near-infrared spectra to improve model performance and provide a more straightforward interpretation. This paper proposes a new variable selection method, binning-normalized mutual information (B-NMI), based on informat...
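Main Authors: | Liang Zhong, Ruiqi Huang, Lele Gao, Jianan Yue, Bing Zhao, Lei Nie, Lian Li, Aoli Wu, Kefan Zhang, Zhaoqing Meng, Guiyun Cao, Hui Zhang, Hengchang Zang |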
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-07-01 |
Series: | Molecules |
Subjects: | variable selection; near-infrared spectroscopy; data binning; normalized mutual information |
Online Access: | https://www.mdpi.com/1420-3049/28/15/5672 |
_version_ | 1797586375824900096 |
---|---|
author | Liang Zhong; Ruiqi Huang; Lele Gao; Jianan Yue; Bing Zhao; Lei Nie; Lian Li; Aoli Wu; Kefan Zhang; Zhaoqing Meng; Guiyun Cao; Hui Zhang; Hengchang Zang |
author_facet | Liang Zhong; Ruiqi Huang; Lele Gao; Jianan Yue; Bing Zhao; Lei Nie; Lian Li; Aoli Wu; Kefan Zhang; Zhaoqing Meng; Guiyun Cao; Hui Zhang; Hengchang Zang |
author_sort | Liang Zhong |
collection | DOAJ |
description | Variable (wavelength) selection is essential in the multivariate analysis of near-infrared spectra to improve model performance and provide a more straightforward interpretation. This paper proposes a new variable selection method, binning-normalized mutual information (B-NMI), based on information entropy theory. “Data binning” is applied to reduce the effects of minor measurement errors and to enhance the features of near-infrared spectra. “Normalized mutual information” is used to quantify the correlation between each wavelength and the reference values. The performance of B-NMI was evaluated on two experimental datasets (an ideal ternary solvent mixture dataset and a fluidized bed granulation dataset) and two public datasets (a gasoline octane dataset and a corn protein dataset). Compared with the classic methods of backward interval PLS (BIPLS), variable importance in projection (VIP), correlation coefficient (CC), uninformative variable elimination (UVE), and competitive adaptive reweighted sampling (CARS), B-NMI not only selected the most characteristic wavelengths from the spectra of complex real-world samples but also improved the stability and robustness of the variable selection results. |
first_indexed | 2024-03-11T00:21:30Z |
format | Article |
id | doaj.art-599c669c36e8426f99f5bacf336b8ee5 |
institution | Directory of Open Access Journals |
issn | 1420-3049 |
language | English |
last_indexed | 2024-03-11T00:21:30Z |
publishDate | 2023-07-01 |
publisher | MDPI AG |
record_format | Article |
series | Molecules |
spelling | doaj.art-599c669c36e8426f99f5bacf336b8ee5; 2023-11-18T23:17:21Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2023-07-01; vol. 28, no. 15, art. 5672; DOI 10.3390/molecules28155672; A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration; Authors: Liang Zhong (0), Ruiqi Huang (1), Lele Gao (2), Jianan Yue (3), Bing Zhao (4), Lei Nie (5), Lian Li (6), Aoli Wu (7), Kefan Zhang (8), Zhaoqing Meng (9), Guiyun Cao (10), Hui Zhang (11), Hengchang Zang (12); Affiliation of authors 0-8, 11, and 12: NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China; Affiliation of authors 9 and 10: Shandong Hongjitang Pharmaceutical Group Co. Ltd., Jinan 250103, China; Abstract: as given in the description field of this record; https://www.mdpi.com/1420-3049/28/15/5672; Keywords: variable selection, near-infrared spectroscopy, data binning, normalized mutual information |
spellingShingle | Liang Zhong; Ruiqi Huang; Lele Gao; Jianan Yue; Bing Zhao; Lei Nie; Lian Li; Aoli Wu; Kefan Zhang; Zhaoqing Meng; Guiyun Cao; Hui Zhang; Hengchang Zang; A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration; Molecules; variable selection; near-infrared spectroscopy; data binning; normalized mutual information |
title | A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration |
title_full | A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration |
title_fullStr | A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration |
title_full_unstemmed | A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration |
title_short | A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration |
title_sort | novel variable selection method based on binning normalized mutual information for multivariate calibration |
topic | variable selection; near-infrared spectroscopy; data binning; normalized mutual information |
url | https://www.mdpi.com/1420-3049/28/15/5672 |
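
The description field outlines the two ingredients of B-NMI: binning the data, then scoring each wavelength by its normalized mutual information with the reference values. The following Python sketch illustrates that general idea only and is not the authors' implementation. Equal-width binning, the bin count `n_bins`, the normalization I(A;B)/sqrt(H(A)·H(B)), and every function name here are assumptions made for illustration; the record does not state which choices the paper uses.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in nats) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def normalized_mutual_information(a, b):
    """NMI = I(A;B) / sqrt(H(A) * H(B)); 0.0 if either variable is constant.

    This normalization is one common convention, assumed here.
    """
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    # Joint entropy from the paired labels.
    pairs = np.stack([a, b], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_ab = float(-np.sum(p * np.log(p)))
    mutual_information = h_a + h_b - h_ab
    return mutual_information / np.sqrt(h_a * h_b)


def equal_width_bins(values, n_bins):
    """Map continuous values to integer bin labels 0..n_bins-1."""
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Use interior edges only, so the maximum lands in the last bin.
    return np.digitize(values, edges[1:-1])


def bnmi_scores(X, y, n_bins=5):
    """Score each column (wavelength) of X against the reference values y."""
    y_binned = equal_width_bins(y, n_bins)
    return np.array([
        normalized_mutual_information(equal_width_bins(X[:, j], n_bins), y_binned)
        for j in range(X.shape[1])
    ])


# Synthetic demonstration: only column 2 carries information about y.
rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=200)
X = rng.normal(size=(200, 6))
X[:, 2] = y + rng.normal(scale=0.01, size=200)

scores = bnmi_scores(X, y)
print("highest-scoring column:", int(np.argmax(scores)))
```

In such a sketch, wavelengths would be ranked by score and the top-ranked ones passed to a calibration model such as PLS. How many to keep, and how the bins are chosen, would need to follow the original article.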