Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production
The Industrial Internet of Things (IIoT), which integrates sensors into the manufacturing system, provides new paradigms and technologies to industry. The massive acquisition of data, in an industrial context, brings with it a number of challenges to guarantee its quality and reliability, and to ens...
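Main Authors: | Minh Hung Ho, Amélie Ponchet Durupt, Hai Canh Vu, Nassim Boudaoud, Arnaud Caracciolo, Sophie Sieg-Zieba, Yun Xu, Patrick Leduc |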
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-11-01 |
Series: | Mathematics |
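Subjects: | Industrial Internet of Things, missing data, imputation methods, imbalanced class, classification performance |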
Online Access: | https://www.mdpi.com/2227-7390/11/22/4602 |
_version_ | 1827639404856868864 |
---|---|
author | Minh Hung Ho Amélie Ponchet Durupt Hai Canh Vu Nassim Boudaoud Arnaud Caracciolo Sophie Sieg-Zieba Yun Xu Patrick Leduc |
author_facet | Minh Hung Ho Amélie Ponchet Durupt Hai Canh Vu Nassim Boudaoud Arnaud Caracciolo Sophie Sieg-Zieba Yun Xu Patrick Leduc |
author_sort | Minh Hung Ho |
collection | DOAJ |
description | The Industrial Internet of Things (IIoT), which integrates sensors into the manufacturing system, provides new paradigms and technologies to industry. The massive acquisition of data, in an industrial context, brings with it a number of challenges to guarantee its quality and reliability, and to ensure that the results of data analysis and modelling are accurate, reliable, and reflect the real phenomena being studied. Common problems encountered with real industrial databases are missing data, outliers, anomalies, unbalanced classes, and non-exhaustive historical data. Unlike papers present in the literature that respond to those problems in a dissociated way, the work performed in this article aims to address all these problems at once. A comprehensive framework for data flow encompassing data acquisition, preprocessing, and machine class classification is proposed. The challenges of missing data, outliers, and anomalies are addressed with critical and novel class outliers distinguished. The study also tackles unbalanced class classification and evaluates the impact of missing data on classification accuracy. Several machine learning models for the operating state classification are implemented. The study also compares the performance of the proposed framework with two existing methods: the Histogram Gradient Boosting Classifier and the Extreme Gradient Boosting classifier. It is shown that using “hard voting” ensemble learning methods to combine several classifiers makes the final classifier more robust to missing data. An application is carried out on data from a real industrial dataset. This research contributes to narrowing the theory–practice gap in leveraging IIoT technologies, offering practical insights into data analytics implementation in real industrial scenarios. |
first_indexed | 2024-03-09T16:38:01Z |
format | Article |
id | doaj.art-43c4025cecdb435eba245ff31ee53a0e |
institution | Directory Open Access Journal |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-09T16:38:01Z |
publishDate | 2023-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-43c4025cecdb435eba245ff31ee53a0e; 2023-11-24T14:54:10Z; eng; MDPI AG; Mathematics; 2227-7390; 2023-11-01; 11; 22; 4602; 10.3390/math11224602; Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production; Minh Hung Ho 0; Amélie Ponchet Durupt 1; Hai Canh Vu 2; Nassim Boudaoud 3; Arnaud Caracciolo 4; Sophie Sieg-Zieba 5; Yun Xu 6; Patrick Leduc 7; Université de Technologie de Compiègne (UTC), CS 60319, CEDEX, 60203 Compiègne, France; Université de Technologie de Compiègne (UTC), CS 60319, CEDEX, 60203 Compiègne, France; Université de Technologie de Compiègne (UTC), CS 60319, CEDEX, 60203 Compiègne, France; Université de Technologie de Compiègne (UTC), CS 60319, CEDEX, 60203 Compiègne, France; Centre Technique des Industries Mécaniques (CETIM), 52 Avenue Félix Louat, CEDEX, 60304 Senlis, France; Centre Technique des Industries Mécaniques (CETIM), 52 Avenue Félix Louat, CEDEX, 60304 Senlis, France; ALFI ADLER, 6 Route de la Borde, 60360 Crèvecœur-Le-Grand, France; ALFI ADLER, 6 Route de la Borde, 60360 Crèvecœur-Le-Grand, France; The Industrial Internet of Things (IIoT), which integrates sensors into the manufacturing system, provides new paradigms and technologies to industry. The massive acquisition of data, in an industrial context, brings with it a number of challenges to guarantee its quality and reliability, and to ensure that the results of data analysis and modelling are accurate, reliable, and reflect the real phenomena being studied. Common problems encountered with real industrial databases are missing data, outliers, anomalies, unbalanced classes, and non-exhaustive historical data. Unlike papers present in the literature that respond to those problems in a dissociated way, the work performed in this article aims to address all these problems at once. A comprehensive framework for data flow encompassing data acquisition, preprocessing, and machine class classification is proposed. The challenges of missing data, outliers, and anomalies are addressed with critical and novel class outliers distinguished. The study also tackles unbalanced class classification and evaluates the impact of missing data on classification accuracy. Several machine learning models for the operating state classification are implemented. The study also compares the performance of the proposed framework with two existing methods: the Histogram Gradient Boosting Classifier and the Extreme Gradient Boosting classifier. It is shown that using “hard voting” ensemble learning methods to combine several classifiers makes the final classifier more robust to missing data. An application is carried out on data from a real industrial dataset. This research contributes to narrowing the theory–practice gap in leveraging IIoT technologies, offering practical insights into data analytics implementation in real industrial scenarios.; https://www.mdpi.com/2227-7390/11/22/4602; Industrial Internet of Things; missing data; imputation methods; imbalanced class; classification performance |
spellingShingle | Minh Hung Ho Amélie Ponchet Durupt Hai Canh Vu Nassim Boudaoud Arnaud Caracciolo Sophie Sieg-Zieba Yun Xu Patrick Leduc Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production Mathematics Industrial Internet of Things missing data imputation methods imbalanced class classification performance |
title | Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production |
title_full | Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production |
title_fullStr | Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production |
title_full_unstemmed | Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production |
title_short | Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production |
title_sort | ensemble learning for multi label classification with unbalanced classes a case study of a curing oven in glass wool production |
topic | Industrial Internet of Things missing data imputation methods imbalanced class classification performance |
url | https://www.mdpi.com/2227-7390/11/22/4602 |
work_keys_str_mv | AT minhhungho ensemblelearningformultilabelclassificationwithunbalancedclassesacasestudyofacuringoveninglasswoolproduction AT amelieponchetdurupt ensemblelearningformultilabelclassificationwithunbalancedclassesacasestudyofacuringoveninglasswoolproduction AT haicanhvu ensemblelearningformultilabelclassificationwithunbalancedclassesacasestudyofacuringoveninglasswoolproduction AT nassimboudaoud ensemblelearningformultilabelclassificationwithunbalancedclassesacasestudyofacuringoveninglasswoolproduction AT arnaudcaracciolo ensemblelearningformultilabelclassificationwithunbalancedclassesacasestudyofacuringoveninglasswoolproduction AT sophiesiegzieba ensemblelearningformultilabelclassificationwithunbalancedclassesacasestudyofacuringoveninglasswoolproduction AT yunxu ensemblelearningformultilabelclassificationwithunbalancedclassesacasestudyofacuringoveninglasswoolproduction AT patrickleduc ensemblelearningformultilabelclassificationwithunbalancedclassesacasestudyofacuringoveninglasswoolproduction |
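
Note on "hard voting": the description above states that combining several classifiers by hard voting makes the final classifier more robust to missing data. As a minimal illustrative sketch of what hard voting means (not the authors' implementation), the snippet below takes the label each classifier predicts for a sample and keeps the most frequent one. The function name `hard_vote`, the tie-breaking rule, and the operating-state labels are hypothetical and chosen only for illustration; the record itself does not specify them.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def hard_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-classifier label predictions by majority vote.

    predictions[i][j] is the label classifier i assigns to sample j.
    Assumption: ties go to the label proposed by the earliest classifier.
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")

    combined = []
    for votes in zip(*predictions):  # votes = one label per classifier
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label, in classifier order, that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical operating-state labels from three classifiers on four samples.
clf_a = ["run", "idle", "fault", "run"]
clf_b = ["run", "run", "fault", "idle"]
clf_c = ["idle", "idle", "fault", "fault"]

print(hard_vote([clf_a, clf_b, clf_c]))
```

In this toy input the fourth sample receives three different votes, so the assumed tie-break returns the first classifier's label. Libraries such as scikit-learn provide a ready-made equivalent (`VotingClassifier` with `voting="hard"`), which would be the more usual choice in practice.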