Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production

Bibliographic Details
Main Authors: Minh Hung Ho, Amélie Ponchet Durupt, Hai Canh Vu, Nassim Boudaoud, Arnaud Caracciolo, Sophie Sieg-Zieba, Yun Xu, Patrick Leduc
Format: Article
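Language: English
Published: MDPI AG 2023-11-01
Series: Mathematics
Subjects: Industrial Internet of Things; missing data; imputation methods; imbalanced class; classification performance
Online Access: https://www.mdpi.com/2227-7390/11/22/4602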
author Minh Hung Ho
Amélie Ponchet Durupt
Hai Canh Vu
Nassim Boudaoud
Arnaud Caracciolo
Sophie Sieg-Zieba
Yun Xu
Patrick Leduc
collection DOAJ
description The Industrial Internet of Things (IIoT), which integrates sensors into the manufacturing system, provides new paradigms and technologies to industry. The massive acquisition of data, in an industrial context, brings with it a number of challenges to guarantee its quality and reliability, and to ensure that the results of data analysis and modelling are accurate, reliable, and reflect the real phenomena being studied. Common problems encountered with real industrial databases are missing data, outliers, anomalies, unbalanced classes, and non-exhaustive historical data. Unlike papers present in the literature that respond to those problems in a dissociated way, the work performed in this article aims to address all these problems at once. A comprehensive framework for data flow encompassing data acquisition, preprocessing, and machine class classification is proposed. The challenges of missing data, outliers, and anomalies are addressed with critical and novel class outliers distinguished. The study also tackles unbalanced class classification and evaluates the impact of missing data on classification accuracy. Several machine learning models for the operating state classification are implemented. The study also compares the performance of the proposed framework with two existing methods: the Histogram Gradient Boosting Classifier and the Extreme Gradient Boosting classifier. It is shown that using “hard voting” ensemble learning methods to combine several classifiers makes the final classifier more robust to missing data. An application is carried out on data from a real industrial dataset. This research contributes to narrowing the theory–practice gap in leveraging IIoT technologies, offering practical insights into data analytics implementation in real industrial scenarios.
first_indexed 2024-03-09T16:38:01Z
format Article
id doaj.art-43c4025cecdb435eba245ff31ee53a0e
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-09T16:38:01Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-43c4025cecdb435eba245ff31ee53a0e
Timestamp: 2023-11-24T14:54:10Z
Language: eng
Publisher: MDPI AG
Journal: Mathematics
ISSN: 2227-7390
Publication date: 2023-11-01
Volume 11, Issue 22, Article 4602
DOI: 10.3390/math11224602
Title: Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production
Authors and affiliations:
Minh Hung Ho, Amélie Ponchet Durupt, Hai Canh Vu, Nassim Boudaoud: Université de Technologie de Compiègne (UTC), CS 60319, CEDEX, 60203 Compiègne, France
Arnaud Caracciolo, Sophie Sieg-Zieba: Centre Technique des Industries Mécaniques (CETIM), 52 Avenue Félix Louat, CEDEX, 60304 Senlis, France
Yun Xu, Patrick Leduc: ALFI ADLER, 6 Route de la Borde, 60360 Crèvecœur-Le-Grand, France
URL: https://www.mdpi.com/2227-7390/11/22/4602
Keywords: Industrial Internet of Things; missing data; imputation methods; imbalanced class; classification performance
title Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production
topic Industrial Internet of Things
missing data
imputation methods
imbalanced class
classification performance
url https://www.mdpi.com/2227-7390/11/22/4602
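
The description above reports that "hard voting" ensemble learning makes the final classifier more robust to missing data. As a minimal sketch of what hard voting means in general (not the authors' implementation), the following Python combines the label predictions of several base classifiers by majority vote. The function names, the tie-breaking rule, and the operating-state labels "normal", "warmup", and "fault" are assumptions made for illustration; none of them come from the article.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def hard_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the most base classifiers.

    Ties are broken in favour of the label that appears first in the
    input order (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: some label always has the top count")


def hard_vote_batch(per_model: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine predictions given as one list of labels per classifier."""
    # zip(*per_model) regroups the labels sample by sample.
    return [hard_vote(sample) for sample in zip(*per_model)]


# Hypothetical predictions of three classifiers on three samples.
per_model = [
    ["normal", "fault", "warmup"],   # classifier A
    ["normal", "normal", "warmup"],  # classifier B
    ["fault", "fault", "warmup"],    # classifier C
]
combined = hard_vote_batch(per_model)
print(combined)
```

In this sketch a single classifier that mislabels a sample is outvoted by the other two, which is one intuition for why a voting ensemble can degrade more gracefully than any single model when some inputs are unreliable.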