A Generalized Framework for Adopting Regression-Based Predictive Modeling in Manufacturing Environments

In this paper, the growing significance of data analysis in manufacturing environments is exemplified through a review of relevant literature and a generic framework to aid the ease of adoption of regression-based supervised learning in manufacturing environments. To validate the practicality of the...

Full description

Bibliographic Details
Main Authors: Mobayode O. Akinsolu, Khalil Zribi
Format: Article
Language:English
Published: MDPI AG 2023-01-01
Series:Inventions
Subjects:
Online Access:https://www.mdpi.com/2411-5134/8/1/32
Description
Summary:In this paper, the growing significance of data analysis in manufacturing environments is exemplified through a review of relevant literature and a generic framework to aid the ease of adoption of regression-based supervised learning in manufacturing environments. To validate the practicality of the framework, several regression learning techniques are applied to an open-source multi-stage continuous-flow manufacturing process data set to typify inference-driven decision-making that informs the selection of regression learning methods for adoption in real-world manufacturing environments. The investigated regression learning techniques are evaluated in terms of their training time, prediction speed, predictive accuracy (R-squared value), and mean squared error. In terms of training time (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>T</mi><mi>T</mi></mrow></semantics></math></inline-formula>), <i>k</i>-NN20 (<i>k</i>-Nearest Neighbour with 20 neighbors) ranks first with average and median values of 4.8 ms and 4.9 ms, and 4.2 ms and 4.3 ms, respectively, for the first stage and second stage of the predictive modeling of the multi-stage continuous-flow manufacturing process, respectively, over 50 independent runs. In terms of prediction speed (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>P</mi><mi>S</mi></mrow></semantics></math></inline-formula>), DTR (decision tree regressor) ranks first with average and median values of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>5.6784</mn><mo>×</mo><msup><mn>10</mn><mn>6</mn></msup></mrow></semantics></math></inline-formula> observations per second (ob/s) and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>4.8691</mn><mo>×</mo><msup><mn>10</mn><mn>6</mn></msup></mrow></semantics></math></inline-formula> observations per second (ob/s), and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>4.9929</mn><mo>×</mo><msup><mn>10</mn><mn>6</mn></msup></mrow></semantics></math></inline-formula> observations per second (ob/s) and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>5.8806</mn><mo>×</mo><msup><mn>10</mn><mn>6</mn></msup></mrow></semantics></math></inline-formula> observations per second (ob/s), respectively, for the first stage and second stage of the predictive modeling of the multi-stage continuous-flow manufacturing process, respectively, over 50 independent runs. In terms of R-squared value (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi>R</mi><mn>2</mn></msup></semantics></math></inline-formula>), BR (bagging regressor) ranks first with average and median values of 0.728 and 0.728, respectively, over 50 independent runs, for the first stage of the predictive modeling of the multi-stage continuous-flow manufacturing process, and RFR (random forest regressor) ranks first with average and median values of 0.746 and 0.746, respectively, over 50 independent runs, for the second stage of the predictive modeling of the multi-stage continuous-flow manufacturing process. In terms of mean squared error (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>M</mi><mi>S</mi><mi>E</mi></mrow></semantics></math></inline-formula>), BR (bagging regressor) ranks first with average and median values of 2.7 and 2.7, respectively, over 50 independent runs, for the first stage of the predictive modeling of the multi-stage continuous-flow manufacturing process, and RFR (random forest regressor) ranks first with average and median values of 3.5 and 3.5, respectively, over 50 independent runs, for the second stage of the predictive modeling of the multi-stage continuous-flow manufacturing process. All methods are further ranked inferentially using the statistics of their performance metrics to identify the best method(s) for the first and second stages of the predictive modeling of the multi-stage continuous-flow manufacturing process. A Wilcoxon rank sum test is then used to statistically verify the inference-based rankings. <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>D</mi><mi>T</mi><mi>R</mi></mrow></semantics></math></inline-formula> and <i>k</i>-NN20 have been identified as the most suitable regression learning techniques given the multi-stage continuous-flow manufacturing process data used for experimentation.
ISSN:2411-5134