Data mining and knowledge discovery in chemical processes: Effect of alternative processing techniques

Data mining and knowledge discovery (DMKD) focuses on extracting useful information from data. In the chemical process industry, tasks such as process monitoring, fault detection, process control, optimization, etc., can be achieved using DMKD. However, the selection of the appropriate method for ea...


Bibliographic Details
Main Authors: Luis A. Briceno-Mena, Miriam Nnadili, Michael G. Benton, Jose A. Romagnoli
Format: Article
Language: English
Published: Cambridge University Press 2022-01-01
Series: Data-Centric Engineering
Subjects: Data mining, knowledge discovery, machine learning, process monitoring, unsupervised learning
Online Access: https://www.cambridge.org/core/product/identifier/S2632673622000211/type/journal_article
_version_ 1811156423967506432
author Luis A. Briceno-Mena
Miriam Nnadili
Michael G. Benton
Jose A. Romagnoli
author_sort Luis A. Briceno-Mena
collection DOAJ
description Data mining and knowledge discovery (DMKD) focuses on extracting useful information from data. In the chemical process industry, tasks such as process monitoring, fault detection, process control, and optimization can be achieved using DMKD. However, selecting the appropriate method for each step in the DMKD process (data cleaning, sampling, scaling, dimensionality reduction (DR), clustering, clustering analysis, and data visualization) to obtain meaningful insights is far from trivial. In this contribution, a computational environment (FastMan) is introduced and used to illustrate how method selection affects DMKD in chemical process data. Two case studies, using data from a simulated natural gas liquid plant and real data from an industrial pyrolysis unit, were conducted to demonstrate the applicability of these methodologies in real-life scenarios. Sampling and normalization methods were found to have a great impact on the quality of the DMKD results. Also, t-distributed stochastic neighbor embedding, a neighbor-graph method for DR, outperformed principal component analysis, a matrix factorization method frequently used in the chemical process industry, at identifying both local and global changes.
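The abstract's remark that normalization strongly affects DMKD results can be illustrated with a minimal sketch. This is not the FastMan implementation. The sample values, the variable choice (temperature and mole fraction), and the helper names `zscore`, `minmax`, and `nearest` are all hypothetical. The sketch shows how the nearest neighbor of one sample changes once variables with very different ranges are scaled, which is what distance-based DR and clustering methods depend on.

```python
import math
from statistics import mean, pstdev

# Hypothetical process samples: (temperature in K, mole fraction).
# The numbers are illustrative only and do not come from the article.
samples = [
    (350.0, 0.10),
    (352.0, 0.90),
    (358.0, 0.12),
    (400.0, 0.50),
    (300.0, 0.50),
]

def zscore(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]

def minmax(rows):
    """Scale each column linearly to the range [0, 1]."""
    cols = list(zip(*rows))
    los = [min(c) for c in cols]
    his = [max(c) for c in cols]
    return [tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, los, his)) for row in rows]

def nearest(rows, i):
    """Index of the sample closest to rows[i] in Euclidean distance."""
    others = (j for j in range(len(rows)) if j != i)
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))

raw_nn = nearest(samples, 0)          # temperature dominates the raw distance
z_nn = nearest(zscore(samples), 0)    # both variables contribute after z-scoring
mm_nn = nearest(minmax(samples), 0)   # same idea with min-max scaling

print(raw_nn, z_nn, mm_nn)
```

On the raw data, the sample with a similar temperature but a very different composition is the nearest neighbor of sample 0. After either scaling, the neighbor is the sample that is similar in both variables, so any neighbor-based embedding or clustering built on these distances would group the data differently.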
first_indexed 2024-04-10T04:51:20Z
format Article
id doaj.art-3fd963e292fe42ffafcb0a09f115814f
institution Directory Open Access Journal
issn 2632-6736
language English
last_indexed 2024-04-10T04:51:20Z
publishDate 2022-01-01
publisher Cambridge University Press
record_format Article
series Data-Centric Engineering
spelling doaj.art-3fd963e292fe42ffafcb0a09f115814f
2023-03-09T12:31:51Z
eng
Cambridge University Press
Data-Centric Engineering
2632-6736
2022-01-01
3
10.1017/dce.2022.21
Data mining and knowledge discovery in chemical processes: Effect of alternative processing techniques
Luis A. Briceno-Mena (https://orcid.org/0000-0003-3684-4232)
Miriam Nnadili
Michael G. Benton
Jose A. Romagnoli
Cain Department of Chemical Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, USA (affiliation listed for all four authors)
Data mining and knowledge discovery (DMKD) focuses on extracting useful information from data. In the chemical process industry, tasks such as process monitoring, fault detection, process control, and optimization can be achieved using DMKD. However, selecting the appropriate method for each step in the DMKD process (data cleaning, sampling, scaling, dimensionality reduction (DR), clustering, clustering analysis, and data visualization) to obtain meaningful insights is far from trivial. In this contribution, a computational environment (FastMan) is introduced and used to illustrate how method selection affects DMKD in chemical process data. Two case studies, using data from a simulated natural gas liquid plant and real data from an industrial pyrolysis unit, were conducted to demonstrate the applicability of these methodologies in real-life scenarios. Sampling and normalization methods were found to have a great impact on the quality of the DMKD results. Also, t-distributed stochastic neighbor embedding, a neighbor-graph method for DR, outperformed principal component analysis, a matrix factorization method frequently used in the chemical process industry, at identifying both local and global changes.
https://www.cambridge.org/core/product/identifier/S2632673622000211/type/journal_article
Data mining
knowledge discovery
machine learning
process monitoring
unsupervised learning
title Data mining and knowledge discovery in chemical processes: Effect of alternative processing techniques
topic Data mining
knowledge discovery
machine learning
process monitoring
unsupervised learning
url https://www.cambridge.org/core/product/identifier/S2632673622000211/type/journal_article