Improving Pipelining Tools for Pre-processing Data

The last several years have seen the emergence of data mining and its transformation into a powerful tool that adds value to business and research. Data mining makes it possible to explore and find unseen connections between variables and facts observed in different domains, helping us to better und...

Bibliographic Details
Main Authors:	María Novo-Lourés, Yeray Lage, Reyes Pavón, Rosalía Laza, David Ruano-Ordás, José Ramón Méndez
Format:	Article
Language:	English
Published:	Universidad Internacional de La Rioja (UNIR) 2022-06-01
Series:	International Journal of Interactive Multimedia and Artificial Intelligence
Subjects:	burst processing; data pre-processing; java; pipeline frameworks
Online Access:	https://www.ijimai.org/journal/bibcite/reference/3028

author	María Novo-Lourés, Yeray Lage, Reyes Pavón, Rosalía Laza, David Ruano-Ordás, José Ramón Méndez
collection	DOAJ
description	The last several years have seen the emergence of data mining and its transformation into a powerful tool that adds value to business and research. Data mining makes it possible to explore and find unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Currently, pipelining schemes are the most reliable way of analysing data and, because of this, several important companies now offer this kind of service. Moreover, several frameworks compatible with different programming languages are available for the development of computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, the presence of early error detection techniques and developer support mechanisms is very limited in these frameworks. In this context, this study introduces several improvements, such as the design of different types of constraints for the early detection of errors, the creation of functions to facilitate debugging of concrete tasks included in a pipeline, the invalidation of erroneous instances, and the introduction of a burst-processing scheme. By adding these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features.
first_indexed	2024-12-12T07:46:02Z
format	Article
id	doaj.art-a23a7d57de3d4e9ea480da484ff7e55f
institution	Directory Open Access Journal
issn	1989-1660
language	English
last_indexed	2024-12-12T07:46:02Z
publishDate	2022-06-01
publisher	Universidad Internacional de La Rioja (UNIR)
record_format	Article
series	International Journal of Interactive Multimedia and Artificial Intelligence
spelling	doaj.art-a23a7d57de3d4e9ea480da484ff7e55f | 2022-12-22T00:32:36Z | eng | Universidad Internacional de La Rioja (UNIR) | International Journal of Interactive Multimedia and Artificial Intelligence | 1989-1660 | 2022-06-01 | 7 | 4 | 214 | 224 | 10.9781/ijimai.2021.10.004 | ijimai.2021.10.004 | Improving Pipelining Tools for Pre-processing Data | María Novo-Lourés | Yeray Lage | Reyes Pavón | Rosalía Laza | David Ruano-Ordás | José Ramón Méndez | https://www.ijimai.org/journal/bibcite/reference/3028 | burst processing | data pre-processing | java | pipeline frameworks
title	Improving Pipelining Tools for Pre-processing Data
topic	burst processing; data pre-processing; java; pipeline frameworks
url	https://www.ijimai.org/journal/bibcite/reference/3028

Similar Items