A review of data abstraction

It is well known that Artificial Intelligence (AI), and Machine Learning (ML) in particular, is not effective without good data preparation, a point also made by the recent wave of data-centric AI. Data preparation is the process of gathering, transforming, and cleaning raw data prior to processing...

Bibliographic Details
Main Authors: Gianluca Cima, Marco Console, Maurizio Lenzerini, Antonella Poggi
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-06-01
Series: Frontiers in Artificial Intelligence
Subjects: knowledge representation, abstraction, automated reasoning, data integration, data preparation
Online Access: https://www.frontiersin.org/articles/10.3389/frai.2023.1085754/full
collection DOAJ
description It is well known that Artificial Intelligence (AI), and Machine Learning (ML) in particular, is not effective without good data preparation, a point also made by the recent wave of data-centric AI. Data preparation is the process of gathering, transforming, and cleaning raw data prior to processing and analysis. Since data nowadays often reside in distributed and heterogeneous sources, the first activity of data preparation is collecting data from suitable data sources and data services. It is thus essential that providers describe their data services in a way that makes them compliant with the FAIR guiding principles, i.e., automatically Findable, Accessible, Interoperable, and Reusable. The notion of data abstraction has been introduced precisely to meet this need. Abstraction is a kind of reverse engineering task that automatically provides a semantic characterization of a data service made available by a provider. The goal of this paper is to review the results obtained so far in data abstraction by presenting the formal framework for its definition, reporting on the decidability and complexity of the main theoretical problems concerning abstraction, and discussing open issues and interesting directions for future research.
first_indexed 2024-03-13T03:39:04Z
id doaj.art-fcbb10d974a04fde8dc02531e5c1b855
institution Directory Open Access Journal
issn 2624-8212
last_indexed 2024-03-13T03:39:04Z
topic knowledge representation
abstraction
automated reasoning
data integration
data preparation