A Column Styled Composable Schema Matcher for Semantic Data-Types

Bibliographic Details
Main Authors: Xiaofeng Liao, Jordy Bottelier, Zhiming Zhao
Format: Article
Language: English
Published: Ubiquity Press, 2019-06-01
Series: Data Science Journal
Online Access:https://datascience.codata.org/articles/973
Affiliation: System and Network Engineering Lab, Informatics Institute, University of Amsterdam, Amsterdam (all authors)
DOI: 10.5334/dsj-2019-025
ISSN: 1683-1470
Collection: DOAJ

Abstract

Schema matching is a long-standing challenge in many database-related applications, such as data integration, where two databases with different schemas have to be integrated. With the evolution from databases to big data, schema matching has acquired a wider range of purposes and application contexts: from data integration, to service integration, to semantic data clouding, and most recently to exploratory data analysis over big data. These contexts increase the demand for schema matching between semantic data-types such as XML and RDF. Existing integration approaches have not addressed the challenge of defining a relation between XML and other semantic data-types. To address this, the paper studies the problem of schema mapping from XML to RDF in two parts. First, it tests the validity of a single matcher applied in a column-based manner to semantic data-types. Second, it tests the validity of a highly configurable framework that uses hierarchical classification to construct a composable pipeline. We propose and implement a Reconfigurable pipeline for Semi-Automatic Schema Matching (REPSASM), which addresses the customizability of the matching problem by providing an environment in which users can create, configure and experiment with their own schema-matching procedures. Our experiments show that configurability and hierarchical classification improve the matching result, and the paper proposes an algorithm to automatically optimize such a hierarchical pipeline.

Keywords: Schema Matching; Semantic Data-types; XML; RDF
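The abstract does not describe REPSASM's internals. Purely as an illustration of the two ideas it names, a column-based matcher and a hierarchical, composable classification pipeline, the sketch below treats all instance values of one XML element as a "column", summarizes it with a small feature vector, and classifies it first into a coarse group and then into a target RDF term. Everything here is an assumption: the feature set, the nearest-centroid classifier, the class names `CentroidMatcher` and `HierarchicalPipeline`, and the toy labels (`foaf:name` and `dc:title` are used only as familiar examples; the `ex:` terms are hypothetical). It is not the paper's method.

```python
# Hypothetical sketch: column-based matching with a two-level hierarchical
# pipeline. Features, classifier and labels are illustrative assumptions,
# not the REPSASM implementation.
from __future__ import annotations

import math
import re
from dataclasses import dataclass, field
from statistics import mean

NUMBER = re.compile(r"-?\d+(\.\d+)?")


def column_features(values: list[str]) -> list[float]:
    """Summarise one column (all instance values of an XML element)."""
    joined = "".join(values) or " "  # avoid division by zero on empty input
    n = len(joined)
    return [
        math.log1p(mean(len(v) for v in values)) if values else 0.0,  # typical value length
        sum(c.isdigit() for c in joined) / n,  # share of digit characters
        sum(c.isalpha() for c in joined) / n,  # share of letters
        sum(bool(NUMBER.fullmatch(v.strip())) for v in values) / max(len(values), 1),  # numeric values
    ]


@dataclass
class CentroidMatcher:
    """A single composable matcher: nearest-centroid over column features."""
    centroids: dict[str, list[float]] = field(default_factory=dict)

    def fit(self, labelled: dict[str, list[list[str]]]) -> CentroidMatcher:
        # labelled maps each label to one or more example columns
        for label, columns in labelled.items():
            vectors = [column_features(col) for col in columns]
            self.centroids[label] = [mean(dim) for dim in zip(*vectors)]
        return self

    def predict(self, column: list[str]) -> str:
        x = column_features(column)
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))


@dataclass
class HierarchicalPipeline:
    """Root matcher picks a coarse group; that group's matcher picks the target term."""
    root: CentroidMatcher
    children: dict[str, CentroidMatcher]

    def match(self, column: list[str]) -> str:
        group = self.root.predict(column)
        child = self.children.get(group)
        return child.predict(column) if child else group


# Toy training data: coarse group -> hypothetical RDF term -> example columns.
TRAIN = {
    "text": {
        "foaf:name": [["Alice Smith", "Bob Jones"]],
        "dc:title": [["A Study of Schema Matching in Practice",
                      "Notes on RDF Mapping Techniques"]],
    },
    "numeric": {
        "ex:year": [["2019", "2018", "2020"]],
        "ex:price": [["12.50", "7.99", "103.25"]],
    },
}

# The root sees every example column, labelled only by its coarse group.
root = CentroidMatcher().fit(
    {group: [col for cols in terms.values() for col in cols] for group, terms in TRAIN.items()}
)
children = {group: CentroidMatcher().fit(terms) for group, terms in TRAIN.items()}
pipeline = HierarchicalPipeline(root, children)

print(pipeline.match(["Carol White", "Dan Brown"]))  # -> foaf:name
print(pipeline.match(["5.25", "19.99"]))  # -> ex:price
```

In this sketch, configurability amounts to choosing which matcher sits at each node and how target terms are grouped; the nearest-centroid classifier is only a simple stand-in that any other per-node matcher could replace.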