Exploring instances for matching heterogeneous database schemas utilizing Google similarity and regular expression

Bibliographic Details
Main Authors: Mehdi, Osama A.; Ibrahim, Hamidah; Affendey, Lilly Suriani; Pardede, Eric; Cao, Jinli
Format: Article
Language: English
Published: ComSIS Consortium 2018
Online Access: http://psasir.upm.edu.my/id/eprint/72675/1/Exploring%20instances%20for%20matching%20heterogeneous%20database%20schemas%20utilizing%20Google%20similarity%20and%20regular%20expression.pdf
Journal: Computer Science and Information Systems, 15 (2), June 2018, pp. 295-320
ISSN: 1820-0214 (print); 2406-1018 (electronic)
DOI: 10.2298/CSIS170525002M
Publisher page: http://www.comsis.org/archive.php?show=ppr633-1705
Repository record: http://psasir.upm.edu.my/id/eprint/72675/
Institution: Universiti Putra Malaysia
Status: Peer reviewed

Abstract
Instance-based schema matching aims to identify correspondences between the attributes of different schemas. Several approaches have been proposed to discover these correspondences, but they treat all instances, including those with numeric values, as strings. This prevents them from discovering common patterns or performing statistical computations over numeric instances; consequently, matches between numeric instances go unidentified, which in turn affects the overall results. In this paper, we propose an approach for finding matches between schemas whose attributes are semantically and syntactically related. Since we exploit only the instances of the schemas, we rely on strategies that combine the strength of Google as a source of web semantics with regular expressions for pattern recognition. To demonstrate the accuracy of our approach, we conducted an experimental evaluation using real-world datasets. The results show that our approach finds 1-1 matches with high accuracy, in the range of 93% to 99%. Furthermore, our approach outperformed previous approaches when using a sample of instances.
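The approach summarized in the abstract rests on two instance-level signals, a Google-based semantic similarity and regular-expression pattern recognition, but this record gives no formulas for either. The Python sketch below is therefore an illustration under stated assumptions, not the authors' method. It assumes "Google similarity" means the Normalized Google Distance (NGD) computed from search hit counts supplied by the caller (no search API is queried). Everything else is hypothetical: the regex table `PATTERNS`, the cosine comparison of pattern profiles, the weight `alpha` in `combined_score`, and the greedy 1-1 selection with its `threshold`.

```python
import math
import re
from collections import Counter

# --- Signal 1: Google-based similarity, ASSUMED here to be the NGD ---

def normalized_google_distance(fx: float, fy: float, fxy: float, n: float) -> float:
    """NGD from hit counts: fx, fy = pages containing x, y; fxy = pages with both;
    n = total indexed pages. The caller supplies all counts."""
    if fx <= 0 or fy <= 0 or n <= max(fx, fy):
        raise ValueError("need fx, fy > 0 and n > max(fx, fy)")
    if fxy <= 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def google_similarity(fx, fy, fxy, n) -> float:
    """Map NGD to [0, 1] as 1 - NGD, clamped (an illustrative choice)."""
    d = normalized_google_distance(fx, fy, fxy, n)
    return 0.0 if math.isinf(d) else min(1.0, max(0.0, 1.0 - d))

# --- Signal 2: regex pattern profiles (HYPOTHETICAL pattern table) ---

# Order matters: the first pattern that matches the whole value wins.
PATTERNS = [
    ("date_iso", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("phone", re.compile(r"\(?\d{3}\)?[-\s]?\d{3}-\d{4}")),
    ("decimal", re.compile(r"[-+]?\d+\.\d+")),
    ("integer", re.compile(r"[-+]?\d+")),
    ("alpha", re.compile(r"[A-Za-z][A-Za-z\s'.-]*")),
]

def pattern_of(value: str) -> str:
    """Name of the first pattern that fully matches the value, else 'other'."""
    v = value.strip()
    for name, rx in PATTERNS:
        if rx.fullmatch(v):
            return name
    return "other"

def profile(instances) -> Counter:
    """Count how many instances of one attribute fall into each pattern class."""
    return Counter(pattern_of(v) for v in instances)

def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity of two pattern-frequency profiles (0.0 if either is empty)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(c * c for c in p.values())) * math.sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0

# --- Combining signals and choosing 1-1 matches (both HYPOTHETICAL) ---

def combined_score(semantic: float, pattern: float, alpha: float = 0.5) -> float:
    """Weighted average of the two signals; alpha is an assumed weight."""
    return alpha * semantic + (1.0 - alpha) * pattern

def select_one_to_one(scores: dict, threshold: float = 0.5) -> dict:
    """Greedy 1-1 selection: accept the best unused (source, target) pair
    until scores fall below the threshold."""
    used_src, used_tgt, matches = set(), set(), {}
    for (src, tgt), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            break
        if src in used_src or tgt in used_tgt:
            continue
        matches[src] = tgt
        used_src.add(src)
        used_tgt.add(tgt)
    return matches

if __name__ == "__main__":
    source = {"price": ["12.50", "3.99", "7"], "city": ["Kuala Lumpur", "Melbourne"]}
    target = {"cost": ["4.25", "10.00"], "town": ["Ipoh", "Perth"]}
    # Pattern signal only; a fuller run would also blend in google_similarity scores.
    scores = {(s, t): cosine(profile(sv), profile(tv))
              for s, sv in source.items() for t, tv in target.items()}
    print(select_one_to_one(scores))  # → {'city': 'town', 'price': 'cost'}
```

Comparing pattern profiles rather than raw strings lets values such as `12.50` and `4.25` count as similar even though they share few characters, which is the gap the abstract attributes to treating numeric instances as strings.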