Semantic-Similarity-Based Schema Matching for Management of Building Energy Data

The growth of heterogeneous data in the building energy domain makes data integration difficult. Schema matching, which maps raw data from the building energy domain to a generic data model, is a necessary step in data integration and provides a unique representation. Only...

Bibliographic Details
Main Authors: Zhiyu Pan, Guanchen Pan, Antonello Monti
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Energies
Subjects: semantic similarity, schema matching, active learning
Online Access: https://www.mdpi.com/1996-1073/15/23/8894
_version_ 1797463371771019264
author Zhiyu Pan
Guanchen Pan
Antonello Monti
author_sort Zhiyu Pan
collection DOAJ
description The growth of heterogeneous data in the building energy domain makes data integration difficult. Schema matching, which maps raw data from the building energy domain to a generic data model, is a necessary step in data integration and provides a unique representation. Only a small amount of labeled data for schema matching exists, and labeling data manually is time-consuming and labor-intensive. This paper applies semantic-similarity methods from natural language processing to automate the schema-mapping process, which reduces the manual effort in heterogeneous data integration. Active learning is applied to address the lack of labeled data in schema matching. Results on building-energy-domain data show that the pre-trained language model substantially improves schema-matching accuracy and that active learning greatly reduces the amount of labeled data required.
first_indexed 2024-03-09T17:49:43Z
format Article
id doaj.art-b34a3a918721401496a81e0bd825e3d6
institution Directory Open Access Journal
issn 1996-1073
language English
last_indexed 2024-03-09T17:49:43Z
publishDate 2022-11-01
publisher MDPI AG
record_format Article
series Energies
spelling doaj.art-b34a3a918721401496a81e0bd825e3d6
2023-11-24T10:52:03Z
eng
MDPI AG
Energies
1996-1073
2022-11-01
vol. 15, no. 23, art. 8894
10.3390/en15238894
Semantic-Similarity-Based Schema Matching for Management of Building Energy Data
Zhiyu Pan, Institute for Automation of Complex Power Systems, RWTH Aachen University, 52074 Aachen, Germany
Guanchen Pan, Institute for Automation of Complex Power Systems, RWTH Aachen University, 52074 Aachen, Germany
Antonello Monti, Institute for Automation of Complex Power Systems, RWTH Aachen University, 52074 Aachen, Germany
https://www.mdpi.com/1996-1073/15/23/8894
semantic similarity
schema matching
active learning
title Semantic-Similarity-Based Schema Matching for Management of Building Energy Data
title_sort semantic similarity based schema matching for management of building energy data
topic semantic similarity
schema matching
active learning
url https://www.mdpi.com/1996-1073/15/23/8894