Semantic-Similarity-Based Schema Matching for Management of Building Energy Data
The increase in heterogeneous data in the building energy domain creates a difficult challenge for data integration. Schema matching, which maps the raw data from the building energy domain to a generic data model, is the necessary step in data integration and provides a unique representation. Only...
Main Authors: | Zhiyu Pan, Guanchen Pan, Antonello Monti |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Energies |
Subjects: | semantic similarity, schema matching, active learning |
Online Access: | https://www.mdpi.com/1996-1073/15/23/8894 |
_version_ | 1797463371771019264 |
---|---|
author | Zhiyu Pan Guanchen Pan Antonello Monti |
author_facet | Zhiyu Pan Guanchen Pan Antonello Monti |
author_sort | Zhiyu Pan |
collection | DOAJ |
description | The increase in heterogeneous data in the building energy domain creates a difficult challenge for data integration. Schema matching, which maps the raw data from the building energy domain to a generic data model, is the necessary step in data integration and provides a unique representation. Only a small amount of labeled data for schema matching exists and it is time-consuming and labor-intensive to manually label data. This paper applies semantic-similarity methods to the automatic schema-mapping process by combining knowledge from natural language processing, which reduces the manual effort in heterogeneous data integration. The active-learning method is applied to solve the lack-of-labeled-data problem in schema matching. The results of the schema matching with building-energy-domain data show the pre-trained language model provides a massive improvement in the accuracy of schema matching and the active-learning method greatly reduces the amount of labeled data required. |
first_indexed | 2024-03-09T17:49:43Z |
format | Article |
id | doaj.art-b34a3a918721401496a81e0bd825e3d6 |
institution | Directory Open Access Journal |
issn | 1996-1073 |
language | English |
last_indexed | 2024-03-09T17:49:43Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Energies |
spelling | doaj.art-b34a3a918721401496a81e0bd825e3d6; 2023-11-24T10:52:03Z; eng; MDPI AG; Energies; 1996-1073; 2022-11-01; 15; 23; 8894; 10.3390/en15238894; Semantic-Similarity-Based Schema Matching for Management of Building Energy Data; Zhiyu Pan (0); Guanchen Pan (1); Antonello Monti (2); (0) Institute for Automation of Complex Power Systems, RWTH Aachen University, 52074 Aachen, Germany; (1) Institute for Automation of Complex Power Systems, RWTH Aachen University, 52074 Aachen, Germany; (2) Institute for Automation of Complex Power Systems, RWTH Aachen University, 52074 Aachen, Germany; The increase in heterogeneous data in the building energy domain creates a difficult challenge for data integration. Schema matching, which maps the raw data from the building energy domain to a generic data model, is the necessary step in data integration and provides a unique representation. Only a small amount of labeled data for schema matching exists and it is time-consuming and labor-intensive to manually label data. This paper applies semantic-similarity methods to the automatic schema-mapping process by combining knowledge from natural language processing, which reduces the manual effort in heterogeneous data integration. The active-learning method is applied to solve the lack-of-labeled-data problem in schema matching. The results of the schema matching with building-energy-domain data show the pre-trained language model provides a massive improvement in the accuracy of schema matching and the active-learning method greatly reduces the amount of labeled data required.; https://www.mdpi.com/1996-1073/15/23/8894; semantic similarity; schema matching; active learning |
spellingShingle | Zhiyu Pan Guanchen Pan Antonello Monti Semantic-Similarity-Based Schema Matching for Management of Building Energy Data Energies semantic similarity schema matching active learning |
title | Semantic-Similarity-Based Schema Matching for Management of Building Energy Data |
title_full | Semantic-Similarity-Based Schema Matching for Management of Building Energy Data |
title_fullStr | Semantic-Similarity-Based Schema Matching for Management of Building Energy Data |
title_full_unstemmed | Semantic-Similarity-Based Schema Matching for Management of Building Energy Data |
title_short | Semantic-Similarity-Based Schema Matching for Management of Building Energy Data |
title_sort | semantic similarity based schema matching for management of building energy data |
topic | semantic similarity schema matching active learning |
url | https://www.mdpi.com/1996-1073/15/23/8894 |
work_keys_str_mv | AT zhiyupan semanticsimilaritybasedschemamatchingformanagementofbuildingenergydata AT guanchenpan semanticsimilaritybasedschemamatchingformanagementofbuildingenergydata AT antonellomonti semanticsimilaritybasedschemamatchingformanagementofbuildingenergydata |