Use of Ontologies for Data Integration and Curation

Data curation includes the goal of facilitating the re-use and combination of datasets, which is often impeded by incompatible data schema. Can we use ontologies to help with data integration? We suggest a semi-automatic process that involves the use of automatic text searching to help identify over...

Bibliographic Details
Main Authors: Judith Gelernter, Michael Lesk
Format: Article
Language: English
Published: University of Edinburgh 2011-03-01
Series: International Journal of Digital Curation
Online Access: https://ijdc.net/index.php/ijdc/article/view/173
collection DOAJ
description Data curation includes the goal of facilitating the re-use and combination of datasets, which is often impeded by incompatible data schema. Can we use ontologies to help with data integration? We suggest a semi-automatic process that involves the use of automatic text searching to help identify overlaps in metadata that accompany data schemas, plus human validation of suggested data matches. Problems include different text used to describe the same concept, different forms of data recording and different organizations of data. Ontologies can help by focussing attention on important words, providing synonyms to assist matching, and indicating in what context words are used. Beyond ontologies, data on the statistical behavior of data can be used to decide which data elements appear to be compatible with which other data elements. When curating data which may have hundreds or even thousands of data labels, semi-automatic assistance with data fusion should be of great help.
first_indexed 2024-03-08T05:35:15Z
id doaj.art-b92e815679104ebf924ea9ec796fa776
institution Directory Open Access Journal
issn 1746-8256