Biomedical ontology alignment: an approach based on representation learning

Abstract. Background: While representation learning techniques have shown great promise in application to a number of different NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work that has focused on feature engineering, we present a novel representation learni...


Bibliographic Details
Main Authors: Prodromos Kolyvakis, Alexandros Kalousis, Barry Smith, Dimitris Kiritsis
Format: Article
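Language: English
Published: BMC 2018-08-01
Series: Journal of Biomedical Semantics
Subjects: Ontology matching; Semantic similarity; Sentence embeddings; Word embeddings; Denoising autoencoder; Outlier detection
Online Access: http://link.springer.com/article/10.1186/s13326-018-0187-8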
_version_ 1819149194329325568
author Prodromos Kolyvakis
Alexandros Kalousis
Barry Smith
Dimitris Kiritsis
author_sort Prodromos Kolyvakis
collection DOAJ
description Abstract. Background: While representation learning techniques have shown great promise in application to a number of different NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work that has focused on feature engineering, we present a novel representation learning approach that is tailored to the ontology matching task. Our approach is based on embedding ontological terms in a high-dimensional Euclidean space. This embedding is derived on the basis of a novel phrase retrofitting strategy through which semantic similarity information becomes inscribed onto fields of pre-trained word vectors. The resulting framework also incorporates a novel outlier detection mechanism based on a denoising autoencoder that is shown to improve performance. Results: An ontology matching system derived using the proposed framework achieved an F-score of 94% on an alignment scenario involving the Adult Mouse Anatomical Dictionary and the Foundational Model of Anatomy ontology (FMA) as targets. This compares favorably with the best performing systems on the Ontology Alignment Evaluation Initiative anatomy challenge. We performed additional experiments on aligning FMA to NCI Thesaurus and to SNOMED CT based on a reference alignment extracted from the UMLS Metathesaurus. Our system obtained overall F-scores of 93.2% and 89.2% for these experiments, thus achieving state-of-the-art results. Conclusions: Our proposed representation learning approach leverages terminological embeddings to capture semantic similarity. Our results provide evidence that the approach produces embeddings that are especially well tailored to the ontology matching task, demonstrating a novel pathway for the problem.
first_indexed 2024-12-22T13:57:44Z
format Article
id doaj.art-55e13df2e1244b099f554976f3e41e1e
institution Directory Open Access Journal
issn 2041-1480
language English
last_indexed 2024-12-22T13:57:44Z
publishDate 2018-08-01
publisher BMC
record_format Article
series Journal of Biomedical Semantics
spelling doaj.art-55e13df2e1244b099f554976f3e41e1e
2022-12-21T18:23:31Z
eng
BMC
Journal of Biomedical Semantics
2041-1480
2018-08-01
91120
10.1186/s13326-018-0187-8
Biomedical ontology alignment: an approach based on representation learning
Prodromos Kolyvakis (0): École Polytechnique Fédérale de Lausanne (EPFL)
Alexandros Kalousis (1): Business Informatics Department, University of Applied Sciences
Barry Smith (2): Department of Philosophy and Department of Biomedical Informatics
Dimitris Kiritsis (3): École Polytechnique Fédérale de Lausanne (EPFL)
http://link.springer.com/article/10.1186/s13326-018-0187-8
Ontology matching
Semantic similarity
Sentence embeddings
Word embeddings
Denoising autoencoder
Outlier detection
spellingShingle Prodromos Kolyvakis
Alexandros Kalousis
Barry Smith
Dimitris Kiritsis
Biomedical ontology alignment: an approach based on representation learning
Journal of Biomedical Semantics
Ontology matching
Semantic similarity
Sentence embeddings
Word embeddings
Denoising autoencoder
Outlier detection
title Biomedical ontology alignment: an approach based on representation learning
title_sort biomedical ontology alignment an approach based on representation learning
topic Ontology matching
Semantic similarity
Sentence embeddings
Word embeddings
Denoising autoencoder
Outlier detection
url http://link.springer.com/article/10.1186/s13326-018-0187-8
work_keys_str_mv AT prodromoskolyvakis biomedicalontologyalignmentanapproachbasedonrepresentationlearning
AT alexandroskalousis biomedicalontologyalignmentanapproachbasedonrepresentationlearning
AT barrysmith biomedicalontologyalignmentanapproachbasedonrepresentationlearning
AT dimitriskiritsis biomedicalontologyalignmentanapproachbasedonrepresentationlearning