Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data
Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy...
Main Authors: | Arnaud Ferré, Mouhamadou Ba, Robert Bossy |
---|---|
Format: | Article |
Language: | English |
Published: | Korea Genome Organization, 2019-06-01 |
Series: | Genomics & Informatics |
Subjects: | biomedical text mining; entity normalization; ontology; word embedding |
Online Access: | http://genominfo.org/upload/pdf/gi-2019-17-2-e20.pdf |
_version_ | 1819110584684118016 |
---|---|
author | Arnaud Ferré; Mouhamadou Ba; Robert Bossy |
author_facet | Arnaud Ferré; Mouhamadou Ba; Robert Bossy |
author_sort | Arnaud Ferré |
collection | DOAJ |
description | Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well to very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality of the ontology representation. We also propose to calibrate parameters to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method. |
first_indexed | 2024-12-22T03:44:03Z |
format | Article |
id | doaj.art-2dcffa9df61f4c95b39887f3a76ee59d |
institution | Directory Open Access Journal |
issn | 2234-0742 |
language | English |
last_indexed | 2024-12-22T03:44:03Z |
publishDate | 2019-06-01 |
publisher | Korea Genome Organization |
record_format | Article |
series | Genomics & Informatics |
spelling | doaj.art-2dcffa9df61f4c95b39887f3a76ee59d; 2022-12-21T18:40:11Z; eng; Korea Genome Organization; Genomics & Informatics; 2234-0742; 2019-06-01; 17; 2; 10.5808/GI.2019.17.2.e20; 562; Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data; Arnaud Ferré [0]; Mouhamadou Ba [1]; Robert Bossy [2]; [0] MaIAGE, INRA, Paris-Saclay University, 78350 Jouy-en-Josas, France; [1] MaIAGE, INRA, Paris-Saclay University, 78350 Jouy-en-Josas, France; [2] MaIAGE, INRA, Paris-Saclay University, 78350 Jouy-en-Josas, France; Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well to very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality of the ontology representation. We also propose to calibrate parameters to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.; http://genominfo.org/upload/pdf/gi-2019-17-2-e20.pdf; biomedical text mining; entity normalization; ontology; word embedding |
spellingShingle | Arnaud Ferré; Mouhamadou Ba; Robert Bossy; Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data; Genomics & Informatics; biomedical text mining; entity normalization; ontology; word embedding |
title | Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data |
title_full | Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data |
title_fullStr | Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data |
title_full_unstemmed | Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data |
title_short | Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data |
title_sort | improving the contes method for normalizing biomedical text entities with concepts from an ontology with almost no training data |
topic | biomedical text mining; entity normalization; ontology; word embedding |
url | http://genominfo.org/upload/pdf/gi-2019-17-2-e20.pdf |
work_keys_str_mv | AT arnaudferre improvingthecontesmethodfornormalizingbiomedicaltextentitieswithconceptsfromanontologywithalmostnotrainingdata AT mouhamadouba improvingthecontesmethodfornormalizingbiomedicaltextentitieswithconceptsfromanontologywithalmostnotrainingdata AT robertbossy improvingthecontesmethodfornormalizingbiomedicaltextentitieswithconceptsfromanontologywithalmostnotrainingdata |