A cross-lingual similarity measure for detecting biomedical term translations.

Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually tr...


Bibliographic Details
Main Authors: Danushka Bollegala, Georgios Kontonatsios, Sophia Ananiadou
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4452086?pdf=render
collection DOAJ
description Bilingual dictionaries for technical terms, such as biomedical terms, are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later manually translated into other languages. Although large monolingual lexicons of biomedical terms exist, only a fraction of those lexicons have been translated into other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting the most similar translation candidates in one language (target) for a biomedical term specified in another language (source). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP), a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target languages using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs, such as English-French, English-Spanish, English-Greek, and English-Japanese, show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks.
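
The two feature types named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the n-gram sizes (2 and 3), the window size, and the single-token term matching are all assumptions made for illustration, and the cosine function only compares feature vectors within one feature space. The PVP projection and the PLSR mapping that the description relies on for the cross-lingual step are not shown.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n_values=(2, 3)):
    """Intrinsic features: counts of character n-grams taken from the term.

    The n-gram sizes are an assumption; the record does not state them.
    """
    text = term.lower()
    feats = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            feats[text[i:i + n]] += 1
    return feats


def context_features(tokens, term, window=2):
    """Extrinsic features: unigrams and bigrams from the windows around
    each occurrence of a (single-token) term in a tokenised sentence.

    The window size and single-token matching are simplifying assumptions.
    """
    feats = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        for side in (left, right):
            feats.update(side)  # unigrams
            feats.update(" ".join(pair) for pair in zip(side, side[1:]))  # bigrams
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    sentence = ["the", "acute", "hepatitis", "virus", "infection"]
    print(char_ngrams("hepatitis", n_values=(2,)))
    print(context_features(sentence, "hepatitis"))
    print(cosine(char_ngrams("hepatitis"), char_ngrams("hépatite")))
```

Character n-grams let cognate terms in languages that share a script overlap directly, while context features can still carry signal for pairs such as English-Japanese where the surface forms share no characters.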
id doaj.art-179ba0d17ae24274ac9dbcfb1f9e0617
institution Directory Open Access Journal
issn 1932-6203
spelling doaj.art-179ba0d17ae24274ac9dbcfb1f9e0617 | 2022-12-21T23:40:46Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2015-01-01 | 10 | 6 | e0126196 | 10.1371/journal.pone.0126196 | A cross-lingual similarity measure for detecting biomedical term translations. | Danushka Bollegala | Georgios Kontonatsios | Sophia Ananiadou | http://europepmc.org/articles/PMC4452086?pdf=render