Automatic Identification of Domain Terms: An Approach for Italian

Bibliographic Details
Main Authors: Maria Teresa Artese, Isabella Gagliardi
Format: Article
Language: English
Published: Bulgarian Academy of Sciences, Institute of Mathematics and Informatics 2020-09-01
Series: Digital Presentation and Preservation of Cultural and Scientific Heritage
Online Access: https://dipp.math.bas.bg/dipp/article/view/121
Affiliation: IMATI – CNR, Via Bassini 15, 20133, Milan, Italy (both authors)
ISSN: 1314-4006, 2535-0366
Volume: 10
DOI: 10.55630/dipp.2020.10.21
Subjects: Classification Methods; Word Embedding Models; Probability; Food; Italian Language
Collection: DOAJ (Directory of Open Access Journals)
Record ID: doaj.art-85f9a4e49d2d456f8878e7251b469494
First indexed: 2024-04-12T17:59:22Z
Last indexed: 2024-04-12T17:59:22Z
Description: The problem of creating a fully automated domain-specific thesaurus is very topical. The paper presents a novel method to address this problem for the Italian language. The main feature of the approach is the integration of different methods: machine learning classification methods working on the semantic representation of candidate terms, word embedding models that capture the semantics of words, and a computation of the degree of specialization of a term. The work is in progress, and the results obtained so far are promising.