Word Embedding Based on Large-Scale Web Corpora as a Powerful Lexicographic Tool

The Aranea Project offers a set of comparable corpora for about two dozen (mostly European) languages, providing a convenient dataset for NLP applications that require training on large amounts of data. The article presents word embedding models trained on the Aranea corpora and an online interface to...

Bibliographic Details
Main Author: Radovan Garabík
Format: Article
Language: Croatian
Published: Institut za hrvatski jezik i jezikoslovlje 2020-01-01
Series: Rasprave Instituta za Hrvatski Jezik i Jezikoslovlje
Online Access: https://hrcak.srce.hr/file/356572