Word Embedding Based on Large-Scale Web Corpora as a Powerful Lexicographic Tool
The Aranea Project offers a set of comparable corpora for about two dozen (mostly European) languages, providing a convenient dataset for NLP applications that require training on large amounts of data. The article presents word embedding models trained on the Aranea corpora and an online interface to...
Main Author: | |
---|---|
Format: | Article |
Language: | Croatian |
Published: | Institut za hrvatski jezik i jezikoslovlje, 2020-01-01 |
Series: | Rasprave Instituta za Hrvatski Jezik i Jezikoslovlje |
Subjects: | |
Online Access: | https://hrcak.srce.hr/file/356572 |