<span style="font-variant: small-caps">esCorpius-m</span>: A Massive Multilingual Crawling Corpus with a Focus on Spanish

In recent years, transformer-based models have played a significant role in advancing language modeling for natural language processing. However, they require substantial amounts of data and there is a shortage of high-quality non-English corpora. Some recent initiatives have introduced multilingual...

Full description

Bibliographic Details
Main Authors: Asier Gutiérrez-Fandiño, David Pérez-Fernández, Jordi Armengol-Estapé, David Griol, Ksenia Kharitonova, Zoraida Callejas
Format: Article
Language:English
Published: MDPI AG 2023-11-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/13/22/12155