arTenTen: Arabic Corpus and Word Sketches
We present arTenTen, a web-crawled corpus of Arabic, gathered in 2012. arTenTen consists of 5.8 billion words. A chunk of it has been lemmatized and part-of-speech (POS) tagged with the MADA tool and subsequently loaded into Sketch Engine, a leading corpus query tool, where it is open for all to use...
Main Authors: | Tressy Arts, Yonatan Belinkov, Nizar Habash, Adam Kilgarriff, Vit Suchomel |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2014-12-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157814000330 |
Similar Items
- A Critical Evaluation of Three Sesotho Dictionaries
  by: Mmasibidi Setaka, et al.
  Published: (2020-09-01)
- Electronic Corpora for Legal English Translator/ Interpreter Training - A Case Study
  by: Bulatović Vesna
  Published: (2018-12-01)
- The Pro.Bio.Dic. (Prototype of a Bioethics Dictionary) project: Building a corpus of popular and specialized bioethics texts
  by: Alessandra Vicentini, et al.
  Published: (2013-05-01)
- The Saudi Novel Corpus: Design and Compilation
  by: Tareq Alfraidi, et al.
  Published: (2022-06-01)
- Corpus-based Lexicography for Lesser-resourced Languages — Maximizing the Limited Corpus
  by: D.J. Prinsloo
  Published: (2015-11-01)