arTenTen: Arabic Corpus and Word Sketches

We present arTenTen, a web-crawled corpus of Arabic gathered in 2012. arTenTen consists of 5.8 billion words. A portion of it has been lemmatized and part-of-speech (POS) tagged with the MADA tool and subsequently loaded into Sketch Engine, a leading corpus query tool, where it is open for all to use...

Bibliographic Details
Main Authors:	Tressy Arts, Yonatan Belinkov, Nizar Habash, Adam Kilgarriff, Vit Suchomel
Format:	Article
Language:	English
Published:	Elsevier 2014-12-01
Series:	Journal of King Saud University: Computer and Information Sciences
Subjects:	Corpora; Lexicography; Morphology; Concordance; Arabic
Online Access:	http://www.sciencedirect.com/science/article/pii/S1319157814000330

Similar Items

A Critical Evaluation of Three Sesotho Dictionaries
by: Mmasibidi Setaka, et al.
Published: (2020-09-01)

Electronic Corpora for Legal English Translator/Interpreter Training - A Case Study
by: Bulatović Vesna
Published: (2018-12-01)

The Pro.Bio.Dic. (Prototype of a Bioethics Dictionary) project: Building a corpus of popular and specialized bioethics texts
by: Alessandra Vicentini, et al.
Published: (2013-05-01)

The Saudi Novel Corpus: Design and Compilation
by: Tareq Alfraidi, et al.
Published: (2022-06-01)

Corpus-based Lexicography for Lesser-resourced Languages — Maximizing the Limited Corpus
by: D.J. Prinsloo
Published: (2015-11-01)

Modeling Frequency Data: Methodological Considerations on the Relationship between Dictionaries and Corpora
by: Karlheinz Mörth, et al.
Published: (2015-12-01)

The contribution of the concordancer to the analysis and remediation of learners' errors in online discussion forums
by: Joseph Rézeau

Strategies for lexical equivalents of animal terms and their implications in the Kamus Besar Arab Melayu Dewan
by: Aziz @ Saari, Mohd Bakri
Published: (2019)

Analysis of sentences in Estonian as a second language textbooks and its application to the automatic selection of dictionary example sentences for different language proficiency levels
by: Kristina Koppel
Published: (2019-05-01)

From Corpus to Dictionary: A Hybrid Prescriptive, Descriptive and Proscriptive Undertaking
by: Minah Nabirye, et al.
Published: (2012-01-01)

Arabic as a Language of Politics: A Case Study in Corpus-based Teaching
by: Marco Aurelio Golfetto
Published: (2022-02-01)

Evaluating Fidelity of Persian-English Sentence-Aligned Parallel Corpus
by: Masoomeh Mashayekhi, et al.
Published: (2013-09-01)

A Corpus-based Survey of Four Electronic Swahili–English Bilingual Dictionaries
by: Guy De Pauw, et al.
Published: (2011-10-01)

Computational methods for corpus annotation and analysis
by: Lu, Xiaofei, author
Published: (2014)

Theoretical and methodological aspects of corpus representativeness
by: Najib Arbach, et al.

The web as a corpus: a resource for translation
by: Helia Vaezian
Published: (2018-12-01)

The position of the adjective: from theories to corpora
by: Jan Goes
Published: (2017-12-01)

On the Benefits of Foreign Language Learning Based on Parallel Language Corpus
by: Joanna Satoła-Staśkowiak
Published: (2015-12-01)

Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements
by: Danuta Roszko, et al.
Published: (2015-06-01)

Using digital corpora with a Lansad audience
by: Eva Schaeffer-Lacroix

Word Error Analysis in Aphasia: Introducing the Greek Aphasia Error Corpus (GRAEC)
by: Dimitrios Kasselimis, et al.
Published: (2020-08-01)

How to find the most important keywords in a corpus with WordSmith tools
by: Tony Berber-Sardinha
Published: (2005-12-01)

Contemporary corpus linguistics
by: Baker, Paul, 1972-
Published: (2009)

The Story of the Learner Corpus LINDSEI_CZ
by: Tomáš Gráf
Published: (2017-12-01)

Magazen: History of a Word Told Through a Project of Digital Lexicography
by: Tomasin, Lorenzo
Published: (2021-12-01)

Using computer corpora with a Lansad audience (languages for specialists in other disciplines)
by: Eva Schaeffer-Lacroix

Corpus Linguistics, Context and Culture
by: Wiegand, Viola, editor, et al.
Published: (2019)

The Routledge handbook of corpus linguistics
by: O'Keeffe, Anne, et al.
Published: (2010)

The structure of a dictionary to an ancient corpus (Rigveda): morphological, syntactic and semantic information
by: Krisch, T, et al.
Published: (2010)

Exploring a Multi-Layered Cross-Genre Corpus of Document-Level Semantic Relations
by: Gregor Williamson, et al.
Published: (2023-08-01)

NUWT: Jawi-Specific Buckwalter Corpus for Malay Word Tokenization
by: Juhaida Abu Bakar, et al.
Published: (2016-05-01)

Problems of Usage Labelling in English Lexicography
by: Lydia Namatende Sakwa
Published: (2012-01-01)

A Corpus-Based Analysis of Deontic Modality of Obligation and Prohibition in Arabic/English Constitutions
by: Hanem El-Farahaty, et al.
Published: (2020-12-01)

Corpus linguistics: critical concepts in linguistics
by: Teubert, Wolfgang, et al.
Published: (2008)

Multilingual Corpus on Migration and Asylum (COMMIRE)
by: Anna Beatriz Dimas Furtado, et al.
Published: (2022-05-01)

The place of substrate words in the 'Etymologisches Wörterbuch des Althochdeutschen'
by: Schuhmann, R
Published: (2010)

Conversation in context: a corpus-driven approach
by: Ruhlemann, Christoph
Published: (2007)

Corpus-based approaches to English language teaching
by: Campoy Cubillo, Ma Carmen (Mari Carmen), et al.
Published: (c201)

Application of multilingual corpus in contrastive studies (on the example of the Bulgarian-Polish-Lithuanian parallel corpus)
by: Ludmila Dimitrova, et al.
Published: (2015-11-01)

Corpus linguistics based error analysis of first year Universiti Teknologi Malaysia students' writing
by: Nurul Ros Adira Mahady, et al.
Published: (2009)