PRo-Pat: Probabilistic Root–Pattern Bi-gram data language model for Arabic based morphological analysis and distribution

Based on 29,192,662 HTML files obtained from ClueWeb, a bi-gram data language model for Arabic is constructed. The created dataset considers the standard types of bi-gram analysis, but with a focus on the root. Note 1: An Arabic root depicts the basic morpheme of an Arabic word at a higher level of ab...

Bibliographic Details
Main Authors: Bassam Haddad, Ahmad Awwad, Mamoun Hattab, Ammar Hattab
Format: Article
Language: English
Published: Elsevier 2023-02-01
Series: Data in Brief
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340922010782