An Arabic CCG approach for determining constituent types from Arabic Treebank

Bibliographic Details
Main Authors: Ahmed I. El-taher, Hitahm M. Abo Bakr, Ibrahim Zidan, Khaled Shaalan
Format: Article
Language: English
Published: Elsevier, 2014-12-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Arabic, CCGbank, Treebank
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157814000299
ISSN: 1319-1578
Volume: 26
Issue: 4
Pages: 441-449
DOI: 10.1016/j.jksuci.2014.06.005
Affiliations: Department of Computer and System Engineering, Faculty of Engineering, Zagazig University, Zagazig, Asharkia, Egypt (Ahmed I. El-taher, Hitahm M. Abo Bakr, Ibrahim Zidan); The British University, Dubai, United Arab Emirates (Khaled Shaalan)

Abstract
Converting a treebank into a CCGbank makes the sophisticated tools developed for Combinatory Categorial Grammar (CCG) available for the treebank's language and supports cross-linguistic development. The conversion is primarily a three-step process: determining constituent types, binarization, and category conversion. It usually begins with a preprocessing step on the chosen treebank that corrects brackets, normalizes tags altered during manual annotation, and extracts the morpho-syntactic information needed to determine constituent types. In this article, we describe the preprocessing required for the Arabic Treebank and how to determine the types of Arabic constituents. We conducted an experiment on parts 1 and 2 of the Penn Arabic Treebank (PATB) with the aim of converting the PATB into an Arabic CCGbank. Applied to ATB1v2.0 and ATB2v2.0, our algorithm achieved 99% identification of head nodes and 100% coverage of the Treebank data.
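The abstract names the conversion steps without showing how constituent types are assigned. As an illustration only, the Python sketch below normalizes PATB-style labels (stripping function tags and coindexation suffixes, so `NP-SBJ-1` becomes `NP` plus the function `SBJ`) and then marks each child of a node as head, complement, or adjunct, the distinction that CCGbank-style conversions need before binarization and category conversion. The head-rule table `HEAD_RULES`, the complement tag set `COMPLEMENT_TAGS`, the helper names, and the example tree are hypothetical assumptions made for this sketch; they are not the authors' rules, and binarization and category conversion are not shown.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    label: str                                  # raw label, e.g. "NP-SBJ-1"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set only on preterminals
    ctype: str = "root"                         # head / complement / adjunct


def parse(s: str) -> Node:
    """Parse one Penn-style bracketed tree into Node objects."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node() -> Node:
        nonlocal pos
        pos += 1                                # consume "("
        label = ""
        if tokens[pos] != "(":                  # treebank files may wrap trees in "( ... )"
            label = tokens[pos]
            pos += 1
        n = Node(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                n.children.append(node())
            else:
                n.word = tokens[pos]
                pos += 1
        pos += 1                                # consume ")"
        return n

    return node()


def split_label(label: str) -> Tuple[str, Set[str]]:
    """Normalize a label: 'NP-SBJ-1' -> ('NP', {'SBJ'}); '-NONE-' is kept whole."""
    if label.startswith("-"):
        return label, set()
    base, *rest = re.split(r"[-=]", label)
    return base, {p for p in rest if p and not p.isdigit()}


# Hypothetical head rules: parent category -> (search direction, priority list).
HEAD_RULES = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["PV", "IV", "VP"]),
    "NP": ("left", ["NOUN", "NOUN_PROP", "NP"]),
    "PP": ("left", ["PREP"]),
}
# Hypothetical set of function tags that mark a non-head child as a complement.
COMPLEMENT_TAGS = {"SBJ", "OBJ", "PRD", "CLR"}


def find_head(node: Node) -> int:
    """Index of the head child by the first matching rule, else the first child searched."""
    direction, priorities = HEAD_RULES.get(split_label(node.label)[0], ("left", []))
    order = list(range(len(node.children)))
    if direction == "right":
        order.reverse()
    for cat in priorities:
        for i in order:
            if split_label(node.children[i].label)[0] == cat:
                return i
    return order[0]


def mark_types(node: Node) -> None:
    """Recursively label every child as head, complement, or adjunct."""
    if not node.children:
        return
    h = find_head(node)
    parent_base = split_label(node.label)[0]
    for i, child in enumerate(node.children):
        funcs = split_label(child.label)[1]
        if i == h:
            child.ctype = "head"
        elif funcs & COMPLEMENT_TAGS or parent_base == "PP":  # PP object is a complement
            child.ctype = "complement"
        else:
            child.ctype = "adjunct"
        mark_types(child)


def types(node: Node) -> List[Tuple[str, str]]:
    """Pre-order list of (label, constituent type)."""
    out = [(node.label, node.ctype)]
    for c in node.children:
        out.extend(types(c))
    return out


# Invented example, Buckwalter-style: "the minister wrote the letter in the ministry".
tree = parse("(S (VP (PV ktb) (NP-SBJ (NOUN Alwzyr)) (NP-OBJ (NOUN AlrsAlp))"
             " (PP-LOC (PREP fy) (NP (NOUN AlwzArp)))))")
mark_types(tree)
for label, ctype in types(tree):
    print(f"{label:8} {ctype}")
```

In the printed pre-order listing, the verb `PV` heads the `VP`, the `-SBJ` and `-OBJ` noun phrases are complements, the `-LOC` prepositional phrase is an adjunct, and the noun phrase inside the `PP` is the preposition's complement. Covering the full PATB tag inventory, empty categories such as `-NONE-`, and the bracket corrections mentioned in the abstract would need far larger rule sets than this toy table.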