Learning explicit and implicit Arabic discourse relations

We propose in this paper a supervised learning approach to identify discourse relations in Arabic texts. To our knowledge, this work represents the first attempt to focus on both explicit and implicit relations that link adjacent as well as non-adjacent Elementary Discourse Units (EDUs) within the S...

Bibliographic Details
Main Authors: Iskandar Keskes, Farah Benamara Zitoune, Lamia Hadrich Belguith
Format: Article
Language: English
Published: Elsevier 2014-12-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Discourse relations; Segmented Discourse Representation Theory; Arabic language
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157814000251
author Iskandar Keskes
Farah Benamara Zitoune
Lamia Hadrich Belguith
collection DOAJ
description We propose in this paper a supervised learning approach to identify discourse relations in Arabic texts. To our knowledge, this work is the first attempt to address both explicit and implicit relations that link adjacent as well as non-adjacent Elementary Discourse Units (EDUs) within Segmented Discourse Representation Theory (SDRT). We use the Discourse Arabic Treebank corpus (D-ATB), which is composed of newspaper documents extracted from the syntactically annotated Arabic Treebank v3.2 part3, where each document is associated with a complete discourse graph according to the cognitive principles of SDRT. Our discourse relations form a three-level hierarchy of 24 relations grouped into 4 top-level classes. To learn them automatically, we use state-of-the-art features whose effectiveness has been empirically demonstrated. We investigate how each feature contributes to the learning process. We report our experiments on identifying fine-grained discourse relations, mid-level classes and top-level classes. We compare our approach with three baselines based on the most frequent relation, discourse connectives and the features used by Al-Saif and Markert (2011). Our results are very encouraging and outperform all the baselines, with an F-score of 78.1% and an accuracy of 80.6%.
first_indexed 2024-12-14T01:58:38Z
format Article
id doaj.art-9025bcbbc5024d44a8088814b57a462c
institution Directory Open Access Journal
issn 1319-1578
language English
last_indexed 2024-12-14T01:58:38Z
publishDate 2014-12-01
publisher Elsevier
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj.art-9025bcbbc5024d44a8088814b57a462c
2022-12-21T23:21:06Z
eng
Elsevier
Journal of King Saud University: Computer and Information Sciences
1319-1578
2014-12-01
vol. 26, no. 4, pp. 398-416
10.1016/j.jksuci.2014.06.001
Learning explicit and implicit Arabic discourse relations
Iskandar Keskes (ANLP Research Group, MIRACL Lab-Sfax University, Tunisia & IRIT-Toulouse University, France)
Farah Benamara Zitoune (IRIT-Toulouse University, France)
Lamia Hadrich Belguith (ANLP Research Group, MIRACL Lab-Sfax University, Tunisia)
http://www.sciencedirect.com/science/article/pii/S1319157814000251
Discourse relations
Segmented Discourse Representation Theory
Arabic language
title Learning explicit and implicit Arabic discourse relations
topic Discourse relations
Segmented Discourse Representation Theory
Arabic language
url http://www.sciencedirect.com/science/article/pii/S1319157814000251