Learning explicit and implicit Arabic discourse relations
In this paper, we propose a supervised learning approach for identifying discourse relations in Arabic texts. To our knowledge, this work is the first attempt to address both explicit and implicit relations that link adjacent as well as non-adjacent Elementary Discourse Units (EDUs) within the S...
Main Authors: | Iskandar Keskes, Farah Benamara Zitoune, Lamia Hadrich Belguith |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2014-12-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
Subjects: | Discourse relations, Segmented Discourse Representation Theory, Arabic language |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157814000251 |
_version_ | 1818566830427471872 |
---|---|
author | Iskandar Keskes; Farah Benamara Zitoune; Lamia Hadrich Belguith |
author_facet | Iskandar Keskes; Farah Benamara Zitoune; Lamia Hadrich Belguith |
author_sort | Iskandar Keskes |
collection | DOAJ |
description | In this paper, we propose a supervised learning approach for identifying discourse relations in Arabic texts. To our knowledge, this work is the first attempt to address both explicit and implicit relations that link adjacent as well as non-adjacent Elementary Discourse Units (EDUs) within the Segmented Discourse Representation Theory (SDRT). We use the Discourse Arabic Treebank corpus (D-ATB), which is composed of newspaper documents extracted from the syntactically annotated Arabic Treebank v3.2 part3, where each document is associated with a complete discourse graph according to the cognitive principles of SDRT. Our list of discourse relations is a three-level hierarchy of 24 relations grouped into 4 top-level classes. To learn them automatically, we use state-of-the-art features whose effectiveness has been empirically demonstrated. We investigate how each feature contributes to the learning process. We report our experiments on identifying fine-grained discourse relations, mid-level classes and top-level classes. We compare our approach with three baselines based on the most frequent relation, on discourse connectives, and on the features used by Al-Saif and Markert (2011). Our approach outperforms all the baselines, with an F-score of 78.1% and an accuracy of 80.6%. |
first_indexed | 2024-12-14T01:58:38Z |
format | Article |
id | doaj.art-9025bcbbc5024d44a8088814b57a462c |
institution | Directory of Open Access Journals |
issn | 1319-1578 |
language | English |
last_indexed | 2024-12-14T01:58:38Z |
publishDate | 2014-12-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-9025bcbbc5024d44a8088814b57a462c; 2022-12-21T23:21:06Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; ISSN 1319-1578; 2014-12-01; vol. 26, no. 4, pp. 398-416; DOI 10.1016/j.jksuci.2014.06.001; Learning explicit and implicit Arabic discourse relations; Iskandar Keskes (ANLP Research Group, MIRACL Lab-Sfax University, Tunisia & IRIT-Toulouse University, France); Farah Benamara Zitoune (IRIT-Toulouse University, France); Lamia Hadrich Belguith (ANLP Research Group, MIRACL Lab-Sfax University, Tunisia); http://www.sciencedirect.com/science/article/pii/S1319157814000251; Discourse relations; Segmented Discourse Representation Theory; Arabic language |
title | Learning explicit and implicit Arabic discourse relations |
topic | Discourse relations; Segmented Discourse Representation Theory; Arabic language |
url | http://www.sciencedirect.com/science/article/pii/S1319157814000251 |
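
The description mentions a "most frequent relation" baseline and reports results as an F-score and an accuracy. As a rough illustration of what such a baseline and these two metrics involve, here is a minimal Python sketch. It is not the authors' code: the labels `REL_A`, `REL_B` and `REL_C` are hypothetical placeholders rather than relations from the D-ATB inventory, the function names are invented for this example, and macro-averaging is an assumption, since the record does not say how the reported F-score was averaged.

```python
from collections import Counter


def most_frequent_baseline(train_labels):
    """Return the relation label that occurs most often in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(gold, pred):
    """Fraction of instances whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (assumed averaging scheme)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Toy data with hypothetical placeholder labels (not taken from the corpus).
train = ["REL_A", "REL_A", "REL_B", "REL_C"]
gold = ["REL_A", "REL_B", "REL_A", "REL_C"]

majority = most_frequent_baseline(train)
pred = [majority] * len(gold)  # the baseline predicts one label everywhere

print(majority, accuracy(gold, pred), round(macro_f1(gold, pred), 3))
```

A baseline of this kind can reach a moderate accuracy on skewed label distributions while scoring poorly on a macro-averaged F-score, which is one reason both figures are usually reported together.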