Persian Causality Corpus (PerCause) and the Causality Detection Benchmark


Bibliographic Details
Main Authors: Zeinab Rahimi, Mehrnoush ShamsFard
Format: Article
Language: Persian (fas)
Published: Iranian Research Institute for Information and Technology, 2022-12-01
Series: Iranian Journal of Information Processing & Management
Online Access: http://jipm.irandoc.ac.ir/article-1-4877-en.html
collection DOAJ
description Recognizing causal elements and causal relations in text is a challenging problem in natural language processing (NLP), especially for low-resource languages such as Persian. In this research, we build a human-annotated causality corpus for Persian. The corpus consists of 4,446 sentences and 5,128 causal relations. Where possible, each relation is annotated with three labels: Cause, Effect, and Causal mark. We used this corpus to train a system that detects the boundaries of causal elements. We also present a causality detection benchmark on this corpus for three machine learning methods and two deep learning systems. Performance evaluations show that the best overall result is obtained with the CRF classifier, which reaches an F-measure of 0.76, while the best accuracy (91.4%) is obtained with the BiLSTM-CRF deep learning method.
first_indexed 2024-04-10T15:25:20Z
id doaj.art-eb83fd7d311a4b4fb1dd62fe1f6e0f43
institution Directory of Open Access Journals
issn 2251-8223
2251-8231
last_indexed 2024-04-10T15:25:20Z
affiliation NLP Research Laboratory, Shahid Beheshti University, Tehran, Iran (both authors)
volume 38
issue 2
pages 273-303
topic PerCause
causality annotated corpus
causality detection
deep learning
CRF
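The abstract frames causal-element boundary detection as a task handled by sequence labelers such as CRF and BiLSTM-CRF, which are commonly trained on one BIO tag per token. The sketch below shows one way span annotations could be turned into BIO tags and back, and how an exact-match span F-measure could be computed. It is illustrative only: the span tuple format, the tag names (`B-Cause`, `I-Effect`, `CausalMark`, ...), the assumption of contiguous non-overlapping spans, and the exact-match scoring are all hypothetical choices, not PerCause's documented file format or the paper's evaluation protocol.

```python
from typing import List, Optional, Tuple

# Hypothetical span format: (label, start_token, end_token), end exclusive.
Span = Tuple[str, int, int]

# Hypothetical tag names for the three labels (Cause, Effect, Causal mark).
LABELS = ("Cause", "Effect", "CausalMark")


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode contiguous, non-overlapping spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for label, start, end in spans:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"invalid span: {(label, start, end)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags (e.g. a tagger's predicted sequence) back into spans."""
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # still inside the current span
        if label is not None:
            spans.append((label, start, i))
            label = None
        if tag[:2] in ("B-", "I-"):
            # A stray I- tag is leniently treated as the start of a new span.
            label, start = tag[2:], i
    return spans


def span_f1(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match span F-measure: label and both boundaries must match."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 6-token sentence; the span boundaries are illustrative only.
    gold = [("Cause", 0, 2), ("CausalMark", 2, 3), ("Effect", 3, 5)]
    tags = spans_to_bio(6, gold)
    print(tags)  # → ['B-Cause', 'I-Cause', 'B-CausalMark', 'B-Effect', 'I-Effect', 'O']
    assert bio_to_spans(tags) == gold
    # A prediction whose Effect span ends one token too early.
    pred = [("Cause", 0, 2), ("CausalMark", 2, 3), ("Effect", 3, 4)]
    print(round(span_f1(gold, pred), 3))  # → 0.667
```

Exact-match scoring is deliberately strict: the truncated Effect span above earns no credit, whereas a partial-overlap or token-level metric would score it higher, so reported F-measures are only comparable when the matching criterion is the same.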