Persian Causality Corpus (PerCause) and the Causality Detection Benchmark
Recognizing causal elements and causal relations in text is a challenging problem in natural language processing (NLP), particularly for low-resource languages such as Persian. In this research, we prepare a human-annotated causality corpus for the Persian language. This corpus consists of...
Main Authors: | Zeinab Rahimi, Mehrnoush ShamsFard |
---|---|
Format: | Article |
Language: | Persian (fas) |
Published: | Iranian Research Institute for Information and Technology, 2022-12-01 |
Series: | Iranian Journal of Information Processing & Management |
Subjects: | PerCause; causality annotated corpus; causality detection; deep learning; CRF |
Online Access: | http://jipm.irandoc.ac.ir/article-1-4877-en.html |
author | Zeinab Rahimi, Mehrnoush ShamsFard |
collection | DOAJ |
description | Recognizing causal elements and causal relations in text is a challenging problem in natural language processing (NLP), particularly for low-resource languages such as Persian. In this research, we prepare a human-annotated causality corpus for the Persian language. The corpus consists of 4446 sentences and 5128 causal relations. Where possible, each relation is annotated with three labels: Cause, Effect, and Causal mark. We used this corpus to train a system for detecting the boundaries of causal elements. We also present a causality detection benchmark on this corpus for three machine-learning methods and two deep learning systems. Performance evaluations indicate that the best overall result is obtained with the CRF classifier, which achieves an F-measure of 0.76, while the best accuracy (91.4%) is obtained with the BiLSTM-CRF deep learning method. |
id | doaj.art-eb83fd7d311a4b4fb1dd62fe1f6e0f43 |
institution | Directory of Open Access Journals |
issn | 2251-8223, 2251-8231 |
spelling | doaj.art-eb83fd7d311a4b4fb1dd62fe1f6e0f43; 2023-02-14T09:53:00Z; fas; Iranian Research Institute for Information and Technology; Iranian Journal of Information Processing & Management; ISSN 2251-8223, 2251-8231; 2022-12-01; vol. 38, no. 2, pp. 273-303; Persian Causality Corpus (PerCause) and the Causality Detection Benchmark; Zeinab Rahimi, Mehrnoush ShamsFard (both: NLP Research Laboratory, Shahid Beheshti University, Tehran, Iran); http://jipm.irandoc.ac.ir/article-1-4877-en.html; PerCause; causality annotated corpus; causality detection; deep learning; CRF |
topic | PerCause; causality annotated corpus; causality detection; deep learning; CRF |
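
The abstract reports both an F-measure (0.76, CRF) and an accuracy (91.4%, BiLSTM-CRF) for detecting the boundaries of Cause, Effect, and Causal mark elements. This record does not say how either score was computed. The Python sketch below is only an illustration under stated assumptions: a BIO tagging scheme, the hypothetical tag names `CAUSE`, `EFFECT`, and `MARK`, exact-match span F1, and per-token accuracy. It shows how the two metrics can diverge on the same prediction, since one boundary error costs a whole span but only one token.

```python
from typing import List, Optional, Set, Tuple

# (label, start, end_exclusive); the label names used below are assumptions.
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labelled spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            label = None
        # A "B-" tag always opens a span; a stray "I-" opens one too.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match span F1: a span counts only if label and both boundaries match."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(a == b for a, b in zip(gold, pred)) / len(gold)


# Toy six-token sentence (placeholder tags only, not taken from PerCause).
gold = ["B-CAUSE", "I-CAUSE", "O", "B-MARK", "B-EFFECT", "I-EFFECT"]
pred = ["B-CAUSE", "I-CAUSE", "O", "B-MARK", "B-EFFECT", "O"]

print(f"token accuracy: {token_accuracy(gold, pred):.3f}")
print(f"span F1:        {span_f1(gold, pred):.3f}")
```

In this toy case a single mis-tagged token truncates the Effect span, so five of six tokens are correct while only two of three spans match exactly. Whether the paper's figures follow this scheme is not stated in the record.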