An Automatic Persian Text Summarization System Based on Linguistic Features and Regression
Considering the vast amount of existing written information and the shortage of time, optimal summarization of books, articles, news reports, etc. on the Web is a major concern of researchers. In this paper, we propose a new approach for Persian single-document summarization based on several linguis...
Main Authors: | Mahmood Soltani, Jalal Nasiri, Ehsan Asgarian |
---|---|
Format: | Article |
Language: | fas |
Published: | Iranian Research Institute for Information and Technology, 2018-09-01 |
Series: | Iranian Journal of Information Processing & Management |
Subjects: | Single-Document Summarization; Linguistic Feature; Linear Regression |
Online Access: | http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-3807-1&slc_lang=en&sid=1 |
_version_ | 1819154104360894464 |
---|---|
author | Mahmood Soltani; Jalal Nasiri; Ehsan Asgarian |
author_facet | Mahmood Soltani; Jalal Nasiri; Ehsan Asgarian |
author_sort | Mahmood Soltani |
collection | DOAJ |
description | Given the vast amount of written information available and the limited time readers have, effective summarization of books, articles, news reports, and other text on the Web is a major concern for researchers. In this paper, we propose a new approach to Persian single-document summarization based on several linguistic features of the text. After extracting the linguistic features of each sentence, we learn the feature weights by linear regression. At each step of the algorithm, we select the sentence with the maximum score. The score of each sentence depends on two factors: the sum of its weighted features, and its similarity to the sentences already selected for the final summary. We use an automatic evaluation tool to compare our approach with existing approaches. The results indicate that our method improves summarization performance. |
first_indexed | 2024-12-22T15:15:46Z |
format | Article |
id | doaj.art-4f4e6ec7aaf6465eba9947d9c6370f9f |
institution | Directory Open Access Journal |
issn | 2251-8223 2251-8231 |
language | fas |
last_indexed | 2024-12-22T15:15:46Z |
publishDate | 2018-09-01 |
publisher | Iranian Research Institute for Information and Technology |
record_format | Article |
series | Iranian Journal of Information Processing & Management |
spelling | doaj.art-4f4e6ec7aaf6465eba9947d9c6370f9f; 2022-12-21T18:21:45Z; fas; Iranian Research Institute for Information and Technology; Iranian Journal of Information Processing & Management; 2251-8223; 2251-8231; 2018-09-01; 33418091828; An Automatic Persian Text Summarization System Based on Linguistic Features and Regression; Mahmood Soltani (0), Jalal Nasiri (1), Ehsan Asgarian (2); Department of Computer Engineering; Quchan University of Technology; Iranian Research Institute for Information Science and Technology (IranDoc); Engineering Department of Ferdowsi University of Mashhad; Considering the vast amount of existing written information and the shortage of time, optimal summarization of books, articles, news reports, etc. on the Web is a major concern of researchers. In this paper, we propose a new approach for Persian single-document summarization based on several linguistic features of text. In our approach after extracting the linguistic features for each sentence, the weight of features is learned by a linear regression method. We select one sentence with maximum score at each step of algorithm. The score of each sentence is calculated based on two factors: first, sum of the weighted features and second, the amount of its similarity to the sentences that are selected for final summary previously. We use an automatic evaluation tool to compare our approach with other existing approaches. The result indicates that our method improves the performance of summarization.; http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-3807-1&slc_lang=en&sid=1; Single-Document Summarization; Linguistic Feature; Linear Regression |
spellingShingle | Mahmood Soltani Jalal Nasiri Ehsan Asgarian An Automatic Persian Text Summarization System Based on Linguistic Features and Regression Iranian Journal of Information Processing & Management Single-Document Summarization Linguistic Feature Linear Regression |
title | An Automatic Persian Text Summarization System Based on Linguistic Features and Regression |
title_full | An Automatic Persian Text Summarization System Based on Linguistic Features and Regression |
title_fullStr | An Automatic Persian Text Summarization System Based on Linguistic Features and Regression |
title_full_unstemmed | An Automatic Persian Text Summarization System Based on Linguistic Features and Regression |
title_short | An Automatic Persian Text Summarization System Based on Linguistic Features and Regression |
title_sort | automatic persian text summarization system based on linguistic features and regression |
topic | Single-Document Summarization; Linguistic Feature; Linear Regression |
url | http://jipm.irandoc.ac.ir/browse.php?a_code=A-10-3807-1&slc_lang=en&sid=1 |
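
The description field in this record outlines a two-part procedure: feature weights are learned by linear regression, and sentences are then picked one at a time by a score that combines the weighted features with similarity to sentences already chosen. The sketch below illustrates that general idea only and is not the authors' implementation. The record does not say how the two factors are combined, so subtracting a redundancy term scaled by a `penalty` parameter is an assumption, as are bag-of-words cosine similarity, gradient-descent fitting, and every function and parameter name.

```python
import math
from collections import Counter


def fit_weights(X, y, lr=0.1, epochs=5000):
    """Least-squares linear regression (no intercept) via batch gradient descent.

    X: per-sentence feature vectors; y: target relevance scores.
    Both the optimizer and the absence of an intercept are assumptions.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def cosine(a, b):
    """Bag-of-words cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(sentences, features, w, k, penalty=0.5):
    """Greedily pick k sentences, one per step, by maximum score.

    Score = weighted feature sum - penalty * (max similarity to any
    sentence already selected). This combination rule is an assumption.
    Returns the selected indices in document order.
    """
    base = [sum(wj * fj for wj, fj in zip(w, f)) for f in features]
    chosen = []
    while len(chosen) < min(k, len(sentences)):
        best, best_score = None, -math.inf
        for i, tokens in enumerate(sentences):
            if i in chosen:
                continue
            redundancy = max(
                (cosine(tokens, sentences[j]) for j in chosen), default=0.0
            )
            score = base[i] - penalty * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return sorted(chosen)
```

In this sketch, `sentences` is a list of token lists and `features` holds one numeric vector per sentence. A larger `penalty` favors diverse sentences, while `penalty=0` reduces the selection to a plain ranking by weighted features.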