Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm
In the present scenario, Automatic Text Summarization (ATS) is in great demand to address the ever-growing volume of text data available online to discover relevant information faster. In this research, the ATS methodology is proposed for the Hindi language using Real Coded Genetic Algorithm (RCGA)...
Main Authors: | Arti Jain, Anuja Arora, Jorge Morato, Divakar Yadav, Kumar Vimal Kumar |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-06-01 |
Series: | Applied Sciences |
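Subjects: | automatic text summarization; extractive summary; feature set; Hindi language; Hindi health data; named entity |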
Online Access: | https://www.mdpi.com/2076-3417/12/13/6584 |
_version_ | 1827655396905451520 |
---|---|
author | Arti Jain, Anuja Arora, Jorge Morato, Divakar Yadav, Kumar Vimal Kumar |
author_facet | Arti Jain, Anuja Arora, Jorge Morato, Divakar Yadav, Kumar Vimal Kumar |
author_sort | Arti Jain |
collection | DOAJ |
description | Automatic Text Summarization (ATS) is in great demand as a way to cope with the ever-growing volume of text available online and to find relevant information faster. This research proposes an ATS methodology for the Hindi language that uses a Real Coded Genetic Algorithm (RCGA) over a Hindi health corpus available as a Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation is performed on varied feature sets, in which two distinguishing features, sentence similarity and named entities, are combined with other features to compute the evaluation metrics. The top 14 feature combinations are evaluated with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosome selection, and two reproduction operators: Simulated Binary Crossover and Polynomial Mutation. Different compression rates are tested when extracting the highest-scoring sentences as the corpus summary. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%. |
first_indexed | 2024-03-09T22:07:08Z |
format | Article |
id | doaj.art-10f12637d4e247bd8059b75a46cd8af4 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-09T22:07:08Z |
publishDate | 2022-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-10f12637d4e247bd8059b75a46cd8af4; 2023-11-23T19:39:18Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-06-01; vol. 12, issue 13, article 6584; 10.3390/app12136584; Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm; Arti Jain (Department of CSE, Jaypee Institute of Information Technology, Noida 201309, India); Anuja Arora (Department of CSE, Jaypee Institute of Information Technology, Noida 201309, India); Jorge Morato (Computer Science, Universidad Carlos III de Madrid, 28911 Leganes, Spain); Divakar Yadav (Department of CSE, NIT Hamirpur, Hamirpur 177005, India); Kumar Vimal Kumar (Department of CSE, Jaypee Institute of Information Technology, Noida 201309, India); https://www.mdpi.com/2076-3417/12/13/6584; automatic text summarization; extractive summary; feature set; Hindi language; Hindi health data; named entity |
spellingShingle | Arti Jain; Anuja Arora; Jorge Morato; Divakar Yadav; Kumar Vimal Kumar; Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm; Applied Sciences; automatic text summarization; extractive summary; feature set; Hindi language; Hindi health data; named entity |
title | Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm |
title_sort | automatic text summarization for hindi using real coded genetic algorithm |
topic | automatic text summarization; extractive summary; feature set; Hindi language; Hindi health data; named entity |
url | https://www.mdpi.com/2076-3417/12/13/6584 |
work_keys_str_mv | AT artijain automatictextsummarizationforhindiusingrealcodedgeneticalgorithm AT anujaarora automatictextsummarizationforhindiusingrealcodedgeneticalgorithm AT jorgemorato automatictextsummarizationforhindiusingrealcodedgeneticalgorithm AT divakaryadav automatictextsummarizationforhindiusingrealcodedgeneticalgorithm AT kumarvimalkumar automatictextsummarizationforhindiusingrealcodedgeneticalgorithm |
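
The description names two reproduction operators and a weighted sentence-ranking step without showing how they work. The following is a minimal Python sketch of the textbook forms of Simulated Binary Crossover and Polynomial Mutation applied to real-valued feature-weight vectors, followed by a weighted ranking step. It is not the authors' implementation: the function names, the distribution indices `eta`, the mutation `rate`, the weight bounds `[lo, hi]`, and the `keep_ratio` parameter are all assumptions made for illustration.

```python
import random


def sbx_crossover(p1, p2, eta=15.0, rng=random):
    """Simulated Binary Crossover on two real-valued weight vectors.

    eta is the distribution index (assumed value); larger eta keeps
    children closer to their parents.
    """
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        u = rng.random()
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        # The two children are symmetric around the parents' mean.
        c1.append(0.5 * ((1.0 + beta) * a + (1.0 - beta) * b))
        c2.append(0.5 * ((1.0 - beta) * a + (1.0 + beta) * b))
    return c1, c2


def polynomial_mutation(x, eta=20.0, rate=0.1, lo=0.0, hi=1.0, rng=random):
    """Polynomial mutation: perturb each gene with probability `rate`."""
    out = []
    for v in x:
        if rng.random() < rate:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
            # Scale the perturbation by the range and clip to the bounds.
            v = min(hi, max(lo, v + delta * (hi - lo)))
        out.append(v)
    return out


def rank_sentences(features, weights, keep_ratio=0.35):
    """Score each sentence as a weighted sum of its feature values and
    return the indices of the top-scoring sentences in document order.

    keep_ratio is the fraction of sentences kept (illustrative default).
    """
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    k = max(1, int(len(features) * keep_ratio))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

In a genetic-algorithm loop, each chromosome would be one candidate weight vector; its fitness could be the ROUGE score of the summary produced by `rank_sentences` with those weights, though the exact fitness function used in the article is not stated in this record.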