Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm

In the present scenario, Automatic Text Summarization (ATS) is in great demand to address the ever-growing volume of text data available online to discover relevant information faster. In this research, the ATS methodology is proposed for the Hindi language using Real Coded Genetic Algorithm (RCGA)...

Bibliographic Details
Main Authors: Arti Jain, Anuja Arora, Jorge Morato, Divakar Yadav, Kumar Vimal Kumar
Format: Article
Language: English
Published: MDPI AG 2022-06-01
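Series: Applied Sciences
Subjects: automatic text summarization; extractive summary; feature set; Hindi language; Hindi health data; named entity
Online Access: https://www.mdpi.com/2076-3417/12/13/6584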
author Arti Jain
Anuja Arora
Jorge Morato
Divakar Yadav
Kumar Vimal Kumar
collection DOAJ
description Automatic Text Summarization (ATS) is in great demand as a way to cope with the ever-growing volume of online text and to surface relevant information faster. This research proposes an ATS methodology for the Hindi language that uses a Real Coded Genetic Algorithm (RCGA) over a health corpus available as a Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation is performed on varied feature sets, in which two distinguishing features, sentence similarity and named entity, are combined with others to compute the evaluation metrics. The top 14 feature combinations are evaluated with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosome selection, and the reproduction operators Simulated Binary Crossover and Polynomial Mutation. Different compression rates are tested to extract the highest-scoring sentences as the corpus summary. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%.
first_indexed 2024-03-09T22:07:08Z
format Article
id doaj.art-10f12637d4e247bd8059b75a46cd8af4
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T22:07:08Z
publishDate 2022-06-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-10f12637d4e247bd8059b75a46cd8af4
2023-11-23T19:39:18Z
eng
MDPI AG
Applied Sciences
2076-3417
2022-06-01
12
13
6584
10.3390/app12136584
Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm
Arti Jain (Department of CSE, Jaypee Institute of Information Technology, Noida 201309, India)
Anuja Arora (Department of CSE, Jaypee Institute of Information Technology, Noida 201309, India)
Jorge Morato (Computer Science, Universidad Carlos III de Madrid, 28911 Leganes, Spain)
Divakar Yadav (Department of CSE, NIT Hamirpur, Hamirpur 177005, India)
Kumar Vimal Kumar (Department of CSE, Jaypee Institute of Information Technology, Noida 201309, India)
Automatic Text Summarization (ATS) is in great demand as a way to cope with the ever-growing volume of online text and to surface relevant information faster. This research proposes an ATS methodology for the Hindi language that uses a Real Coded Genetic Algorithm (RCGA) over a health corpus available as a Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation is performed on varied feature sets, in which two distinguishing features, sentence similarity and named entity, are combined with others to compute the evaluation metrics. The top 14 feature combinations are evaluated with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosome selection, and the reproduction operators Simulated Binary Crossover and Polynomial Mutation. Different compression rates are tested to extract the highest-scoring sentences as the corpus summary. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%.
https://www.mdpi.com/2076-3417/12/13/6584
automatic text summarization
extractive summary
feature set
Hindi language
Hindi health data
named entity
title Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm
topic automatic text summarization
extractive summary
feature set
Hindi language
Hindi health data
named entity
url https://www.mdpi.com/2076-3417/12/13/6584
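
The description field names two reproduction operators (binary crossover and Polynomial Mutation) and a ranking step that keeps the highest scored sentences at a chosen compression rate. The sketch below illustrates those three pieces using the textbook forms of the operators. It is not the authors' implementation: the function names, the distribution indices `eta`, the mutation probability `p_mut`, the [0, 1] weight bounds, the simplified (non-boundary-aware) mutation step, and the `keep_ratio` parameter (fraction of sentences retained) are all assumptions made for illustration.

```python
import math
import random


def sbx_crossover(p1, p2, eta=15.0, low=0.0, high=1.0, rng=random):
    """Simulated Binary Crossover on two real-valued weight vectors.

    eta, low and high are illustrative defaults, not values from the paper.
    """
    c1, c2 = [], []
    for x1, x2 in zip(p1, p2):
        u = rng.random()  # u lies in [0, 1)
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        y1 = 0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2)
        y2 = 0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2)
        # Clip children back into the allowed weight range.
        c1.append(min(max(y1, low), high))
        c2.append(min(max(y2, low), high))
    return c1, c2


def polynomial_mutation(x, eta=20.0, p_mut=0.1, low=0.0, high=1.0, rng=random):
    """Simplified polynomial mutation: perturb each gene with probability p_mut."""
    out = []
    for xi in x:
        if rng.random() < p_mut:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
            xi = min(max(xi + delta * (high - low), low), high)
        out.append(xi)
    return out


def select_sentences(feature_rows, weights, keep_ratio):
    """Score each sentence as a weighted feature sum and keep the top share.

    Returns the indices of the kept sentences in document order.
    keep_ratio is a hypothetical parameter: the fraction of sentences retained.
    """
    scores = [sum(w * f for w, f in zip(weights, row)) for row in feature_rows]
    keep = max(1, math.ceil(len(feature_rows) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])
```

In a full genetic algorithm these operators would sit inside a loop that selects parent chromosomes, produces children, and scores each weight vector with a fitness function; the description suggests ROUGE plays that evaluative role, but the loop and fitness are left out here because the record does not specify them.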