Hybrid differential evolution based automatic single document text summarization

Bibliographic Details
Main Author: Mohammed Ali Abuobieda, Albaraa Abuobieda
Format: Thesis
Language: English
Published: 2013
Subjects:
Online Access: http://eprints.utm.my/38967/5/AlbaraaAbuobiedaPFSKSM2013.pdf
Description
Summary: Automatic single document text summarization is the process of condensing an input text document. In this process, an extractive summarization approach summarizes a document by extracting its most informative sentences. To select such sentences, a sentence scoring approach assigns a score to each input sentence and then ranks the sentences accordingly. Based on a user-defined summary ratio, only the top-ranked sentences are selected to be part of the summary. Selecting the most informative sentences remains a challenge for researchers in extractive automatic text summarization. Thus, this research proposed extraction-based automatic single document text summarization methods by investigating a single meta-heuristic evolutionary algorithm, Differential Evolution (DE), to generate high-quality summaries. The DE algorithm is used (i) to find the best feature weight scores to discriminate between important and unimportant features, (ii) to act as a clustering machine learning method using the Normalized Google Distance and Jaccard similarity measures to generate a highly diverse summary, (iii) to employ an opposition-based learning (OBL) approach to improve the performance of the DE algorithm, and (iv) to develop a hybrid model used to investigate the advantages of combining the feature weighting, diversity and OBL approaches. To evaluate the proposed methods, the standard dataset from the Document Understanding Conference (DUC) 2002 and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) standard evaluation toolkit were used. Experimental results showed that the hybrid models, as well as all the proposed individual methods, performed well for text summarization compared to four benchmark methods (Microsoft Word, Copernic, and the best and worst DUC 2002 summarizers) as well as a human summarizer measured against another human summarizer.
In addition, the proposed DE-based methods outperformed methods based on other evolutionary algorithms, namely Genetic Algorithm and fuzzy swarm diversity-based methods. The experimental results showed that the proposed hybrid models generate better-quality text summaries.
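
The pipeline described in the summary (weighted sentence scoring, ratio-based selection, Jaccard similarity, and DE with opposition-based learning) can be illustrated with a minimal sketch. This is not the thesis implementation. The DE/rand/1/bin variant, the parameter values, the toy feature matrix, the reference selection, and the overlap-based fitness function are all assumptions made for illustration; the record does not state which variant, features, or fitness function the thesis used.

```python
import random

def score_sentences(features, weights):
    """Weighted sum of per-sentence feature values."""
    return [sum(w * f for w, f in zip(weights, row)) for row in features]

def select_summary(scores, ratio):
    """Indices of the top-ranked sentences, returned in document order."""
    k = max(1, round(len(scores) * ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def opposite(x, lo=0.0, hi=1.0):
    """Opposition-based learning: mirror each component inside [lo, hi]."""
    return [lo + hi - v for v in x]

def differential_evolution(fitness, dim, pop_size=10, gens=30, f=0.5, cr=0.9, seed=0):
    """DE/rand/1/bin maximising `fitness` over weight vectors in [0, 1]^dim."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    # OBL initialisation: keep the fittest pop_size of the candidates and their opposites.
    pop = sorted(pop + [opposite(p) for p in pop], key=fitness, reverse=True)[:pop_size]
    for _ in range(gens):
        for i in range(pop_size):
            # Three distinct individuals, none of them the current target.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated component
            trial = [
                min(1.0, max(0.0, a[d] + f * (b[d] - c[d])))
                if (rng.random() < cr or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            # Greedy selection: the trial replaces the target only if it is no worse.
            if fitness(trial) >= fitness(pop[i]):
                pop[i] = trial
    return max(pop, key=fitness)

# Toy data (hypothetical): 5 sentences x 3 feature values, and a reference selection.
features = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.5],
    [0.1, 0.9, 0.4],
    [0.8, 0.2, 0.1],
    [0.3, 0.5, 0.9],
]
reference = {0, 3}

def fitness(weights):
    """Assumed stand-in fitness: fraction of reference sentences selected."""
    chosen = select_summary(score_sentences(features, weights), ratio=0.4)
    return len(set(chosen) & reference) / len(reference)

best = differential_evolution(fitness, dim=3)
print("weights:", best)
print("selected sentences:", select_summary(score_sentences(features, best), 0.4))
```

In a setting closer to the one the summary describes, the stand-in fitness would presumably be replaced by a summary-quality measure such as a ROUGE score against reference summaries, and `jaccard` would feed a diversity or clustering step; both substitutions are left out here to keep the sketch short.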