An evolutionary based features construction methods for data summarization approach

Coral reefs are on course to become the first ecosystem that human activity will eliminate entirely from the Earth, a leading United Nations scientist claims. It is predicted that this event will occur before the end of the present century, which means that there are children already born who will l...


Bibliographic Details
Main Authors: Rayner Alfred; Suraya Alias; Chin, Kim On
Format: Research Report
Language: English
Published: Universiti Malaysia Sabah 2015
Subjects: QA Mathematics
Online Access: https://eprints.ums.edu.my/id/eprint/24680/1/An%20evolutionary%20based%20features%20construction%20methods%20for%20data%20%20summarization%20approach.pdf
_version_ 1796910291439583232
author Rayner Alfred
Suraya Alias
Chin, Kim On
collection UMS
description Coral reefs are on course to become the first ecosystem that human activity will eliminate entirely from the Earth, a leading United Nations scientist claims. This is predicted to occur before the end of the present century, which means that children already born will live to see a world without coral. Coral reefs are important for the immense biodiversity of their ecosystems: they contain a quarter of all marine species. This research addresses the question of whether a data summarization approach can be used to predict the survival of coral reefs in Malaysia by identifying their survival factors. A data summarization approach is proposed because it can learn from data stored in multiple tables. Specifically, this research discusses the application of a genetic algorithm to optimize the feature construction process on the coral reef data, generating input for the data summarization method called Dynamic Aggregation of Relational Attributes (DARA). The DARA algorithm summarizes data stored in the non-target tables by clustering them into groups, where multiple records stored in non-target tables correspond to a single record stored in a target table. Feature construction methods are applied to improve the descriptive accuracy of the DARA algorithm. This research proposes two novel feature construction methods, Variable Length Feature Construction without Substitution (VLFCWOS) and Variable Length Feature Construction with Substitution (VLFCWS), to construct a set of relevant features for learning from relational data. These methods are intended to improve the descriptive accuracy of the summarized data. In the process of summarizing relational data, a genetic algorithm is applied and several feature scoring measures are evaluated to find the best set of relevant constructed features.
In this work, we empirically compare the predictive accuracies of classification tasks based on the proposed and existing feature construction methods. The experimental results show that, in most cases, the highest predictive accuracy is obtained when classifying data summarized with the VLFCWS method using Total Cluster Entropy combined with Information Gain (CE-IG) as the feature scoring measure.
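The abstract names cluster entropy and information gain as feature scoring measures but does not give their formulas. As a rough illustration only, the sketch below uses the generic textbook definitions to score a clustering against class labels. The function names and the size-weighted form of the cluster entropy are assumptions made for this example; they are not taken from the report and may differ from its CE-IG measure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_cluster_entropy(clusters, labels):
    """Size-weighted sum of the class entropy inside each cluster.

    Assumption: weighting by cluster size is a common convention,
    not necessarily the one used in the report.
    """
    n = len(labels)
    groups = {}
    for cluster_id, label in zip(clusters, labels):
        groups.setdefault(cluster_id, []).append(label)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(clusters, labels):
    """Reduction in class entropy obtained by splitting records by cluster."""
    return entropy(labels) - weighted_cluster_entropy(clusters, labels)


# Hypothetical toy data: four records, two classes.
labels = ["alive", "alive", "dead", "dead"]
pure_clusters = [0, 0, 1, 1]    # each cluster holds a single class
mixed_clusters = [0, 1, 0, 1]   # each cluster mixes both classes

pure_gain = information_gain(pure_clusters, labels)
mixed_gain = information_gain(mixed_clusters, labels)
```

Under these definitions, a set of constructed features whose clustering separates the classes cleanly receives a higher gain than one whose clusters mix the classes, which is the kind of signal a genetic algorithm could use as a fitness value.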
first_indexed 2024-03-06T03:02:18Z
format Research Report
id ums.eprints-24680
institution Universiti Malaysia Sabah
language English
last_indexed 2024-03-06T03:02:18Z
publishDate 2015
publisher Universiti Malaysia Sabah
record_format dspace
spelling ums.eprints-246802020-01-29T02:41:38Z https://eprints.ums.edu.my/id/eprint/24680/ An evolutionary based features construction methods for data summarization approach Rayner Alfred Suraya Alias Chin, Kim On QA Mathematics Universiti Malaysia Sabah 2015 Research Report NonPeerReviewed text en https://eprints.ums.edu.my/id/eprint/24680/1/An%20evolutionary%20based%20features%20construction%20methods%20for%20data%20%20summarization%20approach.pdf Rayner Alfred and Suraya Alias and Chin, Kim On (2015) An evolutionary based features construction methods for data summarization approach. (Unpublished)
title An evolutionary based features construction methods for data summarization approach
topic QA Mathematics
url https://eprints.ums.edu.my/id/eprint/24680/1/An%20evolutionary%20based%20features%20construction%20methods%20for%20data%20%20summarization%20approach.pdf