Enhance efficiency of answering XML keyword query using incompact structure of MCCTree


Bibliographic Details
Main Author: Sazaly, Ummu Sulaim
Format: Thesis
Language: English
Published: 2012
Subjects: XML (Document markup language); Keyword searching
Online Access: http://psasir.upm.edu.my/id/eprint/38635/1/FSKTM%202013%203.pdf
_version_ 1825949261929381888
author Sazaly, Ummu Sulaim
collection UPM
description Modern life is increasingly conducted online, where a task can be completed by typing a request and letting a system carry out the process. Because this interaction takes place online, data sharing is an essential service for sending and delivering information. Extensible Markup Language (XML) has been chosen as the most important data-sharing medium because it is easy for both humans and machines to interpret. For this reason, many studies have sought to improve the effectiveness of retrieving information from XML files, and many notions and techniques have been introduced, especially for query processing. The Compact Lowest Common Ancestor (CLCA) and Maximal Compact Lowest Common Ancestor (MCLCA) notions, implemented in the algorithms CGTreeGenerator and MCCTreeGenerator, have been shown to return accurate results for XML keyword queries. CGTreeGenerator compacts the XML tree by eliminating irrelevant nodes according to the CLCA notion, producing a Compact Global Tree (CGTree). MCCTreeGenerator then uses the CGTree to select a subtree, called the Maximal Compact Connected Tree (MCCTree), as the query result according to the MCLCA notion. However, the MCCTree cannot be used directly by its ranking method, because the ranking calculation relies on the structure of the subtree as it was before compaction. A result that the ranking method cannot use directly means the algorithm contains an ineffective process; moreover, if that process requires re-examining the original tree, the efficiency of the algorithm is reduced.
This study responds to these weaknesses by proposing a new algorithm, XMCCTreeGenerator, to improve on the efficiency of CGTree-MCCTreeGenerator. It identifies the processes needed to produce an XML query result using the MCLCA notion without compacting the tree. These processes make up the XMCCTreeGenerator algorithm, which produces the same subtree as the MCCTree but with a different structure. The returned subtree, called the Extended MCCTree (XMCCTree), can be used directly by the ranking method because its structure is uncompacted ("incompact"). An experiment was run using XML datasets from the XML Data Repository on the University of Washington's website. Two files with different data structures were selected and divided into three size ranges. Keywords were selected manually at random from the files, and queries were executed with three to five keywords. Two prototypes were developed, one implementing CGTree-MCCTreeGenerator and one implementing XMCCTreeGenerator. Because the study focuses on the efficiency of the algorithm, the elapsed time of each execution was recorded. The study concludes that the proposed XMCCTreeGenerator is more efficient than the earlier CGTree-MCCTreeGenerator in answering XML keyword queries using MCLCA.
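The abstract builds on the idea of a lowest common ancestor (LCA) of keyword matches in an XML tree. The sketch below illustrates only that basic idea in Python; it is not the thesis's CGTreeGenerator, MCCTreeGenerator, or XMCCTreeGenerator, and it does not implement the CLCA or MCLCA definitions. The function names, the substring-matching rule, and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product


def build_parent_map(root):
    """Map each element to its parent (ElementTree has no parent pointer)."""
    return {child: parent for parent in root.iter() for child in parent}


def path_to_root(node, parents):
    """Return the list of elements from the root down to `node`."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path[::-1]


def lowest_common_ancestor(nodes, parents):
    """Deepest element that is an ancestor-or-self of every node in `nodes`."""
    paths = [path_to_root(n, parents) for n in nodes]
    lca = None
    for level in zip(*paths):
        if all(n is level[0] for n in level):
            lca = level[0]
        else:
            break
    return lca


def keyword_matches(root, keyword):
    """Assumed matching rule: case-insensitive substring of an element's text."""
    kw = keyword.lower()
    return [e for e in root.iter() if kw in (e.text or "").lower()]


def lca_answers(root, keywords):
    """LCA of every combination of one match per keyword (illustrative only;
    enumerating all combinations is exponential in the number of keywords)."""
    parents = build_parent_map(root)
    match_lists = [keyword_matches(root, k) for k in keywords]
    if any(not matches for matches in match_lists):
        return []
    answers = []
    for combo in product(*match_lists):
        lca = lowest_common_ancestor(combo, parents)
        if lca not in answers:
            answers.append(lca)
    return answers


# Hypothetical sample document, not taken from the thesis datasets.
DOC = """
<bib>
  <book><title>XML Search</title><author>Lee</author></book>
  <book><title>Databases</title><author>Tan</author></book>
</bib>
"""
root = ET.fromstring(DOC)
print([a.tag for a in lca_answers(root, ["XML", "Lee"])])  # ['book']
```

Here both keywords fall inside the first `book` element, so that element is the answer subtree's root. A query such as `["XML", "Tan"]` spans two books and yields the document root instead, which is the kind of overly broad result that refined notions like CLCA and MCLCA are meant to filter.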
first_indexed 2024-03-06T08:41:51Z
format Thesis
id upm.eprints-38635
institution Universiti Putra Malaysia
language English
last_indexed 2024-09-25T03:33:57Z
publishDate 2012
record_format dspace
spelling upm.eprints-38635 2024-08-30T07:36:36Z http://psasir.upm.edu.my/id/eprint/38635/ Enhance efficiency of answering XML keyword query using incompact structure of MCCTree Sazaly, Ummu Sulaim 2012-11 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/38635/1/FSKTM%202013%203.pdf Sazaly, Ummu Sulaim (2012) Enhance efficiency of answering XML keyword query using incompact structure of MCCTree. Masters thesis, Universiti Putra Malaysia. XML (Document markup language) Keyword searching
topic XML (Document markup language)
Keyword searching