World Wide Web resource discovery

Query routing refers to the general resource discovery problem of selecting from a large set of accessible information sources the ones relevant to a given query (database selection), evaluating the query on the selected sources (query evaluation), and merging their results (result merging). As the...


Bibliographic Details
Main Author:	Xu, Jian.
Other Authors:	Lim Ee Peng
Format:	Thesis
Language:	English
Published:	2010
Subjects:	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Online Access:	http://hdl.handle.net/10356/42555

_version_	1826109585304322048
author	Xu, Jian.
author2	Lim Ee Peng
author_facet	Lim Ee Peng Xu, Jian.
author_sort	Xu, Jian.
collection	NTU
description	Query routing refers to the general resource discovery problem of selecting from a large set of accessible information sources the ones relevant to a given query (database selection), evaluating the query on the selected sources (query evaluation), and merging their results (result merging). As the number of information sources on the Internet increases dramatically, query routing is becoming increasingly important. Nevertheless, much of the previous work in query routing focused on information sources that are document collections. Moreover, little work has been done on collections that can be accessed only through query interfaces. In this project, we focus on the database selection problem, an important subproblem of query routing, for bibliographic databases consisting of multiple text attributes. In particular, we first proposed three training-based database selection techniques known as TQS, TQC and TQG. These three techniques rely on training query results to determine the relevance of databases with respect to a given user query. Our experiments have shown that TQG and TQC outperform TQS for the same number of training queries. We further explored the use of clustering techniques to improve the performance of database selection for bibliographic databases. Three clustering techniques, i.e. Single Pass Clustering (SPC), Reallocation Clustering (RC) and Constrained Clustering (CC), were evaluated with two database ranking schemes known as ERS and EGS. Our experiments showed that any of the clustering techniques combined with ERS yields good performance. This research also looked into the implementation of a query routing broker, known as ZBroker, developed for bibliographic database servers supporting Z39.50 query interfaces on the Internet.
first_indexed	2024-10-01T02:20:38Z
format	Thesis
id	ntu-10356/42555
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T02:20:38Z
publishDate	2010
record_format	dspace
spelling	ntu-10356/42555 2020-09-27T20:13:39Z World Wide Web resource discovery Xu, Jian. Lim Ee Peng School of Applied Science DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Query routing refers to the general resource discovery problem of selecting from a large set of accessible information sources the ones relevant to a given query (database selection), evaluating the query on the selected sources (query evaluation), and merging their results (result merging). As the number of information sources on the Internet increases dramatically, query routing is becoming increasingly important. Nevertheless, much of the previous work in query routing focused on information sources that are document collections. Moreover, little work has been done on collections that can be accessed only through query interfaces. In this project, we focus on the database selection problem, an important subproblem of query routing, for bibliographic databases consisting of multiple text attributes. In particular, we first proposed three training-based database selection techniques known as TQS, TQC and TQG. These three techniques rely on training query results to determine the relevance of databases with respect to a given user query. Our experiments have shown that TQG and TQC outperform TQS for the same number of training queries. We further explored the use of clustering techniques to improve the performance of database selection for bibliographic databases. Three clustering techniques, i.e. Single Pass Clustering (SPC), Reallocation Clustering (RC) and Constrained Clustering (CC), were evaluated with two database ranking schemes known as ERS and EGS. Our experiments showed that any of the clustering techniques combined with ERS yields good performance. This research also looked into the implementation of a query routing broker, known as ZBroker, developed for bibliographic database servers supporting Z39.50 query interfaces on the Internet. Master of Applied Science 2010-12-30T04:52:25Z 2010-12-30T04:52:25Z 1999 1999 Thesis http://hdl.handle.net/10356/42555 en 141 p. application/pdf
spellingShingle	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Xu, Jian. World Wide Web resource discovery
title	World Wide Web resource discovery
title_full	World Wide Web resource discovery
title_fullStr	World Wide Web resource discovery
title_full_unstemmed	World Wide Web resource discovery
title_short	World Wide Web resource discovery
title_sort	world wide web resource discovery
topic	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
url	http://hdl.handle.net/10356/42555
work_keys_str_mv	AT xujian worldwidewebresourcediscovery

