Summary: Keyword queries offer an easy way to reach structured information. Users do not need to know the structure of the underlying data to formulate formal queries, and they do not need to pose grammatically correct questions to the user interface. Despite these obvious advantages, keyword querying is less expressive than syntactic questions. Its processing is restricted to the posed keywords, so connecting words and the relations among keywords are ignored. In semi-structured data such as RDF, relations are formally defined as properties among concepts, which helps keyword querying find connections among concepts in the underlying data. Despite this facility, the results of natural language interfaces (NLIs) still lack precision and relevance. One major reason is that more work has gone into improving efficiency in data storage, data indexing, and top-k result reporting than into enhancing expressiveness, supporting lengthy queries, and answering queries with relevance-oriented ranking.

We are concerned with enhancing the keyword query processing model to handle expressive keyword queries and syntactic questions that incorporate quantifier restrictions and AND-OR semantics over RDF knowledge bases. Ontologies support the processing of both types of natural language (NL) queries. These NL queries are converted into target queries that retrieve results from RDF data. The generated target queries must be ranked so that results are reported in order of their relevance to the user query. For large keyword queries, graph representation and processing become a bottleneck. To make the conversion process more efficient, we preprocessed the RDF graph by eliminating single chain productions and stored it in a distributed manner.
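The summary does not define "single chain productions". One plausible reading, assumed here and not confirmed by the text, is a run of intermediate nodes that each have exactly one incoming and one outgoing edge. Such a run can be collapsed into a single labelled edge before conversion. The Python sketch below shows that contraction on an in-memory list of (subject, predicate, object) triples. The function name `compress_chains` and the `keep` parameter are hypothetical. `keep` lists nodes that must never be collapsed, for example resources that keywords can match.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, predicate, object)


def compress_chains(triples: list[Triple], keep: frozenset[str] = frozenset()):
    """Collapse runs of pass-through nodes into single labelled edges.

    Assumption (hypothetical reading of "single chain production"): a node is
    pass-through if it has exactly one incoming and one outgoing edge and is
    not in `keep`. Each result is (source, labels, target), where `labels`
    alternates predicates and the collapsed intermediate nodes.
    """
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    nodes = set()
    for s, p, o in triples:
        out_edges[s].append((p, o))
        in_deg[o] += 1
        nodes.update((s, o))

    # Anchors are chain endpoints: every node that is not pass-through.
    anchors = {n for n in nodes
               if n in keep or in_deg[n] != 1 or len(out_edges[n]) != 1}
    interior = set()
    compressed = []

    def walk_from(start):
        # Follow each outgoing edge up to the next anchor, absorbing
        # pass-through nodes into the edge label.
        for p, o in out_edges[start]:
            labels, cur = [p], o
            while cur not in anchors:
                interior.add(cur)
                next_p, next_o = out_edges[cur][0]
                labels += [cur, next_p]
                cur = next_o
            compressed.append((start, tuple(labels), cur))

    for n in sorted(anchors):
        walk_from(n)
    # A cycle made only of pass-through nodes is unreachable from any anchor;
    # promote one node per such cycle so that none of its edges is lost.
    for n in sorted(nodes - anchors - interior):
        if n not in interior:
            anchors.add(n)
            walk_from(n)
    return compressed
```

The collapsed nodes are kept in the edge label. This keeps the label traceable: a target query generated over the compressed graph can still be expanded back to the original triple pattern.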
To reduce search complexity, we used shortest path algorithms on selected resources to explore their connectivity. To generalize the target query representation and to incorporate quantifiers, subclasses, and subclass unions, we define an extended representation of the conjunctive query, termed the extended conjunctive query. To implement the AND-OR semantics of user queries and semantic ranking, we define an efficient representation termed the compact Boolean query (CBQ). The CBQ also handles the empty-result conditions reported by some approaches. For query conversion, techniques with fixed templates face scalability problems, while graph-only techniques are processing-intensive. We propose a variable-template-based conversion combined with inexpensive graph techniques to handle lengthy queries and to explore indirect connectivity among elements. For ranking, we propose a relevance ranking based on co-occurrence and Boolean semantics, which helps interpret keyword queries and syntactic questions and answer them precisely. Experiments on LUBM, Mooney, and self-developed ontologies show that our technique handles queries of 19 keywords within acceptable time limits. For correctly transformed queries, the CBQ provides a complete solution to the empty-result condition. Query coverage is extended to queries originating from syntactic questions, with improved precision. The improved values of mean reciprocal rank (MRR) and TQP show that our co-occurrence and AND-OR ranking strategies can place the most relevant target queries at the top positions.
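Exploring connectivity between resources with shortest paths can be illustrated with a breadth-first search, which finds shortest paths in an unweighted graph. This is a minimal sketch and rests on several assumptions. The triples are held in memory as Python tuples. Edges may be traversed in either direction. The hop limit `max_hops` and the function name `connecting_path` are hypothetical. The summary does not state which shortest path algorithm was used or how resources are selected.

```python
from collections import defaultdict, deque
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


def connecting_path(triples: list[Triple], source: str, target: str,
                    max_hops: int = 4) -> Optional[list[Triple]]:
    """Return the triples on a shortest path from `source` to `target`,
    traversing edges in either direction, or None if no path exists
    within `max_hops` edges."""
    # Undirected adjacency: each triple links its subject and object both ways.
    adj = defaultdict(list)
    for t in triples:
        s, _, o = t
        adj[s].append((o, t))
        adj[o].append((s, t))

    if source == target:
        return []
    # parent[node] = (previous node, triple used to reach node)
    parent = {source: None}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hypothetical hop cap keeps the search bounded
        for nxt, t in adj[node]:
            if nxt in parent:
                continue  # already reached by a path at least as short
            parent[nxt] = (node, t)
            if nxt == target:
                # Walk parent links back to the source, then reverse.
                path = []
                cur = target
                while parent[cur] is not None:
                    prev, edge = parent[cur]
                    path.append(edge)
                    cur = prev
                return path[::-1]
            frontier.append((nxt, depth + 1))
    return None
```

Treating edges as undirected lets the search find indirect connections, such as a student and a course that are linked only through a common department. The cost is larger frontiers on dense graphs, which the hop cap limits.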