Optimal algorithms for selecting top-k combinations of attributes : theory and applications

Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover k objects, e.g., top-k restaurants, with highest overall scores aggregated from different attributes, e.g., price and l...

Full description

Bibliographic Details
Main Authors: Lin, Chunbin, Lu, Jiaheng, Wei, Zhewei, Wang, Jianguo, Xiao, Xiaokui
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language:English
Published: 2020
Subjects:
Online Access:https://hdl.handle.net/10356/142480
_version_ 1811683722767892480
author Lin, Chunbin
Lu, Jiaheng
Wei, Zhewei
Wang, Jianguo
Xiao, Xiaokui
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Lin, Chunbin
Lu, Jiaheng
Wei, Zhewei
Wang, Jianguo
Xiao, Xiaokui
author_sort Lin, Chunbin
collection NTU
description Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover k objects, e.g., top-k restaurants, with highest overall scores aggregated from different attributes, e.g., price and location. However, new emerging applications like query recommendation require providing the best combinations of attributes, instead of objects. The straightforward extension based on the existing top-k algorithms is prohibitively expensive to answer top-k combinations because they need to enumerate all the possible combinations, which is exponential to the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound of the depth of accesses. We further develop optimizations for efficient query evaluation to reduce the computational and the memory costs and the number of accesses. We provide a case study on the real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k, m algorithms on multiple real-life datasets.
first_indexed 2024-10-01T04:17:15Z
format Journal Article
id ntu-10356/142480
institution Nanyang Technological University
language English
last_indexed 2024-10-01T04:17:15Z
publishDate 2020
record_format dspace
spelling ntu-10356/1424802020-06-22T09:35:32Z Optimal algorithms for selecting top-k combinations of attributes : theory and applications Lin, Chunbin Lu, Jiaheng Wei, Zhewei Wang, Jianguo Xiao, Xiaokui School of Computer Science and Engineering Engineering::Computer science and engineering Top-k Query Top-k, M Query Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover k objects, e.g., top-k restaurants, with highest overall scores aggregated from different attributes, e.g., price and location. However, new emerging applications like query recommendation require providing the best combinations of attributes, instead of objects. The straightforward extension based on the existing top-k algorithms is prohibitively expensive to answer top-k combinations because they need to enumerate all the possible combinations, which is exponential to the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound of the depth of accesses. We further develop optimizations for efficient query evaluation to reduce the computational and the memory costs and the number of accesses. We provide a case study on the real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k, m algorithms on multiple real-life datasets. MOE (Min. of Education, S’pore) 2020-06-22T09:35:32Z 2020-06-22T09:35:32Z 2017 Journal Article Lin, C., Lu, J., Wei, Z., Wang, J., & Xiao, X. (2018). Optimal algorithms for selecting top-k combinations of attributes : theory and applications. The VLDB Journal, 27(1), 27-52. doi:10.1007/s00778-017-0485-2 1066-8888 https://hdl.handle.net/10356/142480 10.1007/s00778-017-0485-2 2-s2.0-85032388214 1 27 27 52 en The VLDB Journal © 2017 Springer-Verlag GmbH Germany. All rights reserved.
spellingShingle Engineering::Computer science and engineering
Top-k Query
Top-k, M Query
Lin, Chunbin
Lu, Jiaheng
Wei, Zhewei
Wang, Jianguo
Xiao, Xiaokui
Optimal algorithms for selecting top-k combinations of attributes : theory and applications
title Optimal algorithms for selecting top-k combinations of attributes : theory and applications
title_full Optimal algorithms for selecting top-k combinations of attributes : theory and applications
title_fullStr Optimal algorithms for selecting top-k combinations of attributes : theory and applications
title_full_unstemmed Optimal algorithms for selecting top-k combinations of attributes : theory and applications
title_short Optimal algorithms for selecting top-k combinations of attributes : theory and applications
title_sort optimal algorithms for selecting top k combinations of attributes theory and applications
topic Engineering::Computer science and engineering
Top-k Query
Top-k, M Query
url https://hdl.handle.net/10356/142480
work_keys_str_mv AT linchunbin optimalalgorithmsforselectingtopkcombinationsofattributestheoryandapplications
AT lujiaheng optimalalgorithmsforselectingtopkcombinationsofattributestheoryandapplications
AT weizhewei optimalalgorithmsforselectingtopkcombinationsofattributestheoryandapplications
AT wangjianguo optimalalgorithmsforselectingtopkcombinationsofattributestheoryandapplications
AT xiaoxiaokui optimalalgorithmsforselectingtopkcombinationsofattributestheoryandapplications