Sublinear-Time Algorithms for Counting Star Subgraphs via Edge Sampling

We study the problem of estimating the value of sums of the form $S_p \triangleq \sum_i \binom{x_i}{p}$ when one has the ability to sample $x_i \ge 0$ with probability proportional to its magnitude. When $p = 2$, this problem is equivalent to estimating the selectivity of a self-join query...


Bibliographic Details
Main Authors: Aliakbarpour, Maryam, Biswas, Amartya Shankha, Gouleakis, Themistoklis, Peebles, John Lee Thompson, Yodpinyanee, Anak, Rubinfeld, Ronitt
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: English
Published: Springer US 2018
Online Access: http://hdl.handle.net/1721.1/115241
https://orcid.org/0000-0001-5064-3221
https://orcid.org/0000-0002-4056-0489
https://orcid.org/0000-0002-6514-3761
https://orcid.org/0000-0002-3466-6543
https://orcid.org/0000-0002-4353-7639
collection MIT
description We study the problem of estimating the value of sums of the form $S_p \triangleq \sum_i \binom{x_i}{p}$ when one has the ability to sample $x_i \ge 0$ with probability proportional to its magnitude. When $p = 2$, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when $\{x_i\}$ is the degree sequence of a graph, which corresponds to counting the number of $p$-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a $(1 \pm \epsilon)$-multiplicative approximation of $S_p$ has query and time complexities $O\left(m \log\log n / (\epsilon^2 S_p^{1/p})\right)$. Here, $m = \sum_i x_i / 2$ is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, $n$ is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when $\{x_i\}$ is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al. in SIAM J Comput 25:1365–1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where $S_p \le n$ and $p = 2$, our upper bound is $\tilde{O}(n / S_p^{1/2})$, in contrast to their $\Omega(n / S_p^{1/3})$ lower bound when no random edge queries are available.
In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is bounded—or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is bounded—we give a sublinear time algorithm via a reduction to the undirected case. Keywords: Subgraphs, Approximate counting, Randomized algorithms, Sublinear-time algorithms
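To make the quantity in the abstract concrete, here is a minimal Python sketch of the exact count $S_p = \sum_i \binom{x_i}{p}$ and of a naive unbiased estimator built on random edge samples. It is not the algorithm from the article and does not achieve the stated complexity; it only illustrates why sampling an edge and then one of its endpoints selects a vertex with probability proportional to its degree. All function names, the toy graph, and the sample size are assumptions chosen for illustration.

```python
import math
import random


def exact_star_count(degrees, p):
    """Exact S_p: sum over vertices of C(degree, p), i.e. the number of p-stars."""
    return sum(math.comb(d, p) for d in degrees)


def sample_vertex_via_edge(edges, rng):
    """Pick a uniform random edge, then a uniform random endpoint.

    A vertex v is returned with probability deg(v) / (2m).
    """
    u, v = rng.choice(edges)
    return u if rng.random() < 0.5 else v


def estimate_star_count(edges, degree_of, p, num_samples, rng):
    """Naive importance-sampling estimate of S_p (illustrative only).

    Each sample contributes 2m * C(d, p) / d, whose expectation under
    degree-proportional sampling is exactly S_p.
    """
    m = len(edges)
    total = 0.0
    for _ in range(num_samples):
        d = degree_of[sample_vertex_via_edge(edges, rng)]  # d >= 1 for any endpoint
        total += 2 * m * math.comb(d, p) / d
    return total / num_samples


# Hypothetical toy graph: vertex 0 has degree 4, vertices 1 and 2 have degree 2.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
degree_of = {}
for a, b in edges:
    degree_of[a] = degree_of.get(a, 0) + 1
    degree_of[b] = degree_of.get(b, 0) + 1

exact = exact_star_count(degree_of.values(), 2)  # C(4,2) + C(2,2) + C(2,2) = 8
estimate = estimate_star_count(edges, degree_of, 2, 20000, random.Random(0))
```

On this toy graph the estimate should land close to the exact value of 8; the number of samples needed in general depends on the variance of the per-sample term, which this sketch makes no attempt to control.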
first_indexed 2024-09-23T16:36:27Z
format Article
id mit-1721.1/115241
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T16:36:27Z
publishDate 2018
publisher Springer US
record_format dspace
spelling mit-1721.1/115241 2022-09-29T20:19:36Z
National Science Foundation (U.S.). Graduate Research Fellowship Program (Grant CCF-1217423)
National Science Foundation (U.S.). Graduate Research Fellowship Program (Grant CCF-1065125)
National Science Foundation (U.S.). Graduate Research Fellowship Program (Grant CCF-1420692)
National Science Foundation (U.S.). Graduate Research Fellowship Program (Grant CCF-1122374)
2018-05-07T14:43:01Z 2018-05-07T14:43:01Z 2017-02 2018-02-03T06:57:29Z
Article http://purl.org/eprint/type/JournalArticle
0178-4617 1432-0541
Aliakbarpour, Maryam, et al. “Sublinear-Time Algorithms for Counting Star Subgraphs via Edge Sampling.” Algorithmica, vol. 80, no. 2, Feb. 2018, pp. 668–97.
en http://dx.doi.org/10.1007/s00453-017-0287-3 Algorithmica
Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
Springer Science+Business Media New York application/pdf Springer US