Cost-based analyses of random neighbor and derived sampling methods

Abstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has ana...

Full description

Bibliographic Details
Main Authors: Yitzchak Novick, Amotz Bar-Noy
Format: Article
Language:English
Published: SpringerOpen 2022-06-01
Series:Applied Network Science
Subjects:
Online Access:https://doi.org/10.1007/s41109-022-00475-x
_version_ 1818161695781027840
author Yitzchak Novick
Amotz Bar-Noy
author_facet Yitzchak Novick
Amotz Bar-Noy
author_sort Yitzchak Novick
collection DOAJ
description Abstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has analyzed various aspects of RN, the extra cost of sampling a second vertex is typically not addressed. This paper explores RN sampling from the perspective of cost. We break down the cost of sampling into two distinct costs, that of sampling a vertex and that of sampling a neighbor of an already sampled vertex, and we also include the cost of actually selecting a vertex/neighbor and retaining it for use rather than discarding it. With these three costs as our cost-model, we explore RN and compare it to RV in a more fair manner than comparisons that have been made in previous research. As we delve into costs, a number of variants to RN are introduced. These variants improve on the cost-effectiveness of RN in regard to particular costs and priorities. Our full cost-benefit analysis highlights strengths and weaknesses of the methods. We particularly focus on how our methods perform for sampling high-degree and low-degree vertices, which further enriches the understanding of the methods and how they can be practically applied. We also suggest ‘two-phase’ methods that specifically seek to cover both high-degree and low-degree vertices in separate sampling phases.
first_indexed 2024-12-11T16:21:52Z
format Article
id doaj.art-8930a9b4b40c4d91a239d87f087e273e
institution Directory Open Access Journal
issn 2364-8228
language English
last_indexed 2024-12-11T16:21:52Z
publishDate 2022-06-01
publisher SpringerOpen
record_format Article
series Applied Network Science
spelling doaj.art-8930a9b4b40c4d91a239d87f087e273e2022-12-22T00:58:50ZengSpringerOpenApplied Network Science2364-82282022-06-017112310.1007/s41109-022-00475-xCost-based analyses of random neighbor and derived sampling methodsYitzchak Novick0Amotz Bar-Noy1Computer Science Department, City University of New York Graduate CenterComputer Science Department, Brooklyn College and Graduate Center, City University of New YorkAbstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has analyzed various aspects of RN, the extra cost of sampling a second vertex is typically not addressed. This paper explores RN sampling from the perspective of cost. We break down the cost of sampling into two distinct costs, that of sampling a vertex and that of sampling a neighbor of an already sampled vertex, and we also include the cost of actually selecting a vertex/neighbor and retaining it for use rather than discarding it. With these three costs as our cost-model, we explore RN and compare it to RV in a more fair manner than comparisons that have been made in previous research. As we delve into costs, a number of variants to RN are introduced. These variants improve on the cost-effectiveness of RN in regard to particular costs and priorities. Our full cost-benefit analysis highlights strengths and weaknesses of the methods. We particularly focus on how our methods perform for sampling high-degree and low-degree vertices, which further enriches the understanding of the methods and how they can be practically applied. We also suggest ‘two-phase’ methods that specifically seek to cover both high-degree and low-degree vertices in separate sampling phases.https://doi.org/10.1007/s41109-022-00475-xFair cost comparisonRandom neighbor samplingHigh-degree vertex sampling
spellingShingle Yitzchak Novick
Amotz Bar-Noy
Cost-based analyses of random neighbor and derived sampling methods
Applied Network Science
Fair cost comparison
Random neighbor sampling
High-degree vertex sampling
title Cost-based analyses of random neighbor and derived sampling methods
title_full Cost-based analyses of random neighbor and derived sampling methods
title_fullStr Cost-based analyses of random neighbor and derived sampling methods
title_full_unstemmed Cost-based analyses of random neighbor and derived sampling methods
title_short Cost-based analyses of random neighbor and derived sampling methods
title_sort cost based analyses of random neighbor and derived sampling methods
topic Fair cost comparison
Random neighbor sampling
High-degree vertex sampling
url https://doi.org/10.1007/s41109-022-00475-x
work_keys_str_mv AT yitzchaknovick costbasedanalysesofrandomneighborandderivedsamplingmethods
AT amotzbarnoy costbasedanalysesofrandomneighborandderivedsamplingmethods