Cost-based analyses of random neighbor and derived sampling methods
Abstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has ana...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
SpringerOpen
2022-06-01
|
Series: | Applied Network Science |
Subjects: | |
Online Access: | https://doi.org/10.1007/s41109-022-00475-x |
_version_ | 1818161695781027840 |
---|---|
author | Yitzchak Novick Amotz Bar-Noy |
author_facet | Yitzchak Novick Amotz Bar-Noy |
author_sort | Yitzchak Novick |
collection | DOAJ |
description | Abstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has analyzed various aspects of RN, the extra cost of sampling a second vertex is typically not addressed. This paper explores RN sampling from the perspective of cost. We break down the cost of sampling into two distinct costs, that of sampling a vertex and that of sampling a neighbor of an already sampled vertex, and we also include the cost of actually selecting a vertex/neighbor and retaining it for use rather than discarding it. With these three costs as our cost-model, we explore RN and compare it to RV in a more fair manner than comparisons that have been made in previous research. As we delve into costs, a number of variants to RN are introduced. These variants improve on the cost-effectiveness of RN in regard to particular costs and priorities. Our full cost-benefit analysis highlights strengths and weaknesses of the methods. We particularly focus on how our methods perform for sampling high-degree and low-degree vertices, which further enriches the understanding of the methods and how they can be practically applied. We also suggest ‘two-phase’ methods that specifically seek to cover both high-degree and low-degree vertices in separate sampling phases. |
first_indexed | 2024-12-11T16:21:52Z |
format | Article |
id | doaj.art-8930a9b4b40c4d91a239d87f087e273e |
institution | Directory Open Access Journal |
issn | 2364-8228 |
language | English |
last_indexed | 2024-12-11T16:21:52Z |
publishDate | 2022-06-01 |
publisher | SpringerOpen |
record_format | Article |
series | Applied Network Science |
spelling | doaj.art-8930a9b4b40c4d91a239d87f087e273e2022-12-22T00:58:50ZengSpringerOpenApplied Network Science2364-82282022-06-017112310.1007/s41109-022-00475-xCost-based analyses of random neighbor and derived sampling methodsYitzchak Novick0Amotz Bar-Noy1Computer Science Department, City University of New York Graduate CenterComputer Science Department, Brooklyn College and Graduate Center, City University of New YorkAbstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has analyzed various aspects of RN, the extra cost of sampling a second vertex is typically not addressed. This paper explores RN sampling from the perspective of cost. We break down the cost of sampling into two distinct costs, that of sampling a vertex and that of sampling a neighbor of an already sampled vertex, and we also include the cost of actually selecting a vertex/neighbor and retaining it for use rather than discarding it. With these three costs as our cost-model, we explore RN and compare it to RV in a more fair manner than comparisons that have been made in previous research. As we delve into costs, a number of variants to RN are introduced. These variants improve on the cost-effectiveness of RN in regard to particular costs and priorities. Our full cost-benefit analysis highlights strengths and weaknesses of the methods. We particularly focus on how our methods perform for sampling high-degree and low-degree vertices, which further enriches the understanding of the methods and how they can be practically applied. We also suggest ‘two-phase’ methods that specifically seek to cover both high-degree and low-degree vertices in separate sampling phases.https://doi.org/10.1007/s41109-022-00475-xFair cost comparisonRandom neighbor samplingHigh-degree vertex sampling |
spellingShingle | Yitzchak Novick Amotz Bar-Noy Cost-based analyses of random neighbor and derived sampling methods Applied Network Science Fair cost comparison Random neighbor sampling High-degree vertex sampling |
title | Cost-based analyses of random neighbor and derived sampling methods |
title_full | Cost-based analyses of random neighbor and derived sampling methods |
title_fullStr | Cost-based analyses of random neighbor and derived sampling methods |
title_full_unstemmed | Cost-based analyses of random neighbor and derived sampling methods |
title_short | Cost-based analyses of random neighbor and derived sampling methods |
title_sort | cost based analyses of random neighbor and derived sampling methods |
topic | Fair cost comparison Random neighbor sampling High-degree vertex sampling |
url | https://doi.org/10.1007/s41109-022-00475-x |
work_keys_str_mv | AT yitzchaknovick costbasedanalysesofrandomneighborandderivedsamplingmethods AT amotzbarnoy costbasedanalysesofrandomneighborandderivedsamplingmethods |