Cost-based analyses of random neighbor and derived sampling methods

Abstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has ana...

Full description

Bibliographic Details
Main Authors:	Yitzchak Novick, Amotz Bar-Noy
Format:	Article
Language:	English
Published:	SpringerOpen 2022-06-01
Series:	Applied Network Science
Subjects:	Fair cost comparison Random neighbor sampling High-degree vertex sampling
Online Access:	https://doi.org/10.1007/s41109-022-00475-x

_version_	1818161695781027840
author	Yitzchak Novick Amotz Bar-Noy
author_facet	Yitzchak Novick Amotz Bar-Noy
author_sort	Yitzchak Novick
collection	DOAJ
description	Abstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has analyzed various aspects of RN, the extra cost of sampling a second vertex is typically not addressed. This paper explores RN sampling from the perspective of cost. We break down the cost of sampling into two distinct costs, that of sampling a vertex and that of sampling a neighbor of an already sampled vertex, and we also include the cost of actually selecting a vertex/neighbor and retaining it for use rather than discarding it. With these three costs as our cost-model, we explore RN and compare it to RV in a more fair manner than comparisons that have been made in previous research. As we delve into costs, a number of variants to RN are introduced. These variants improve on the cost-effectiveness of RN in regard to particular costs and priorities. Our full cost-benefit analysis highlights strengths and weaknesses of the methods. We particularly focus on how our methods perform for sampling high-degree and low-degree vertices, which further enriches the understanding of the methods and how they can be practically applied. We also suggest ‘two-phase’ methods that specifically seek to cover both high-degree and low-degree vertices in separate sampling phases.
first_indexed	2024-12-11T16:21:52Z
format	Article
id	doaj.art-8930a9b4b40c4d91a239d87f087e273e
institution	Directory Open Access Journal
issn	2364-8228
language	English
last_indexed	2024-12-11T16:21:52Z
publishDate	2022-06-01
publisher	SpringerOpen
record_format	Article
series	Applied Network Science
spelling	doaj.art-8930a9b4b40c4d91a239d87f087e273e2022-12-22T00:58:50ZengSpringerOpenApplied Network Science2364-82282022-06-017112310.1007/s41109-022-00475-xCost-based analyses of random neighbor and derived sampling methodsYitzchak Novick0Amotz Bar-Noy1Computer Science Department, City University of New York Graduate CenterComputer Science Department, Brooklyn College and Graduate Center, City University of New YorkAbstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has analyzed various aspects of RN, the extra cost of sampling a second vertex is typically not addressed. This paper explores RN sampling from the perspective of cost. We break down the cost of sampling into two distinct costs, that of sampling a vertex and that of sampling a neighbor of an already sampled vertex, and we also include the cost of actually selecting a vertex/neighbor and retaining it for use rather than discarding it. With these three costs as our cost-model, we explore RN and compare it to RV in a more fair manner than comparisons that have been made in previous research. As we delve into costs, a number of variants to RN are introduced. These variants improve on the cost-effectiveness of RN in regard to particular costs and priorities. Our full cost-benefit analysis highlights strengths and weaknesses of the methods. We particularly focus on how our methods perform for sampling high-degree and low-degree vertices, which further enriches the understanding of the methods and how they can be practically applied. We also suggest ‘two-phase’ methods that specifically seek to cover both high-degree and low-degree vertices in separate sampling phases.https://doi.org/10.1007/s41109-022-00475-xFair cost comparisonRandom neighbor samplingHigh-degree vertex sampling
spellingShingle	Yitzchak Novick Amotz Bar-Noy Cost-based analyses of random neighbor and derived sampling methods Applied Network Science Fair cost comparison Random neighbor sampling High-degree vertex sampling
title	Cost-based analyses of random neighbor and derived sampling methods
title_full	Cost-based analyses of random neighbor and derived sampling methods
title_fullStr	Cost-based analyses of random neighbor and derived sampling methods
title_full_unstemmed	Cost-based analyses of random neighbor and derived sampling methods
title_short	Cost-based analyses of random neighbor and derived sampling methods
title_sort	cost based analyses of random neighbor and derived sampling methods
topic	Fair cost comparison Random neighbor sampling High-degree vertex sampling
url	https://doi.org/10.1007/s41109-022-00475-x
work_keys_str_mv	AT yitzchaknovick costbasedanalysesofrandomneighborandderivedsamplingmethods AT amotzbarnoy costbasedanalysesofrandomneighborandderivedsamplingmethods

Cost-based analyses of random neighbor and derived sampling methods

Similar Items