Summary: | Abstract Random neighbor sampling, or RN, is a method for sampling vertices with a mean degree greater than that of the graph. Instead of naïvely sampling a vertex from a graph and retaining it (‘random vertex’ or RV), a neighbor of the vertex is selected instead. While considerable research has analyzed various aspects of RN, the extra cost of sampling a second vertex is typically not addressed. This paper explores RN sampling from the perspective of cost. We break down the cost of sampling into two distinct costs, that of sampling a vertex and that of sampling a neighbor of an already sampled vertex, and we also include the cost of actually selecting a vertex/neighbor and retaining it for use rather than discarding it. With these three costs as our cost-model, we explore RN and compare it to RV in a more fair manner than comparisons that have been made in previous research. As we delve into costs, a number of variants to RN are introduced. These variants improve on the cost-effectiveness of RN in regard to particular costs and priorities. Our full cost-benefit analysis highlights strengths and weaknesses of the methods. We particularly focus on how our methods perform for sampling high-degree and low-degree vertices, which further enriches the understanding of the methods and how they can be practically applied. We also suggest ‘two-phase’ methods that specifically seek to cover both high-degree and low-degree vertices in separate sampling phases.
|