How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations


Bibliographic Details
Main Authors: Jesse S. Sayles, Ryan P. Furey, Marilyn R. ten Brink
Format: Article
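Language: English
Published: SpringerOpen 2022-06-01
Series: Applied Network Science
Subjects: Social network analysis; Hyperlink networks; Web-scraping; Environmental governance; Decision support tools; Environmental stewardship
Online Access: https://doi.org/10.1007/s41109-022-00472-0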
author Jesse S. Sayles
Ryan P. Furey
Marilyn R. ten Brink
collection DOAJ
description Social network analysis (SNA) tools and concepts are essential for addressing many environmental management and sustainability issues. One method to gather SNA data is to scrape them from environmental organizations’ websites. Web-based research can provide important opportunities to understand environmental governance and policy networks while potentially reducing costs and time when compared to traditional survey and interview methods. A key parameter is ‘search depth,’ i.e., how many connected pages within a website to search for information. Existing research uses a variety of depths and no best practices exist, undermining research quality and case study comparability. We therefore analyze how search depth affects SNA data collection among environmental organizations, whether results vary when organizations have different objectives, and how search depth affects social network structure. We find that scraping to a depth of three captures the majority of relevant network data regardless of an organization’s focus. Stakeholder identification (i.e., who is in the network) may require less scraping, but this might under-represent network structure (i.e., who is connected). We also discuss how scraping web pages of local programs of larger organizations may lead to uncertain results and how our work can combine with mixed methods approaches.
first_indexed 2024-04-12T18:07:26Z
format Article
id doaj.art-8d94437d6e1846c4b5bd8221d3c70709
institution Directory Open Access Journal
issn 2364-8228
language English
last_indexed 2024-04-12T18:07:26Z
publishDate 2022-06-01
publisher SpringerOpen
record_format Article
series Applied Network Science
spelling doaj.art-8d94437d6e1846c4b5bd8221d3c70709
2022-12-22T03:21:57Z
eng
SpringerOpen
Applied Network Science
2364-8228
2022-06-01
Volume 7, issue 1, pages 1-16
10.1007/s41109-022-00472-0
How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations
Jesse S. Sayles (0)
Ryan P. Furey (1)
Marilyn R. ten Brink (2)
(0) Oak Ridge Institute for Science and Education (ORISE) Fellow Appointed with the U.S. Environmental Protection Agency, Office of Research and Development, Center for Environmental Management and Modelling, Atlantic Coastal Environmental Sciences Division
(1) Oak Ridge Associated Universities (ORAU) Contracted to the U.S. Environmental Protection Agency, Office of Research and Development, Center for Environmental Management and Modelling, Atlantic Coastal Environmental Sciences Division
(2) U.S. Environmental Protection Agency, Office of Research and Development, Center for Environmental Management and Modelling, Atlantic Coastal Environmental Sciences Division
https://doi.org/10.1007/s41109-022-00472-0
Social network analysis
Hyperlink networks
Web-scraping
Environmental governance
Decision support tools
Environmental stewardship
title How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations
topic Social network analysis
Hyperlink networks
Web-scraping
Environmental governance
Decision support tools
Environmental stewardship
url https://doi.org/10.1007/s41109-022-00472-0