How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations
Abstract Social network analysis (SNA) tools and concepts are essential for addressing many environmental management and sustainability issues. One method to gather SNA data is to scrape them from environmental organizations’ websites. Web-based research can provide important opportunities to unders...
Main Authors: | Jesse S. Sayles, Ryan P. Furey, Marilyn R. ten Brink |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2022-06-01 |
Series: | Applied Network Science |
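Subjects: | Social network analysis; Hyperlink networks; Web-scraping; Environmental governance; Decision support tools; Environmental stewardship |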
Online Access: | https://doi.org/10.1007/s41109-022-00472-0 |
_version_ | 1828732804654956544 |
---|---|
author | Jesse S. Sayles; Ryan P. Furey; Marilyn R. ten Brink |
author_facet | Jesse S. Sayles; Ryan P. Furey; Marilyn R. ten Brink |
author_sort | Jesse S. Sayles |
collection | DOAJ |
description | Abstract Social network analysis (SNA) tools and concepts are essential for addressing many environmental management and sustainability issues. One method to gather SNA data is to scrape them from environmental organizations’ websites. Web-based research can provide important opportunities to understand environmental governance and policy networks while potentially reducing costs and time when compared to traditional survey and interview methods. A key parameter is ‘search depth,’ i.e., how many connected pages within a website to search for information. Existing research uses a variety of depths and no best practices exist, undermining research quality and case study comparability. We therefore analyze how search depth affects SNA data collection among environmental organizations, if results vary when organizations have different objectives, and how search depth affects social network structure. We find that scraping to a depth of three captures the majority of relevant network data regardless of an organization’s focus. Stakeholder identification (i.e., who is in the network) may require less scraping, but this might under-represent network structure (i.e., who is connected). We also discuss how scraping web-pages of local programs of larger organizations may lead to uncertain results and how our work can combine with mixed methods approaches. |
first_indexed | 2024-04-12T18:07:26Z |
format | Article |
id | doaj.art-8d94437d6e1846c4b5bd8221d3c70709 |
institution | Directory of Open Access Journals |
issn | 2364-8228 |
language | English |
last_indexed | 2024-04-12T18:07:26Z |
publishDate | 2022-06-01 |
publisher | SpringerOpen |
record_format | Article |
series | Applied Network Science |
spelling | doaj.art-8d94437d6e1846c4b5bd8221d3c70709; 2022-12-22T03:21:57Z; eng; SpringerOpen; Applied Network Science; 2364-8228; 2022-06-01; 71116; 10.1007/s41109-022-00472-0; How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations; Jesse S. Sayles (0), Ryan P. Furey (1), Marilyn R. ten Brink (2); (0) Oak Ridge Institute for Science and Education (ORISE) Fellow Appointed with the U.S. Environmental Protection Agency, Office of Research and Development, Center for Environmental Management and Modelling, Atlantic Coastal Environmental Sciences Division; (1) Oak Ridge Associated Universities (ORAU) Contracted to the U.S. Environmental Protection Agency, Office of Research and Development, Center for Environmental Management and Modelling, Atlantic Coastal Environmental Sciences Division; (2) U.S. Environmental Protection Agency, Office of Research and Development, Center for Environmental Management and Modelling, Atlantic Coastal Environmental Sciences Division; Abstract Social network analysis (SNA) tools and concepts are essential for addressing many environmental management and sustainability issues. One method to gather SNA data is to scrape them from environmental organizations’ websites. Web-based research can provide important opportunities to understand environmental governance and policy networks while potentially reducing costs and time when compared to traditional survey and interview methods. A key parameter is ‘search depth,’ i.e., how many connected pages within a website to search for information. Existing research uses a variety of depths and no best practices exist, undermining research quality and case study comparability. We therefore analyze how search depth affects SNA data collection among environmental organizations, if results vary when organizations have different objectives, and how search depth affects social network structure. We find that scraping to a depth of three captures the majority of relevant network data regardless of an organization’s focus. Stakeholder identification (i.e., who is in the network) may require less scraping, but this might under-represent network structure (i.e., who is connected). We also discuss how scraping web-pages of local programs of larger organizations may lead to uncertain results and how our work can combine with mixed methods approaches.; https://doi.org/10.1007/s41109-022-00472-0; Social network analysis; Hyperlink networks; Web-scraping; Environmental governance; Decision support tools; Environmental stewardship |
spellingShingle | Jesse S. Sayles Ryan P. Furey Marilyn R. ten Brink How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations Applied Network Science Social network analysis Hyperlink networks Web-scraping Environmental governance Decision support tools Environmental stewardship |
title | How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations |
title_full | How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations |
title_fullStr | How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations |
title_full_unstemmed | How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations |
title_short | How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations |
title_sort | how deep to dig effects of web scraping search depth on hyperlink network analysis of environmental stewardship organizations |
topic | Social network analysis; Hyperlink networks; Web-scraping; Environmental governance; Decision support tools; Environmental stewardship |
url | https://doi.org/10.1007/s41109-022-00472-0 |
work_keys_str_mv | AT jessessayles howdeeptodigeffectsofwebscrapingsearchdepthonhyperlinknetworkanalysisofenvironmentalstewardshiporganizations AT ryanpfurey howdeeptodigeffectsofwebscrapingsearchdepthonhyperlinknetworkanalysisofenvironmentalstewardshiporganizations AT marilynrtenbrink howdeeptodigeffectsofwebscrapingsearchdepthonhyperlinknetworkanalysisofenvironmentalstewardshiporganizations |