Data system design alters meaning in ecological data: salmon habitat restoration across the U.S. Pacific Northwest

Bibliographic Details
Main Authors: Stephen L. Katz, Katie A. Barnas, Monica Diaz, Stephanie E. Hampton
Format: Article
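Language: English
Published: Wiley 2019-11-01
Series: Ecosphere
Subjects: applied epistemology, bioinformatics, crosswalk, data confederation, data federation, data synthesis
Online Access: https://doi.org/10.1002/ecs2.2920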

Abstract
As an increasing variety and complexity of environmental issues confront scientists and natural resource managers, assembling the most relevant and informative data into accessible data systems becomes critical to timely problem solving. Data interoperability is the key criterion for succeeding in that assembly, and much informatics research is focused on data federation, or synthesis to produce interoperable data. However, when candidate data come from numerous, diverse, and high‐value legacy data sources, the issue of data variety or heterogeneity can be a significant impediment to interoperability. Research in informatics, computer science and philosophy has frequently focused on resolving data heterogeneity with automation, but subject matter expertise still plays a large role. In particular, human expertise is a large component in the development of tools such as data dictionaries, crosswalks, and ontologies. Such representations may not always match from one data system to another, presenting potentially inconsistent results even with the same data. Here, we use a long‐term data set on management actions designed to improve stream habitat for endangered salmon in the Pacific Northwest, to illustrate how different representations can change the underlying information content in the data system. We pass the same data set comprised of 49,619 records through three ontologies, each developed to address a rational management need, and show that the inferences drawn from the data can change with choice of data representation or ontology. One striking example shows that the use of one ontology would suggest water quality improvement projects are the rarest and most expensive restoration actions undertaken, while another will suggest these actions to be the most common and least expensive type of management actions. The discrepancy relates to the origins of the data dictionaries themselves, with one designed to catalog management actions and the other focused on ecological processes. Thus, we argue that in data federation efforts humans are “in the loop” rationally, in the form of the ontologies they have chosen, and diminishing the human component in favor of automation carries risks. Consequently, data federation exercises should be accompanied by validations in order to evaluate and manage those risks.

ISSN: 2150-8925
Collection: Directory of Open Access Journals (DOAJ)
DOAJ record ID: doaj.art-08af8e3948b74074a9387a77ccb9b4d0
Volume: 10
Issue: 11
Affiliations:
Stephen L. Katz (School of the Environment, Washington State University, Pullman, Washington 99164, USA)
Katie A. Barnas (Northwest Fisheries Science Center, NOAA Fisheries Service, Seattle, Washington 98112, USA)
Monica Diaz (Northwest Fisheries Science Center, NOAA Fisheries Service, Seattle, Washington 98112, USA)
Stephanie E. Hampton (Center for Environmental Research, Education and Outreach, Washington State University, Pullman, Washington 99164, USA)
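The abstract's central claim, that one data set can support opposite conclusions depending on which ontology or crosswalk translates it, can be illustrated with a toy calculation. The Python sketch below is a minimal illustration under loudly stated assumptions: the source action codes, the two crosswalks (`ACTION_CROSSWALK`, loosely modeled on a catalog of management actions, and `PROCESS_CROSSWALK`, loosely modeled on ecological processes), the costs, and the helpers `summarize` and `unmapped_codes` are all hypothetical. None of them come from the paper's 49,619-record data set or its three ontologies. Each record is mapped to a category and the record count and mean cost are tallied per category; `unmapped_codes` is one simple coverage check, offered only as an example of the kind of validation the abstract calls for.

```python
"""Toy illustration: one set of records, two crosswalks, two different stories.

ASSUMPTION: every action code, category, and cost below is hypothetical. None of
it is taken from the 49,619-record data set or the ontologies in the paper.
"""
from collections import defaultdict

# Hypothetical source records: (source action code, project cost in USD).
RECORDS = [
    ("riparian_planting", 9_000),
    ("riparian_planting", 11_000),
    ("riparian_planting", 10_000),
    ("livestock_fencing", 8_000),
    ("livestock_fencing", 12_000),
    ("livestock_fencing", 10_000),
    ("culvert_replacement", 150_000),
    ("culvert_replacement", 170_000),
    ("instream_wood", 40_000),
    ("instream_wood", 44_000),
    ("water_treatment_facility", 220_000),
]

# Hypothetical crosswalk built to catalog management actions: only an explicit
# treatment project is filed under "water quality".
ACTION_CROSSWALK = {
    "riparian_planting": "riparian",
    "livestock_fencing": "riparian",
    "culvert_replacement": "fish passage",
    "instream_wood": "instream habitat",
    "water_treatment_facility": "water quality",
}

# Hypothetical crosswalk built around ecological processes: actions assumed to
# improve water quality (e.g. riparian planting) are also filed there.
PROCESS_CROSSWALK = {
    "riparian_planting": "water quality",
    "livestock_fencing": "water quality",
    "culvert_replacement": "connectivity",
    "instream_wood": "habitat complexity",
    "water_treatment_facility": "water quality",
}


def unmapped_codes(records, crosswalk):
    """Validation check: source codes the crosswalk cannot translate."""
    return sorted({code for code, _ in records if code not in crosswalk})


def summarize(records, crosswalk):
    """Return {category: (record count, mean cost)} under one crosswalk."""
    missing = unmapped_codes(records, crosswalk)
    if missing:
        raise ValueError(f"unmapped source codes: {missing}")
    totals = defaultdict(lambda: [0, 0])  # category -> [count, total cost]
    for code, cost in records:
        entry = totals[crosswalk[code]]
        entry[0] += 1
        entry[1] += cost
    return {cat: (n, total / n) for cat, (n, total) in totals.items()}


for label, crosswalk in (("action-based", ACTION_CROSSWALK),
                         ("process-based", PROCESS_CROSSWALK)):
    count, mean_cost = summarize(RECORDS, crosswalk)["water quality"]
    print(f"{label}: water quality n={count}, mean cost={mean_cost:,.0f}")
```

With these invented numbers, the action-based crosswalk makes water quality the rarest and most expensive category (1 record, mean cost 220,000), while the process-based crosswalk, which also files riparian planting and livestock fencing under water quality, makes it the most common and least expensive (7 records, mean cost 40,000). The records never change; only the many-to-one mapping does.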