Understanding the ‘Intensive’ in ‘Data Intensive Research’: Data Flows in Next Generation Sequencing and Environmental Networked Sensors

Genomic and environmental sciences represent two poles of scientific data. In the first, highly parallel sequencing facilities generate large quantities of sequence data. In the latter, loosely networked remote and field sensors produce intermittent streams of different data types. Yet both genomic...


Bibliographic Details
Main Authors:	Ruth McNally, Adrian Mackenzie, Allison Hui, Jennifer Tomomitsu
Format:	Article
Language:	English
Published:	University of Edinburgh 2012-03-01
Series:	International Journal of Digital Curation
Online Access:	https://129.215.67.1/ijdc/article/view/216

_version_	1797393113335988224
author	Ruth McNally Adrian Mackenzie Allison Hui Jennifer Tomomitsu
author_facet	Ruth McNally Adrian Mackenzie Allison Hui Jennifer Tomomitsu
author_sort	Ruth McNally
collection	DOAJ
description	Genomic and environmental sciences represent two poles of scientific data. In the first, highly parallel sequencing facilities generate large quantities of sequence data. In the latter, loosely networked remote and field sensors produce intermittent streams of different data types. Yet both genomic and environmental sciences are said to be moving to data intensive research. This paper explores and contrasts data flow in these two domains in order to better understand how data intensive research is being done. Our case studies are next generation sequencing for genomics and environmental networked sensors. Our objective was to enrich understanding of the ‘intensive’ processes and properties of data intensive research through a ‘sociology’ of data using methods that capture the relational properties of data flows. Our key methodological innovation was the staging of events for practitioners with different kinds of expertise in data intensive research to participate in the collective annotation of visual forms. Through such events we built a substantial digital data archive of our own that we then analysed in terms of three traits of data flow: durability, replicability and metrology. Our findings are that analysing data flow with respect to these three traits provides better insight into how doing data intensive research involves people, infrastructures, practices, things, knowledge and institutions. Collectively, these elements shape the topography of data and condition how it flows. We argue that although much attention is given to phenomena such as the scale, volume and speed of data in data intensive research, these are measures of what we call ‘extensive’ properties rather than intensive ones. Our thesis is that extensive changes, that is to say those that result in non-linear changes in metrics, can be seen to result from intensive changes that bring multiple, disparate flows into confluence. If extensive shifts in the modalities of data flow do indeed come from the alignment of disparate things, as we suggest, then we advocate the staging of workshops and other events with the purpose of developing the ‘missing’ metrics of data flow.
first_indexed	2024-03-08T23:58:26Z
format	Article
id	doaj.art-26a6c713ea654796bb2af6c9cac19cf7
institution	Directory Open Access Journal
issn	1746-8256
language	English
last_indexed	2024-03-08T23:58:26Z
publishDate	2012-03-01
publisher	University of Edinburgh
record_format	Article
series	International Journal of Digital Curation
title	Understanding the ‘Intensive’ in ‘Data Intensive Research’: Data Flows in Next Generation Sequencing and Environmental Networked Sensors
title_sort	understanding the intensive in data intensive research data flows in next generation sequencing and environmental networked sensors
url	https://129.215.67.1/ijdc/article/view/216


Similar Items