Revisiting the Data Lifecycle with Big Data Curation

Bibliographic Details
Main Author: Line Pouchard
Format: Article
Language: English
Published: University of Edinburgh, 2016-05-01
Series: International Journal of Digital Curation
Online Access: https://ijdc.net/index.php/ijdc/article/view/342
Collection: Directory of Open Access Journals (DOAJ)
ISSN: 1746-8256

Abstract

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in unprecedented ways. In parallel, research data repositories have been built to host research data in response to sponsors' requirements that research data be publicly available. Libraries are reinventing themselves to respond to a growing demand to manage, store, curate, and preserve the data produced in the course of publicly funded research. As librarians and data managers develop the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity. We highlight the issues associated with each characteristic, particularly their impact on data management and curation. Using the methodological framework of the data life cycle model, we assess two models developed in the context of Big Data projects and find them lacking. We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities are planning, acquiring, preparing, analyzing, preserving, and discovering; describing the data and assuring its quality are integral to each activity. We discuss the relationship between institutional data curation repositories and the new long-term data resources associated with high-performance computing centers, as well as reproducibility in computational science. We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should ask in a Big Data project.