Who shares? Who doesn't? Factors associated with openly archiving raw research data.

Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing.

Bibliographic Details
Main Author: Heather A Piwowar
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2011-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3135593?pdf=render
author Heather A Piwowar
collection DOAJ
description Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.
first_indexed 2024-12-20T21:32:15Z
format Article
id doaj.art-5882ace59186417f8d2f9e09ec18b292
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-20T21:32:15Z
publishDate 2011-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-5882ace59186417f8d2f9e09ec18b292; 2022-12-21T19:26:01Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2011-01-01; 6; 7; e18657; 10.1371/journal.pone.0018657; Who shares? Who doesn't? Factors associated with openly archiving raw research data.; Heather A Piwowar; http://europepmc.org/articles/PMC3135593?pdf=render
title Who shares? Who doesn't? Factors associated with openly archiving raw research data.
url http://europepmc.org/articles/PMC3135593?pdf=render