The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central

Bibliographic Details
Main Authors: David Schindler, Felix Bensmann, Stefan Dietze, Frank Krüger
Format: Article
Language: English
Published: PeerJ Inc. 2022-01-01
Series: PeerJ Computer Science
Subjects: Knowledge graph; Software mention; Named entity recognition; Software citation
Online Access: https://peerj.com/articles/cs-835.pdf
ISSN: 2376-5992
DOI: 10.7717/peerj-cs.835
Volume: 8 (article e835)
Affiliations: David Schindler and Frank Krüger, Institute of Communications Engineering, University of Rostock, Rostock, Germany; Felix Bensmann and Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
Collection: Directory of Open Access Journals (DOAJ)

Description

Science across all disciplines has become increasingly data-driven, leading to additional needs with respect to software for collecting, processing and analysing data. Thus, transparency about software used as part of the scientific process is crucial to understand provenance of individual research data and insights, is a prerequisite for reproducibility and can enable macro-analysis of the evolution of scientific methods over time. However, missing rigor in software citation practices renders the automated detection and disambiguation of software mentions a challenging problem. In this work, we provide a large-scale analysis of software usage and citation practices facilitated through an unprecedented knowledge graph of software mentions and affiliated metadata generated through supervised information extraction models trained on a unique gold standard corpus and applied to more than 3 million scientific articles. Our information extraction approach distinguishes different types of software and mentions, disambiguates mentions and outperforms the state-of-the-art significantly, leading to the most comprehensive corpus of 11.8 M software mentions that are described through a knowledge graph consisting of more than 300 M triples. Our analysis provides insights into the evolution of software usage and citation patterns across various fields, ranks of journals, and impact of publications. Whereas, to the best of our knowledge, this is the most comprehensive analysis of software use and citation at the time, all data and models are shared publicly to facilitate further research into scientific use and citation of software.