Large language models recover scientific collaboration networks from text

Abstract: Science is a collaborative endeavor. Yet, unlike co-authorship, interactions within and across teams are seldom reported in a structured way, making them hard to study at scale. We show that Large Language Models (LLMs) can solve this problem, vastly improving the efficiency and quality of network data collection. Our approach iteratively applies filtering with few-shot learning, allowing us to identify and categorize different types of relationships from text. We compare this approach to manual annotation and fuzzy matching using a corpus of digital laboratory notebooks, examining inference quality at the level of edges (recovering a single link), labels (recovering the relationship context), and the whole network (recovering local and global network properties). LLMs perform impressively well at each of these tasks, with edge recall ranging from 0.8 for the highly contextual case of recovering a team's task-allocation structure from its unstructured attribution page to 0.9 for the more explicit case of retrieving collaborations with other teams from direct mentions, a 32% improvement over a fuzzy-matching approach. Beyond science, the flexibility of LLMs means that our approach can be extended broadly through minor prompt revisions.
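The abstract describes the method only at a high level: few-shot prompting with iterative filtering to pull typed relationship edges out of free text. As a rough illustration of what such an extraction step can look like (a minimal sketch, not the authors' published pipeline), the Python code below uses the OpenAI chat API; the prompt wording, the FEW_SHOT examples, the gpt-4o-mini model choice, and the extract_edges helper are all illustrative assumptions.

# Minimal sketch of the few-shot edge-extraction idea described in the
# abstract. Prompts, model, and filtering details are assumptions, not
# the authors' published pipeline.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Few-shot examples pairing notebook-style text with the relationship
# edges it implies; real examples would come from annotated notebooks.
FEW_SHOT = [
    {
        "text": "Alice ran the PCR while Bob analyzed the sequencing output.",
        "edges": [
            {"source": "Alice", "target": "PCR", "type": "task"},
            {"source": "Bob", "target": "sequencing analysis", "type": "task"},
        ],
    },
    {
        "text": "We thank Team Zebra for sharing their plasmid library.",
        "edges": [{"source": "us", "target": "Team Zebra", "type": "collaboration"}],
    },
]

def extract_edges(passage: str, model: str = "gpt-4o-mini") -> list[dict]:
    """Ask the LLM to list relationship edges found in a passage."""
    messages = [{"role": "system",
                 "content": "Extract collaboration or task-allocation edges "
                            "from lab-notebook text. Reply with a JSON list "
                            "of {source, target, type} objects only."}]
    for ex in FEW_SHOT:  # few-shot demonstrations as prior turns
        messages.append({"role": "user", "content": ex["text"]})
        messages.append({"role": "assistant", "content": json.dumps(ex["edges"])})
    messages.append({"role": "user", "content": passage})
    reply = client.chat.completions.create(model=model, messages=messages)
    # Assumes the model honored the JSON-only instruction; a real pipeline
    # would validate and re-prompt on malformed output.
    return json.loads(reply.choices[0].message.content)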

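The abstract also distinguishes three evaluation levels: single edges, relationship labels, and whole-network properties. The toy comparison below (illustrative data and metric choices, not the paper's) shows how edge recall and global properties such as density can be computed with networkx.

# Sketch of the evaluation levels named in the abstract: edge recovery
# and whole-network properties. Sample data are illustrative.
import networkx as nx

# Toy ground truth (e.g., manual annotation) vs. edges recovered by the LLM.
true_edges = {("Alice", "PCR"), ("Bob", "sequencing analysis"), ("us", "Team Zebra")}
llm_edges = {("Alice", "PCR"), ("us", "Team Zebra")}

# Edge level: recall is the fraction of ground-truth links the LLM recovered.
recall = len(true_edges & llm_edges) / len(true_edges)
print(f"edge recall: {recall:.2f}")  # 0.67 on this toy data

# Whole-network level: compare local and global properties of the two graphs.
G_true, G_llm = nx.Graph(), nx.Graph()
G_true.add_edges_from(true_edges)
G_llm.add_edges_from(llm_edges)
print("density (true vs. LLM):", nx.density(G_true), nx.density(G_llm))
print("mean degree (true vs. LLM):",
      2 * G_true.number_of_edges() / G_true.number_of_nodes(),
      2 * G_llm.number_of_edges() / G_llm.number_of_nodes())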

Bibliographic Details
Main Authors: Rathin Jeyaram, Robert N Ward, Marc Santolini
Affiliations: Université Paris Cité, Inserm, System Engineering and Evolution Dynamics (Jeyaram, Santolini); Learning Planet Institute, Research Unit Learning Transitions (UR LT, joint unit with CY Cergy Paris University) (Ward)
Format: Article
Language: English
Published: SpringerOpen, 2024-10-01
Series: Applied Network Science
ISSN: 2364-8228
Subjects: Social networks; Network reconstruction; Large language models; Collaboration networks; Task allocation structures
Online Access: https://doi.org/10.1007/s41109-024-00658-8