Large language models recover scientific collaboration networks from text
Abstract Science is a collaborative endeavor. Yet, unlike co-authorship, interactions within and across teams are seldom reported in a structured way, making them hard to study at scale. We show that Large Language Models (LLMs) can solve this problem, vastly improving the efficiency and quality of network data collection. Our approach iteratively applies filtering with few-shot learning, allowing us to identify and categorize different types of relationships from text. We compare this approach to manual annotation and fuzzy matching using a corpus of digital laboratory notebooks, examining inference quality at the level of edges (recovering a single link), labels (recovering the relationship context) and at the whole-network level (recovering local and global network properties). Large Language Models perform impressively well at each of these tasks, with edge recall rate ranging from 0.8 for the highly contextual case of recovering the task allocation structure of teams from their unstructured attribution page to 0.9 for the more explicit case of retrieving the collaboration with other teams from direct mentions, showing a 32% improvement over a fuzzy matching approach. Beyond science, the flexibility of LLMs means that our approach can be extended broadly through minor prompt revision.
Main authors: | Rathin Jeyaram, Robert N Ward, Marc Santolini |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2024-10-01 |
Series: | Applied Network Science |
Subjects: | Social networks; Network reconstruction; Large language models; Collaboration networks; Task allocation structures |
Online access: | https://doi.org/10.1007/s41109-024-00658-8 |
author | Rathin Jeyaram; Robert N Ward; Marc Santolini
collection | DOAJ |
description | Abstract Science is a collaborative endeavor. Yet, unlike co-authorship, interactions within and across teams are seldom reported in a structured way, making them hard to study at scale. We show that Large Language Models (LLMs) can solve this problem, vastly improving the efficiency and quality of network data collection. Our approach iteratively applies filtering with few-shot learning, allowing us to identify and categorize different types of relationships from text. We compare this approach to manual annotation and fuzzy matching using a corpus of digital laboratory notebooks, examining inference quality at the level of edges (recovering a single link), labels (recovering the relationship context) and at the whole-network level (recovering local and global network properties). Large Language Models perform impressively well at each of these tasks, with edge recall rate ranging from 0.8 for the highly contextual case of recovering the task allocation structure of teams from their unstructured attribution page to 0.9 for the more explicit case of retrieving the collaboration with other teams from direct mentions, showing a 32% improvement over a fuzzy matching approach. Beyond science, the flexibility of LLMs means that our approach can be extended broadly through minor prompt revision. |
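The description above outlines the core technique: a cheap filtering pass followed by few-shot extraction of typed relationships from free text. Below is a minimal, hypothetical sketch of what such a filter-then-extract step could look like. The `call_llm` helper, the example texts, and the edge schema are illustrative assumptions, not the authors' actual prompts or code.

```python
# Hypothetical sketch of the filter-then-extract pattern described in the
# abstract. `call_llm` is an assumed helper (prompt in, completion out)
# standing in for whatever LLM API is used; prompts and the edge schema
# are illustrative, not the authors' actual ones.
import json
from typing import Callable

FILTER_PROMPT = (
    'Does the following text mention any collaboration between teams or any '
    'task assignment to a person? Answer only "yes" or "no".\n\nText: "{text}"'
)

FEW_SHOT_EXAMPLES = """\
Text: "We thank Team Aalto for sharing their plasmid protocol with us."
Edges: [{"source": "OUR_TEAM", "target": "Team Aalto", "type": "collaboration"}]

Text: "Alice designed the primers and Bob ran the gels."
Edges: [{"source": "Alice", "target": "primer design", "type": "task"},
        {"source": "Bob", "target": "gel electrophoresis", "type": "task"}]
"""

EXTRACT_PROMPT = (
    "Extract relationships from the text as a JSON list of edges, each with "
    "'source', 'target' and 'type' ('collaboration' or 'task'). "
    "Return an empty list if there are none.\n\n"
    "Examples:\n{examples}\n"
    'Text: "{text}"\nEdges:'
)

def extract_edges(text: str, call_llm: Callable[[str], str]) -> list[dict]:
    """Filtering pass first, then few-shot extraction of typed edges."""
    # Cheap filter: skip documents with no relational content at all.
    if call_llm(FILTER_PROMPT.format(text=text)).strip().lower().startswith("no"):
        return []
    reply = call_llm(EXTRACT_PROMPT.format(examples=FEW_SHOT_EXAMPLES, text=text))
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return []  # treat unparseable output as "no edges found"
```

Running `extract_edges` over each notebook page and pooling the returned edges yields the reconstructed network; edge types can then be analyzed separately (collaboration ties vs. task allocation).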
first_indexed | 2025-03-19T23:54:49Z |
format | Article |
id | doaj.art-f059d0c67e204a60b4be314a0a6da5cc |
institution | Directory Open Access Journal |
issn | 2364-8228 |
language | English |
last_indexed | 2025-03-19T23:54:49Z |
publishDate | 2024-10-01 |
publisher | SpringerOpen |
record_format | Article |
series | Applied Network Science |
citation | Applied Network Science, vol. 9, iss. 1, pp. 1-13; SpringerOpen, 2024-10-01; ISSN 2364-8228; https://doi.org/10.1007/s41109-024-00658-8
affiliations | Rathin Jeyaram (Université Paris Cité, Inserm, System Engineering and Evolution Dynamics); Robert N Ward (Learning Planet Institute, Research Unit Learning Transitions (UR LT, joint unit with CY Cergy Paris University)); Marc Santolini (Université Paris Cité, Inserm, System Engineering and Evolution Dynamics)
title | Large language models recover scientific collaboration networks from text |
topic | Social networks; Network reconstruction; Large language models; Collaboration networks; Task allocation structures
url | https://doi.org/10.1007/s41109-024-00658-8 |
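The abstract also reports edge-level recall of 0.8-0.9 against manual annotation and a 32% improvement over a fuzzy-matching baseline. The sketch below shows how such an edge-level comparison might be computed; the sliding-window matcher and the 0.85 threshold are assumptions for illustration, not the paper's baseline implementation.

```python
# Minimal sketch of the edge-level evaluation described in the abstract:
# recall of manually annotated edges, plus a fuzzy-matching baseline.
# Function names and the similarity threshold are illustrative assumptions.
from difflib import SequenceMatcher

def edge_recall(true_edges: set[tuple[str, str]],
                inferred_edges: set[tuple[str, str]]) -> float:
    """Fraction of ground-truth edges recovered by an inference method."""
    if not true_edges:
        return 1.0  # recall is undefined on an empty gold set; 1.0 by convention
    return len(true_edges & inferred_edges) / len(true_edges)

def fuzzy_match_edges(text: str, team_names: list[str],
                      source: str = "OUR_TEAM",
                      threshold: float = 0.85) -> set[tuple[str, str]]:
    """Baseline: link to any known team whose name approximately appears in
    the text, using a sliding window of the same token width as the name."""
    edges = set()
    tokens = text.split()
    for name in team_names:
        width = len(name.split())
        for i in range(len(tokens) - width + 1):
            window = " ".join(tokens[i:i + width])
            if SequenceMatcher(None, window.lower(), name.lower()).ratio() >= threshold:
                edges.add((source, name))
                break  # one edge per mentioned team is enough
    return edges
```

Given a manually annotated gold set, `edge_recall` can then be evaluated on the edge sets produced by the LLM pipeline and by `fuzzy_match_edges` over the same notebooks, making the two approaches directly comparable.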