Large language models recover scientific collaboration networks from text
Abstract Science is a collaborative endeavor. Yet, unlike co-authorship, interactions within and across teams are seldom reported in a structured way, making them hard to study at scale. We show that Large Language Models (LLMs) can solve this problem, vastly improving the efficiency and quality of network data collection. Our approach iteratively applies filtering with few-shot learning, allowing us to identify and categorize different types of relationships from text. We compare this approach to manual annotation and fuzzy matching using a corpus of digital laboratory notebooks, examining inference quality at the level of edges (recovering a single link), labels (recovering the relationship context) and at the whole-network level (recovering local and global network properties). Large Language Models perform impressively well at each of these tasks, with edge recall rate ranging from 0.8 for the highly contextual case of recovering the task allocation structure of teams from their unstructured attribution page to 0.9 for the more explicit case of retrieving the collaboration with other teams from direct mentions, showing a 32% improvement over a fuzzy matching approach. Beyond science, the flexibility of LLMs means that our approach can be extended broadly through minor prompt revision.
Main authors: | Rathin Jeyaram, Robert N Ward, Marc Santolini |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2024-10-01 |
Series: | Applied Network Science |
Subjects: | Social networks; Network reconstruction; Large language models; Collaboration networks; Task allocation structures |
Online access: | https://doi.org/10.1007/s41109-024-00658-8 |
author | Rathin Jeyaram; Robert N Ward; Marc Santolini
collection | DOAJ |
description | Abstract Science is a collaborative endeavor. Yet, unlike co-authorship, interactions within and across teams are seldom reported in a structured way, making them hard to study at scale. We show that Large Language Models (LLMs) can solve this problem, vastly improving the efficiency and quality of network data collection. Our approach iteratively applies filtering with few-shot learning, allowing us to identify and categorize different types of relationships from text. We compare this approach to manual annotation and fuzzy matching using a corpus of digital laboratory notebooks, examining inference quality at the level of edges (recovering a single link), labels (recovering the relationship context) and at the whole-network level (recovering local and global network properties). Large Language Models perform impressively well at each of these tasks, with edge recall rate ranging from 0.8 for the highly contextual case of recovering the task allocation structure of teams from their unstructured attribution page to 0.9 for the more explicit case of retrieving the collaboration with other teams from direct mentions, showing a 32% improvement over a fuzzy matching approach. Beyond science, the flexibility of LLMs means that our approach can be extended broadly through minor prompt revision. |
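The description above outlines the core technique: a cheap filtering pass followed by few-shot extraction of typed relationships from free text. Below is a minimal, hypothetical sketch of what such a filter-then-extract step could look like. The `call_llm` helper, the example texts, and the edge schema are illustrative assumptions, not the authors' actual prompts or code.

```python
# Hypothetical sketch of the filter-then-extract pattern described in the
# abstract. `call_llm` is an assumed helper (prompt in, completion out)
# standing in for whatever LLM API is used; prompts and the edge schema
# are illustrative, not the authors' actual ones.
import json
from typing import Callable

FILTER_PROMPT = (
    'Does the following text mention any collaboration between teams or any '
    'task assignment to a person? Answer only "yes" or "no".\n\nText: "{text}"'
)

FEW_SHOT_EXAMPLES = """\
Text: "We thank Team Aalto for sharing their plasmid protocol with us."
Edges: [{"source": "OUR_TEAM", "target": "Team Aalto", "type": "collaboration"}]

Text: "Alice designed the primers and Bob ran the gels."
Edges: [{"source": "Alice", "target": "primer design", "type": "task"},
        {"source": "Bob", "target": "gel electrophoresis", "type": "task"}]
"""

EXTRACT_PROMPT = (
    "Extract relationships from the text as a JSON list of edges, each with "
    "'source', 'target' and 'type' ('collaboration' or 'task'). "
    "Return an empty list if there are none.\n\n"
    "Examples:\n{examples}\n"
    'Text: "{text}"\nEdges:'
)

def extract_edges(text: str, call_llm: Callable[[str], str]) -> list[dict]:
    """Filtering pass first, then few-shot extraction of typed edges."""
    # Cheap filter: skip documents with no relational content at all.
    if call_llm(FILTER_PROMPT.format(text=text)).strip().lower().startswith("no"):
        return []
    reply = call_llm(EXTRACT_PROMPT.format(examples=FEW_SHOT_EXAMPLES, text=text))
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return []  # treat unparseable output as "no edges found"
```

Running `extract_edges` over each notebook page and pooling the returned edges yields the reconstructed network; edge types can then be analyzed separately (collaboration ties vs. task allocation).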
first_indexed | 2025-03-19T23:54:49Z |
format | Article |
id | doaj.art-f059d0c67e204a60b4be314a0a6da5cc |
institution | Directory Open Access Journal |
issn | 2364-8228 |
language | English |
last_indexed | 2025-03-19T23:54:49Z |
publishDate | 2024-10-01 |
publisher | SpringerOpen |
record_format | Article |
series | Applied Network Science |
citation | Applied Network Science, vol. 9, iss. 1, pp. 1-13; SpringerOpen, 2024-10-01; ISSN 2364-8228; https://doi.org/10.1007/s41109-024-00658-8
affiliations | Rathin Jeyaram (Université Paris Cité, Inserm, System Engineering and Evolution Dynamics); Robert N Ward (Learning Planet Institute, Research Unit Learning Transitions (UR LT, joint unit with CY Cergy Paris University)); Marc Santolini (Université Paris Cité, Inserm, System Engineering and Evolution Dynamics)
title | Large language models recover scientific collaboration networks from text |
topic | Social networks; Network reconstruction; Large language models; Collaboration networks; Task allocation structures
url | https://doi.org/10.1007/s41109-024-00658-8 |
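The abstract also reports edge-level recall of 0.8-0.9 against manual annotation and a 32% improvement over a fuzzy-matching baseline. The sketch below shows how such an edge-level comparison might be computed; the sliding-window matcher and the 0.85 threshold are assumptions for illustration, not the paper's baseline implementation.

```python
# Minimal sketch of the edge-level evaluation described in the abstract:
# recall of manually annotated edges, plus a fuzzy-matching baseline.
# Function names and the similarity threshold are illustrative assumptions.
from difflib import SequenceMatcher

def edge_recall(true_edges: set[tuple[str, str]],
                inferred_edges: set[tuple[str, str]]) -> float:
    """Fraction of ground-truth edges recovered by an inference method."""
    if not true_edges:
        return 1.0  # recall is undefined on an empty gold set; 1.0 by convention
    return len(true_edges & inferred_edges) / len(true_edges)

def fuzzy_match_edges(text: str, team_names: list[str],
                      source: str = "OUR_TEAM",
                      threshold: float = 0.85) -> set[tuple[str, str]]:
    """Baseline: link to any known team whose name approximately appears in
    the text, using a sliding window of the same token width as the name."""
    edges = set()
    tokens = text.split()
    for name in team_names:
        width = len(name.split())
        for i in range(len(tokens) - width + 1):
            window = " ".join(tokens[i:i + width])
            if SequenceMatcher(None, window.lower(), name.lower()).ratio() >= threshold:
                edges.add((source, name))
                break  # one edge per mentioned team is enough
    return edges
```

Given a manually annotated gold set, `edge_recall` can then be evaluated on the edge sets produced by the LLM pipeline and by `fuzzy_match_edges` over the same notebooks, making the two approaches directly comparable.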