Identify novel elements of knowledge with word embedding

Bibliographic Details
Main Authors: Deyun Yin, Zhao Wu, Kazuki Yokota, Kuniko Matsumoto, Sotaro Shibayama
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2023-01-01
Series: PLoS ONE
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10281565/?tool=EBI
collection DOAJ
description As novelty is a core value in science, a reliable approach to measuring the novelty of scientific documents is critical. Previous novelty measures, however, have several limitations. First, most previous measures are based on the concept of recombinant novelty, which attempts to identify a novel combination of knowledge elements, while insufficient effort has been made to identify a novel element itself (element novelty). Second, most previous measures have not been validated, so it is unclear what aspect of newness they measure. Third, some previous measures can be computed only in certain scientific fields because of technical constraints. This study thus aims to provide a validated and field-universal approach to computing element novelty. We drew on machine learning to develop a word embedding model, which allows us to extract semantic information from text data. Our validation analyses suggest that the word embedding model does convey semantic information. Based on the trained word embedding, we quantified the element novelty of a document by measuring its distance from the rest of the document universe. We then carried out a questionnaire survey to obtain self-reported novelty scores from 800 scientists. We found that our element novelty measure is significantly correlated with self-reported novelty in terms of discovering and identifying new phenomena, substances, molecules, etc., and that this correlation is observed across different scientific fields.
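The description says that element novelty is quantified as a document's distance from the rest of the document universe, but this record does not state the distance metric or how documents are represented. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes each document is represented by the mean of its word vectors and that novelty is the mean cosine distance to all other documents. The toy embeddings and the names `document_vector` and `element_novelty` are hypothetical.

```python
import numpy as np


def document_vector(tokens, embeddings):
    """Average the embedding vectors of the tokens found in the vocabulary.

    Assumption: a document is the mean of its word vectors; the record
    does not say how the study aggregates words into documents.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token of the document is in the vocabulary")
    return np.mean(vectors, axis=0)


def element_novelty(doc_vectors):
    """Mean cosine distance from each document to every other document.

    Assumption: cosine distance averaged over all other documents stands
    in for "distance from the rest of the document universe".
    Document vectors must be non-zero so they can be normalized.
    """
    X = np.asarray(doc_vectors, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two documents to compare")
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    similarity = X @ X.T                               # pairwise cosine similarity
    # Drop each document's similarity with itself before averaging.
    mean_sim_to_others = (similarity.sum(axis=1) - np.diag(similarity)) / (n - 1)
    return 1.0 - mean_sim_to_others


# Hypothetical 3-dimensional embeddings, hand-made for illustration only.
embeddings = {
    "protein": np.array([1.0, 0.0, 0.0]),
    "enzyme": np.array([0.9, 0.1, 0.0]),
    "kinase": np.array([0.8, 0.2, 0.0]),
    "graphene": np.array([0.0, 0.0, 1.0]),
    "lattice": np.array([0.0, 0.1, 0.9]),
}
docs = [
    ["protein", "enzyme"],
    ["enzyme", "kinase"],
    ["protein", "kinase"],
    ["graphene", "lattice"],
]
scores = element_novelty([document_vector(d, embeddings) for d in docs])
for tokens, score in zip(docs, scores):
    print(tokens, round(float(score), 3))
```

In this toy corpus the last document uses words that point in a different direction from the other three, so it receives the highest score. With a real trained embedding, the same two functions could be applied to tokenized abstracts, though the study's actual choices may differ.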
first_indexed 2024-03-13T03:55:06Z
format Article
id doaj.art-a9b4bad536da4594be7ed6b85e040776
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-03-13T03:55:06Z
publishDate 2023-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
title Identify novel elements of knowledge with word embedding
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10281565/?tool=EBI