Generic SAO Similarity Measure via Extended Sørensen-Dice Index

As an essential component of many Natural Language Processing applications, semantic similarity measurement has been studied for decades. Recent research indicates that the Subject-Action-Object (SAO) structure of sentences is better suited to describing technological information, and SAO-b...


Bibliographic Details
Main Authors: Xiaoman Li, Cui Wang, Xuefu Zhang, Wei Sun
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
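Subjects: Similarity measurement; Sørensen-Dice index; semantic information; Subject-Action-Object; computational linguistics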
Online Access: https://ieeexplore.ieee.org/document/9050516/
_version_ 1819173433445974016
author Xiaoman Li
Cui Wang
Xuefu Zhang
Wei Sun
author_sort Xiaoman Li
collection DOAJ
description As an essential component of many Natural Language Processing applications, semantic similarity measurement has been studied for decades. Recent research indicates that the Subject-Action-Object (SAO) structure of sentences is better suited to describing technological information, and that SAO-based similarity measures outperform classical text-based ones. The typical approach in the literature to computing the similarity between two SAO structures relies on term matching, which produces the similarity score with the Sørensen-Dice index, i.e., the proportion of matching terms among the total number of terms. In this paper, however, we observe that the entities in SAO structures usually contain only a small number of terms, which gives the currently accepted methods a high recurrence rate and poor accuracy. To address this issue, we extend the Sørensen-Dice index and present a new unified framework for SAO similarity measurement that offers higher discrimination. The effectiveness of our measure is evaluated on patent data sets in the Nano-Fertilizer field. The results show that our measure is significantly more accurate than the currently accepted ones. The proposed measure is flexible and robust, and can easily be applied to patent similarity measurement. In addition, the extended Sørensen-Dice index is of independent interest and has potential applications in other similarity measures.
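The classical Sørensen-Dice index mentioned in the description can be sketched as follows. This is only an illustration of the standard index, 2|A∩B| / (|A| + |B|), applied to term sets; it is not the authors' extended index, which this record does not specify. The function names `dice` and `distinct_scores`, the example terms, and the convention for two empty sets are assumptions made for this sketch. The second function counts how many different scores are possible when both entities have at most a few terms, which is one plausible reading of why short SAO entities lead to repeated scores.

```python
from fractions import Fraction


def dice(a, b):
    """Classical Sørensen-Dice index between two collections of terms.

    Terms are treated as sets, so duplicates within one entity are ignored.
    """
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption for this sketch: two empty term sets count as identical.
        return Fraction(1)
    return Fraction(2 * len(a & b), len(a) + len(b))


def distinct_scores(max_terms):
    """All Dice values reachable when each entity has 1..max_terms terms."""
    scores = set()
    for m in range(1, max_terms + 1):          # size of the first term set
        for n in range(1, max_terms + 1):      # size of the second term set
            for k in range(0, min(m, n) + 1):  # number of shared terms
                scores.add(Fraction(2 * k, m + n))
    return scores


# Two different pairs of short entities (hypothetical terms) get the same score.
s1 = dice(["nano", "fertilizer"], ["nano", "particle"])
s2 = dice(["slow", "release", "fertilizer"], ["fertilizer"])
print(s1, s2)

# With at most three terms per entity, only a handful of scores can occur.
print(sorted(distinct_scores(3)))
```

Because the score depends only on the set sizes and the overlap count, many unrelated entity pairs collapse onto the same value when the sets are small.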
first_indexed 2024-12-22T20:23:00Z
format Article
id doaj.art-2b223cb65aaf4149b94ad27eda05aad6
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T20:23:00Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-2b223cb65aaf4149b94ad27eda05aad6; 2022-12-21T18:13:47Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 66538; 66552; 10.1109/ACCESS.2020.2984024; 9050516; Generic SAO Similarity Measure via Extended Sørensen-Dice Index; Xiaoman Li (https://orcid.org/0000-0001-5718-0047), Cui Wang, Xuefu Zhang, Wei Sun (all four authors: Chinese Academy of Agricultural Sciences, Agricultural Information Institute, Beijing, China); https://ieeexplore.ieee.org/document/9050516/; Similarity measurement; Sørensen-Dice index; semantic information; Subject-Action-Object; computational linguistics
title Generic SAO Similarity Measure via Extended Sørensen-Dice Index
title_sort generic sao similarity measure via extended sørensen dice index
topic Similarity measurement
Sørensen-Dice index
semantic information
Subject-Action-Object
computational linguistics
url https://ieeexplore.ieee.org/document/9050516/