An Evaluative Baseline for Sentence-Level Semantic Division

Semantic folding theory (SFT) is an emerging cognitive science theory that aims to explain how the human brain processes and organizes semantic information. The distribution of text into semantic grids is key to SFT. We propose a sentence-level semantic division baseline with 100 grids (SSDB-100), t...

Bibliographic Details
Main Authors: Kuangsheng Cai, Zugang Chen, Hengliang Guo, Shaohua Wang, Guoqing Li, Jing Li, Feng Chen, Hang Feng
Format: Article
Language: English
Published: MDPI AG 2024-01-01
Series: Machine Learning and Knowledge Extraction
Subjects: semantic folding theory; semantic division datasets; SSDB-100
Online Access: https://www.mdpi.com/2504-4990/6/1/3
collection DOAJ
description Semantic folding theory (SFT) is an emerging cognitive science theory that aims to explain how the human brain processes and organizes semantic information. The distribution of text into semantic grids is key to SFT. We propose a sentence-level semantic division baseline with 100 grids (SSDB-100), the only dataset we are currently aware of that performs a relevant validation of the sentence-level SFT algorithm, to evaluate the validity of text distribution in semantic grids and divide it using classical division algorithms on SSDB-100. In this article, we describe the construction of SSDB-100. First, a semantic division questionnaire with broad coverage was generated by limiting the uncertainty range of the topics and corpus. Subsequently, through an expert survey, 11 human experts provided feedback. Finally, we analyzed and processed the feedback; the average consistency index for the used feedback was 0.856 after eliminating the invalid feedback. SSDB-100 has 100 semantic grids with clear distinctions between the grids, allowing the dataset to be extended using semantic methods.
first_indexed 2024-04-24T18:03:44Z
format Article
id doaj.art-3a51fe7f1055404da9549b835016096d
institution Directory of Open Access Journals
issn 2504-4990
language English
last_indexed 2024-04-24T18:03:44Z
publishDate 2024-01-01
publisher MDPI AG
record_format Article
series Machine Learning and Knowledge Extraction
spelling doaj.art-3a51fe7f1055404da9549b835016096d
2024-03-27T13:52:02Z
eng
MDPI AG
Machine Learning and Knowledge Extraction
2504-4990
2024-01-01
Volume 6, Issue 1, Pages 41-52
10.3390/make6010003
An Evaluative Baseline for Sentence-Level Semantic Division
Kuangsheng Cai (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)
Zugang Chen (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)
Hengliang Guo (School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)
Shaohua Wang (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)
Guoqing Li (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)
Jing Li (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)
Feng Chen (School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)
Hang Feng (School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)
https://www.mdpi.com/2504-4990/6/1/3
semantic folding theory
semantic division datasets
SSDB-100
title An Evaluative Baseline for Sentence-Level Semantic Division
topic semantic folding theory
semantic division datasets
SSDB-100
url https://www.mdpi.com/2504-4990/6/1/3