An Evaluative Baseline for Sentence-Level Semantic Division
Semantic folding theory (SFT) is an emerging cognitive science theory that aims to explain how the human brain processes and organizes semantic information. The distribution of text into semantic grids is key to SFT. We propose a sentence-level semantic division baseline with 100 grids (SSDB-100), t...
Main Authors: | Kuangsheng Cai, Zugang Chen, Hengliang Guo, Shaohua Wang, Guoqing Li, Jing Li, Feng Chen, Hang Feng |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-01-01 |
Series: | Machine Learning and Knowledge Extraction |
Subjects: | semantic folding theory; semantic division datasets; SSDB-100 |
Online Access: | https://www.mdpi.com/2504-4990/6/1/3 |
_version_ | 1797240205830258688 |
---|---|
author | Kuangsheng Cai; Zugang Chen; Hengliang Guo; Shaohua Wang; Guoqing Li; Jing Li; Feng Chen; Hang Feng |
collection | DOAJ |
description | Semantic folding theory (SFT) is an emerging cognitive science theory that aims to explain how the human brain processes and organizes semantic information. Distributing text into semantic grids is central to SFT. We propose a sentence-level semantic division baseline with 100 grids (SSDB-100) for evaluating the validity of text distribution in semantic grids; to our knowledge, it is the only dataset that supports validation of sentence-level SFT algorithms. We also apply classical division algorithms to SSDB-100. In this article, we describe the construction of SSDB-100. First, a semantic division questionnaire with broad coverage was generated by limiting the uncertainty range of the topics and corpus. Subsequently, 11 human experts provided feedback through an expert survey. Finally, we analyzed and processed the feedback; after invalid feedback was eliminated, the average consistency index of the retained feedback was 0.856. SSDB-100 has 100 semantic grids with clear distinctions between them, allowing the dataset to be extended using semantic methods. |
first_indexed | 2024-04-24T18:03:44Z |
format | Article |
id | doaj.art-3a51fe7f1055404da9549b835016096d |
institution | Directory Open Access Journal |
issn | 2504-4990 |
language | English |
last_indexed | 2024-04-24T18:03:44Z |
publishDate | 2024-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Machine Learning and Knowledge Extraction |
spelling | doaj.art-3a51fe7f1055404da9549b835016096d; 2024-03-27T13:52:02Z; eng; MDPI AG; Machine Learning and Knowledge Extraction; ISSN 2504-4990; 2024-01-01; vol. 6, no. 1, pp. 41-52; DOI 10.3390/make6010003; An Evaluative Baseline for Sentence-Level Semantic Division; Kuangsheng Cai (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China); Zugang Chen (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China); Hengliang Guo (School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China); Shaohua Wang (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China); Guoqing Li (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China); Jing Li (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China); Feng Chen (School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China); Hang Feng (School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China); https://www.mdpi.com/2504-4990/6/1/3; semantic folding theory; semantic division datasets; SSDB-100 |
title | An Evaluative Baseline for Sentence-Level Semantic Division |
topic | semantic folding theory; semantic division datasets; SSDB-100 |
url | https://www.mdpi.com/2504-4990/6/1/3 |