An Evaluative Baseline for Sentence-Level Semantic Division

Cited by: 0
Authors
Cai, Kuangsheng [1, 2]
Chen, Zugang [1, 2]
Guo, Hengliang [2]
Wang, Shaohua [1]
Li, Guoqing [1]
Li, Jing [1]
Chen, Feng [2]
Feng, Hang [2]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[2] Zhengzhou Univ, Sch Comp & Artificial Intelligence, Zhengzhou 450001, Peoples R China
Source
MACHINE LEARNING AND KNOWLEDGE EXTRACTION | 2024 / Vol. 6 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
semantic folding theory; semantic division datasets; SSDB-100; LANGUAGE;
DOI
10.3390/make6010003
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Semantic folding theory (SFT) is an emerging cognitive science theory that aims to explain how the human brain processes and organizes semantic information. Distributing text into semantic grids is central to SFT. We propose a sentence-level semantic division baseline with 100 grids (SSDB-100), to our knowledge the only dataset that validates sentence-level SFT algorithms. SSDB-100 is intended for evaluating the validity of text distribution in semantic grids, and we apply classical division algorithms to it. In this article, we describe the construction of SSDB-100. First, a semantic division questionnaire with broad coverage was generated by limiting the uncertainty range of the topics and corpus. Subsequently, 11 human experts provided feedback through an expert survey. Finally, we analyzed and processed the feedback; after invalid feedback was eliminated, the average consistency index of the retained feedback was 0.856. SSDB-100 has 100 semantic grids with clear distinctions between the grids, allowing the dataset to be extended using semantic methods.
Pages: 41-52
Page count: 12