Chinese semantic document classification based on strategies of semantic similarity computation and correlation analysis

Cited by: 13
Authors
Yang, Shuo [1]
Wei, Ran [2]
Guo, Jingzhi [3]
Tan, Hengliang [1]
Affiliations
[1] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou, Peoples R China
[2] Univ Calif Irvine, Dept Comp Sci, Irvine, CA USA
[3] Univ Macau, Fac Sci & Technol, Macau, Peoples R China
Source
Journal of Web Semantics | 2020 / Vol. 63
Funding
National Natural Science Foundation of China
Keywords
Semantic document classification; Semantic similarity; Semantic embedding; Correlation analysis; Artificial intelligence; ONTOLOGY-DRIVEN;
DOI
10.1016/j.websem.2020.100578
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document classification has become an indispensable technology for intelligent information services. It is often applied to tasks such as document organization, analysis, and archiving, or implemented as a submodule to support higher-level applications. Semantic analysis has been shown to improve the performance of document classification. Although semantic information has been incorporated into previous automatic document classification methods, its use has attracted greater attention as the number of documents stored online increases, because it can greatly reduce human effort. In this paper, we propose two semantic document classification strategies for two types of semantic problems: (1) a novel semantic similarity computation (SSC) method to solve the polysemy problem and (2) a strong correlation analysis method (SCM) to solve the synonym problem. Experimental results indicate that, compared with traditional machine learning, n-gram, and contextualized word embedding methods, efficient semantic similarity computation and correlation analysis help eliminate word ambiguity and extract useful features, improving the accuracy of semantic document classification for Chinese texts. (C) 2020 Elsevier B.V. All rights reserved.
Pages: 15