Chinese semantic document classification based on strategies of semantic similarity computation and correlation analysis

Cited by: 12
Authors
Yang, Shuo [1]
Wei, Ran [2]
Guo, Jingzhi [3]
Tan, Hengliang [1]
Affiliations
[1] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou, Peoples R China
[2] Univ Calif Irvine, Dept Comp Sci, Irvine, CA USA
[3] Univ Macau, Fac Sci & Technol, Macau, Peoples R China
Source
JOURNAL OF WEB SEMANTICS | 2020 / Vol. 63
Funding
National Natural Science Foundation of China;
Keywords
Semantic document classification; Semantic similarity; Semantic embedding; Correlation analysis; Artificial intelligence; ONTOLOGY-DRIVEN;
DOI
10.1016/j.websem.2020.100578
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document classification has become an indispensable technology for realizing intelligent information services. The technique is often applied to tasks such as document organization, analysis, and archiving, or implemented as a submodule to support higher-level applications. It has been shown that semantic analysis can improve the performance of document classification. Although semantic analysis has been incorporated in previous automatic document classification methods, the growing number of documents stored online has drawn greater attention to the use of semantic information for document classification, as it can greatly reduce human effort. In this paper, we propose two semantic document classification strategies for two types of semantic problems: (1) a novel semantic similarity computation (SSC) method to solve the polysemy problem and (2) a strong correlation analysis method (SCM) to solve the synonym problem. Experimental results indicate that, compared with traditional machine learning, n-gram, and contextualized word embedding methods, efficient semantic similarity computation and correlation analysis make it possible to eliminate word ambiguity and extract useful features, improving the accuracy of semantic document classification for Chinese texts. (C) 2020 Elsevier B.V. All rights reserved.
Pages: 15