An Approach of Automatic Extraction of Domain Keywords from the Kazakh Text

Cited by: 1
Authors
Alimzhanov, Yermek [1]
Mansurova, Madina [1]
Affiliations
[1] Al Farabi Kazakh Natl Univ, Al Farabi Ave 71, Alma Ata 050040, Kazakhstan
Source
COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2016, PT II | 2016 / Vol. 9876
Keywords
Natural language processing; Latent semantic analysis; Domain knowledge
DOI
10.1007/978-3-319-45246-3_53
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we consider an approach to the automatic extraction of domain keywords from Kazakh text based on statistical methods of natural language processing. The proposed approach can be used to build domain dictionaries and thesauri without manual work by domain experts. Results of experiments on a corpus of texts from a Kazakh book and online websites demonstrate that applying latent semantic analysis to keyword extraction significantly decreases information noise and strengthens the relations between words.
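The abstract names latent semantic analysis but gives no procedure, so the following is only a minimal sketch of the general technique, not the authors' pipeline. It assumes raw term counts (the paper's actual weighting is not stated here), a hypothetical toy English vocabulary in place of a Kazakh corpus, and illustrative helper names `lsa_smooth` and `top_terms`. It shows how a truncated SVD of a term-document matrix discards the weaker dimensions and assigns weight to related terms that a document never contained.

```python
import numpy as np


def lsa_smooth(term_doc, rank):
    """Return the rank-`rank` approximation of a term-document matrix via SVD."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def top_terms(matrix, vocab, doc_index, n):
    """Return the n highest-weighted terms for one document (one column)."""
    order = np.argsort(-matrix[:, doc_index])[:n]
    return [vocab[i] for i in order]


# Hypothetical toy data: rows are terms, columns are documents.
# Documents 0-1 are about one topic, documents 2-3 about another.
vocab = ["algorithm", "matrix", "vector", "river", "mountain", "lake"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 1, 0, 0],
    [3, 0, 0, 0][::-1][:0] or [0, 0, 3, 1],
    [0, 0, 1, 2],
    [0, 0, 0, 1],
], dtype=float)

# Two latent dimensions, one per topic in this toy corpus (an assumption).
smoothed = lsa_smooth(counts, rank=2)
keywords = top_terms(smoothed, vocab, doc_index=0, n=3)
print(keywords)
```

In this toy corpus, "vector" never occurs in document 0, yet the smoothed matrix gives it a positive weight there because it co-occurs with "matrix" and "algorithm" elsewhere. That is the kind of strengthened word relation the abstract refers to.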
Pages: 555-562
Page count: 8