An Approach of Automatic Extraction of Domain Keywords from the Kazakh Text

Cited by: 1

Authors:

Alimzhanov, Yermek [1]

Mansurova, Madina [1]

Affiliation:

[1] Al Farabi Kazakh Natl Univ, Al Farabi Ave 71, Alma Ata 050040, Kazakhstan

Source:

COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2016, PT II | 2016 / Vol. 9876

Keywords:

Natural language processing; Latent semantic analysis; Domain knowledge

DOI:

10.1007/978-3-319-45246-3_53

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper we consider an approach to the automatic extraction of domain keywords from Kazakh text, based on statistical methods of natural language processing. The proposed approach can be used to build domain dictionaries and thesauri without manual work by domain experts. Results of experiments on a corpus of texts from a Kazakh book and online websites demonstrate that applying latent semantic analysis to keyword extraction significantly decreases information noise and strengthens the relations between words.
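As a rough illustration of how latent semantic analysis can strengthen relations between words, the following minimal Python sketch builds a term-document matrix and compares term similarity before and after a truncated SVD. It is not the authors' implementation: the toy English corpus, the function names, the raw-count weighting, and the choice of k = 2 are all assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np

# Toy corpus (hypothetical, NOT the Kazakh corpus used in the paper).
# Each document is a list of already-tokenized words.
docs = [
    ["atom", "electron"],
    ["electron", "nucleus"],
    ["river", "water"],
    ["water", "lake"],
]


def term_document_matrix(docs):
    """Return (sorted vocabulary, term-by-document raw count matrix)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            matrix[index[tok], j] += 1.0
    return vocab, matrix


def lsa_term_vectors(matrix, k):
    """Project terms into a k-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Rows of U_k * S_k are the term coordinates in the latent space.
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


vocab, tdm = term_document_matrix(docs)
latent = lsa_term_vectors(tdm, k=2)  # k=2 is an assumption for this toy data
row = {term: i for i, term in enumerate(vocab)}


def similarity(vectors, t1, t2):
    """Cosine similarity between the rows for terms t1 and t2."""
    return cosine(vectors[row[t1]], vectors[row[t2]])


# "atom" and "nucleus" never share a document, but both occur with "electron".
raw_sim = similarity(tdm, "atom", "nucleus")
lsa_sim = similarity(latent, "atom", "nucleus")
cross_sim = similarity(latent, "atom", "river")

print(f"raw   atom~nucleus: {raw_sim:.3f}")
print(f"LSA   atom~nucleus: {lsa_sim:.3f}")
print(f"LSA   atom~river:   {cross_sim:.3f}")
```

In this toy data, "atom" and "nucleus" have zero similarity in the raw count matrix because they never co-occur, yet they become nearly identical in the two-dimensional latent space, while terms from the unrelated topic stay orthogonal. A real pipeline for Kazakh would also need tokenization, morphological normalization, and a weighting scheme, none of which are specified in the abstract.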

Citation

Pages: 555-562

Page count: 8
