Collection-based compound noun segmentation for Korean information retrieval

Cited by: 0
Authors
Kang, In-Su [1]
Na, Seung-Hoon [1]
Lee, Jong-Hyeok [1]
Affiliations
[1] Pohang Univ Sci & Technol, Div Elect & Comp Engn, AITrc, Pohang 790784, South Korea
Source
INFORMATION RETRIEVAL | 2006 / Vol. 9 / Issue 05
Keywords
compound noun segmentation; unsupervised method; Korean information retrieval
DOI
10.1007/s10791-006-9007-3
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Compound noun segmentation is a key first step in language processing for Korean. Thus far, most approaches have required some form of human supervision, such as pre-existing dictionaries, segmented compound nouns, or heuristic rules. As a result, they suffer from the unknown word problem, which unsupervised approaches can overcome. However, previous unsupervised methods normally do not consider all possible segmentation candidates, and/or rely on character-based segmentation clues such as bi-grams or all-length n-grams, so they are prone to falling into a locally optimal solution. To overcome this problem, this paper proposes an unsupervised segmentation algorithm that searches for the most likely segmentation result among all possible segmentation candidates using a word-based segmentation context. To provide word-based segmentation clues, a dictionary is automatically generated from a corpus. Experiments using three test collections show that the segmentation algorithm can be successfully applied to Korean information retrieval, improving on a dictionary-based longest-matching algorithm.
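As a rough illustration of what searching all segmentation candidates with word-based clues can look like, the sketch below scores every possible split of a compound with a simple unigram model and keeps the most probable one. It is not the authors' algorithm: the unigram scoring, the function names `build_dictionary` and `best_segmentation`, and the toy corpus are all assumptions made for this example, and the abstract does not say how the paper builds its dictionary or scores candidates.

```python
import math
from collections import Counter


def build_dictionary(corpus_tokens):
    """Count token frequencies; a stand-in for a corpus-derived dictionary (assumption)."""
    return Counter(corpus_tokens)


def best_segmentation(compound, counts):
    """Return the most probable split of `compound` under a unigram model (assumption).

    Dynamic programming covers every candidate split whose parts are all
    dictionary words, so the search is exhaustive rather than greedy.
    """
    total = sum(counts.values())
    n = len(compound)
    # best[i] holds (log-probability, words) for the best split of compound[:i].
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            word = compound[start:end]
            prev_score, prev_words = best[start]
            if word not in counts or prev_score == -math.inf:
                continue
            score = prev_score + math.log(counts[word] / total)
            if score > best[end][0]:
                best[end] = (score, prev_words + [word])
    # If no full split exists, leave the compound unsegmented.
    return best[n][1] or [compound]


# Toy corpus of single nouns (hypothetical data, not from the paper).
counts = build_dictionary(["정보", "검색", "시스템", "정보", "검색"])
print(best_segmentation("정보검색시스템", counts))  # ['정보', '검색', '시스템']
```

A greedy longest-matching baseline commits to the longest dictionary word at each position and cannot revise that choice, whereas the table above compares all complete splits before choosing one.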
Pages: 613-631
Number of pages: 19