Enhancing information retrieval through statistical natural language processing: A study of collocation indexing

Times cited: 0
Authors
Arazy, Ofer
Woo, Carson
Affiliations
[1] Univ Alberta, Edmonton, AB T6G 2R6, Canada
[2] Univ British Columbia, Sauder Sch Business, Vancouver, BC V6T 1Z2, Canada
Keywords
document management; information retrieval (IR); word ambiguity; natural language processing (NLP); collocations; distance; directionality; weighting; general settings
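DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812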
Abstract
Although the management of information assets (specifically, of the text documents that make up 80 percent of these assets) can provide organizations with a competitive advantage, the ability of information retrieval (IR) systems to deliver relevant information to users is severely hampered by the difficulty of disambiguating natural language. The word ambiguity problem has been addressed with moderate success in restricted settings, but it remains the main challenge in general settings, which are characterized by large, heterogeneous document collections. In this paper, we provide preliminary evidence for the usefulness of statistical natural language processing (NLP) techniques, and specifically of collocation indexing, for IR in general settings. We investigate the effect of three key parameters on collocation indexing performance: directionality, distance, and weighting. We build on previous work in IR to (1) advance our knowledge of key design elements for collocation indexing, (2) demonstrate gains in retrieval precision from the use of statistical NLP for general-settings IR, and (3) provide practitioners with a useful cost-benefit analysis of the methods under investigation.
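The three parameters named in the abstract (directionality, distance, and weighting) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `extract_collocations` and `tfidf_weights`, the fixed look-ahead window used to define distance, and plain tf-idf as the weighting scheme are all assumptions chosen for illustration.

```python
# Hypothetical sketch, not the paper's method: distance is modeled as a
# fixed look-ahead window and weighting as plain tf-idf (both assumptions).
import math
from collections import Counter


def extract_collocations(tokens, max_distance=3, directional=True):
    """Count word pairs that co-occur within `max_distance` token positions.

    directional=True keeps (w1, w2) and (w2, w1) as distinct index terms;
    directional=False normalizes each pair to sorted order, ignoring word order.
    """
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at most max_distance positions from token i.
        for j in range(i + 1, min(i + 1 + max_distance, len(tokens))):
            right = tokens[j]
            pair = (left, right) if directional else tuple(sorted((left, right)))
            pairs[pair] += 1
    return pairs


def tfidf_weights(doc_pairs):
    """Weight each collocation by tf * log(N / df) over per-document Counters."""
    n_docs = len(doc_pairs)
    df = Counter()
    for pairs in doc_pairs:
        df.update(pairs.keys())  # count each collocation once per document
    return [
        {pair: tf * math.log(n_docs / df[pair]) for pair, tf in pairs.items()}
        for pairs in doc_pairs
    ]
```

For example, with `max_distance=2` the token sequence `river bank loan bank` yields separate entries for ("bank", "loan") and ("loan", "bank") when directional, but a single ("bank", "loan") entry with count 2 when non-directional. Under this weighting, a collocation that occurs in every document receives weight 0.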
Pages: 525-546
Page count: 22