From plain character strings to meaningful words: Producing better full text databases for inflectional and compounding languages with morphological analysis software

Cited by: 21
Author
Alkula, R
Affiliation
[1] Tieto Enator Corporation,
Source
INFORMATION RETRIEVAL | 2001 / Vol. 4 / Issue 3-4
Keywords
natural language processing; full text retrieval; stemming; morphology;
DOI
10.1023/A:1011942104443
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The paper deals with linguistic processing and retrieval techniques in full-text databases. Special attention is given to the characteristics of highly inflectional languages and to how the morphological structure of a language should be taken into account when designing and developing information retrieval systems. Finnish is used as an example of a language with a more complicated inflectional structure than English. In the FULLTEXT project, natural language analysis modules for Finnish were incorporated into the commercial BASIS information retrieval system, which is based on inverted files and Boolean searching. Several test databases were produced, each using one or two Finnish morphological analysis programs.
Pages: 195-208
Page count: 14