Impact of term-indexing for Arabic document retrieval

Cited by: 0
Authors
LINA FRE CNRS 2729, Université de Nantes, 2 rue la Houssinière, 44322 Nantes Cedex 03, France [1]
Unknown [2]
Affiliations
[1] LINA FRE CNRS 2729, Université de Nantes, 44322 Nantes Cedex 03
[2] GSCM Université Mohammed V, Agdal Rabat
Source
Lect. Notes Comput. Sci. | 2008 / Issue 380-383
Keywords
Indexing (of information)
DOI
10.1007/978-3-540-69858-6_49
Abstract
In this paper, we adapt the standard method for multi-word term extraction to the Arabic language. We define the linguistic specifications and develop a term extraction tool. We apply the term extraction program to document retrieval in a specific domain and evaluate two kinds of multi-word term weighting functions, one based on the corpus and one based on the document. Multi-word term indexing proves effective with both weighting functions, improving average precision by up to 5.8%. © 2008 Springer-Verlag Berlin Heidelberg.
Pages: 380-383
Page count: 3
Related papers
6 in total
  • [1] Diab M., Hacioglu K., Jurafsky D., Automatic tagging of Arabic text: From raw text to base phrase chunks, Proceedings of HLT-NAACL, pp. 149-152, (2004)
  • [2] Dunning T., Accurate methods for the statistics of surprise and coincidence, Computational Linguistics, 19, 1, pp. 61-74, (1993)
  • [3] Church K.W., Hanks P., Word association norms, mutual information, and lexicography, Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, pp. 76-83, (1989)
  • [4] Church K., Gale W., Hanks P., Hindle D., Using statistics in lexical analysis, Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, pp. 115-164, (1991)
  • [5] Kageura K., Umino B., Methods of automatic term recognition: A review, Terminology, 3, 2, pp. 259-289, (1996)
  • [6] Darwish K., Probabilistic Methods for Searching OCR-Degraded Arabic Text, (2003)