Impact of term-indexing for Arabic document retrieval

Cited by: 0

Authors:

LINA FRE CNRS 2729, Université de Nantes, 2 rue la Houssinière, 44322 Nantes Cedex 03, France [1]

Not specified [2]

Affiliations:

[1] LINA FRE CNRS 2729, Université de Nantes, 44322 Nantes Cedex 03

[2] GSCM Université Mohammed V, Agdal Rabat

Source:

Lect. Notes Comput. Sci. | 2008 / pp. 380-383

Keywords:

Indexing (of information)

DOI:

10.1007/978-3-540-69858-6_49

Abstract:

In this paper, we adapt the standard method for multi-word term extraction to the Arabic language. We define the linguistic specifications and develop a term extraction tool. We apply the term extraction program to document retrieval in a specific domain and evaluate two kinds of multi-word term weighting functions, one based on the corpus and one based on the document. We show that multi-word term indexing is effective under both weighting schemes, with gains of up to 5.8% in average precision. © 2008 Springer-Verlag Berlin Heidelberg.

Pages: 380-383

Page count: 3
