Multi-word term indexing for Arabic document retrieval

Cited by: 0
Authors
Boulaknadel, Siham [1]
Daille, Beatrice [1]
Driss, Aboutajdine [2]
Affiliations
[1] Univ Nantes, CNRS, FRE 2729, LINA, 2 Rue Houssiniere, BP 92208, F-44322 Nantes 03, France
[2] Mohammed V Univ, GSCM, Rabat, Morocco
Source
2008 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, VOLS 1-3 | 2008
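Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract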
To improve the performance of information retrieval systems, it seems important to identify key phrases, which represent the semantic content of a text better than single-word terms. In this paper, we adapt the standard method for multi-word term extraction to the Arabic language. We define the linguistic specifications and develop a term extraction tool. We apply the term extraction program to document retrieval in a specific domain, evaluate two kinds of multi-word term weighting functions that consider either the corpus or the document, and demonstrate the effectiveness of multi-word term indexing for both weightings, with gains of up to 5.8% in average precision.
Pages: 480 / +
Page count: 3