Amharic Document Representation for Adhoc Retrieval

Cited by: 3
Authors
Yeshambel, Tilahun [1]
Mothe, Josiane [2]
Assabie, Yaregal [3]
Affiliations
[1] Addis Ababa Univ, IT PhD Program, Addis Ababa, Ethiopia
[2] Univ Toulouse, UMR5505 CNRS, IRIT, INSPE, Toulouse, France
[3] Addis Ababa Univ, Dept Comp Sci, Addis Ababa, Ethiopia
Keywords
Adhoc Retrieval; Amharic; Complex Morphology; Stem; Root; INFORMATION-RETRIEVAL;
DOI
10.5220/0010177301240134
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Amharic is the official language of the government of Ethiopia, a country with an estimated population of over 110 million. Like other Semitic languages, Amharic is characterized by complex morphology in which thousands of words are generated from a single root form through inflection and derivation. This has made the development of tools for Amharic natural language processing a non-trivial task, and Amharic adhoc retrieval in particular faces difficulties due to the complex morphological structure of the language. In this paper, the impact of morphological features on the representation of Amharic documents and queries for adhoc retrieval is investigated. We analyze the effects of stem-based and root-based approaches on Amharic adhoc retrieval effectiveness. Various experiments are conducted on a TREC-like Amharic information retrieval test collection using a standard evaluation framework and measures. The findings show that a root-based approach outperforms the conventional stem-based approach that prevails in many other languages.
Pages: 124-134
Number of pages: 11