Amharic Document Representation for Adhoc Retrieval

Cited by: 3
Authors
Yeshambel, Tilahun [1]
Mothe, Josiane [2]
Assabie, Yaregal [3]
Affiliations
[1] Addis Ababa Univ, IT PhD Program, Addis Ababa, Ethiopia
[2] Univ Toulouse, UMR5505 CNRS, IRIT, INSPE, Toulouse, France
[3] Addis Ababa Univ, Dept Comp Sci, Addis Ababa, Ethiopia
Keywords
Adhoc Retrieval; Amharic; Complex Morphology; Stem; Root; Information Retrieval
DOI
10.5220/0010177301240134
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Amharic is the official language of the government of Ethiopia, a country with an estimated population of over 110 million. Like other Semitic languages, Amharic has a complex morphology in which thousands of words can be generated from a single root through inflection and derivation. This makes developing tools for Amharic natural language processing a non-trivial task, and it also makes Amharic adhoc retrieval difficult. In this paper, we investigate the impact of morphological features on the representation of Amharic documents and queries for adhoc retrieval, analyzing the effects of stem-based and root-based approaches on retrieval effectiveness. Experiments are conducted on a TREC-like Amharic information retrieval test collection using a standard evaluation framework and measures. The findings show that the root-based approach outperforms the conventional stem-based approach that prevails in many other languages.
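To make the contrast between the two representations concrete, here is a minimal sketch, assuming a simple Boolean bag-of-words index rather than the ranking models and tools used in the paper. The `STEM_OF` and `ROOT_OF` tables, the ASCII transliterations, the toy documents, and the helper names (`normalizer`, `build_index`, `match`) are all hypothetical illustrations. They are not the output of any real Amharic stemmer or root extractor and are not the authors' implementation. The only idea shown is that a root-based normalizer conflates more surface forms into one index term than a stem-based one.

```python
from collections import defaultdict
from typing import Callable, Dict, Set

# Hypothetical lookup tables over ASCII transliterations (illustration only).
# They are NOT produced by any real Amharic stemmer or root extractor.
STEM_OF: Dict[str, str] = {
    "sebbere": "sebber",
    "sebberu": "sebber",
    "sebari": "sebari",
    "sebariwoch": "sebari",
    "sibrat": "sibrat",
}
# Assumption for the sketch: every form above derives from the root s-b-r.
ROOT_OF: Dict[str, str] = {word: "sbr" for word in STEM_OF}


def normalizer(table: Dict[str, str]) -> Callable[[str], str]:
    """Map a word to its index term; words missing from the table are kept as-is."""
    return lambda word: table.get(word, word)


def build_index(docs: Dict[str, str], norm: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Inverted index: normalized term -> set of ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[norm(word)].add(doc_id)
    return index


def match(query: str, index: Dict[str, Set[str]], norm: Callable[[str], str]) -> Set[str]:
    """Boolean OR retrieval: ids of documents sharing any normalized term with the query."""
    hits: Set[str] = set()
    for word in query.split():
        hits |= index.get(norm(word), set())
    return hits


# Toy documents, each a bag of transliterated tokens (hypothetical).
docs = {"d1": "sebbere", "d2": "sebariwoch", "d3": "sibrat", "d4": "bet"}
stem, root = normalizer(STEM_OF), normalizer(ROOT_OF)

print(sorted(match("sebberu", build_index(docs, stem), stem)))  # stem-based
print(sorted(match("sebberu", build_index(docs, root), root)))  # root-based
```

With the stem-based index, the query `sebberu` reaches only `d1`; with the root-based index it also reaches `d2` and `d3`, while the unrelated `d4` stays out. Merging more variants in this way tends to raise recall at some risk of conflating words with different meanings; the abstract reports that, on the authors' test collection, the root-based representation was the more effective one.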
Pages: 124-134
Number of pages: 11
Related papers (showing 10 of 50)
  • [1] Amharic Adhoc Information Retrieval System Based on Morphological Features
    Yeshambel, Tilahun
    Mothe, Josiane
    Assabie, Yaregal
    APPLIED SCIENCES-BASEL, 2022, 12 (03)
  • [2] Distant Supervision in BERT-based Adhoc Document Retrieval
    Rudra, Koustav
    Anand, Avishek
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020: 2197-2200
  • [3] Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
    Yeshambel, Tilahun
    Mothe, Josiane
    Assabie, Yaregal
    INFORMATION, 2023, 14 (03)
  • [4] Effective Adhoc Retrieval Through Traversal of a Query-Document Graph
    Frayling, Erlend
    MacAvaney, Sean
    Macdonald, Craig
    Ounis, Iadh
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT III, 2024, 14610: 89-104
  • [5] A "stereo" document representation for textual information retrieval
    Chen, L
    Zeng, J
    Tokuda, N
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2006, 57 (06): 768-774
  • [6] EXPERIMENTS WITH REPRESENTATION IN A DOCUMENT-RETRIEVAL SYSTEM
    CROFT, WB
    INFORMATION TECHNOLOGY-RESEARCH DEVELOPMENT APPLICATIONS, 1983, 2 (01): 1-21
  • [7] Semantic Representation and Search Techniques for Document Retrieval Systems
    VanNhon Do
    Huynh, ThanhThuong T.
    TruongAn PhamNguyen
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2013), PT I, 2013, 7802: 476-486
  • [8] Using rich document representation in XML information retrieval
    Raja, Fahimeh
    Keikha, Mostafa
    Rahgozar, Masued
    Oroumchian, Farhad
    COMPARATIVE EVALUATION OF XML INFORMATION RETRIEVAL SYSTEMS, 2007, 4518: 294-301
  • [9] DOCUMENT REPRESENTATION IN PROBABILISTIC MODELS OF INFORMATION-RETRIEVAL
    CROFT, WB
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1981, 32 (06): 451-457
  • [10] A Content-based Approach for Document Representation and Retrieval
    Rinaldi, Antonio M.
    DOCENG'08: PROCEEDINGS OF THE EIGHTH ACM SYMPOSIUM ON DOCUMENT ENGINEERING, 2008: 106-109