A Hybrid Approach for Arabic Multi-Word Term Extraction

Times cited: 0
Authors
Bounhas, Ibrahim [1 ]
Slimani, Yahya [1 ]
Affiliation
[1] Univ Tunis, Fac Sci Tunis, Dept Comp Sci, Tunis 1060, Tunisia
Source
IEEE NLP-KE 2009: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING | 2009
Keywords
Arabic language processing; morpho-syntactic parsing; multi-word terms; terminology extraction;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Building a domain model from a specialized corpus requires identifying candidate terms as well as the semantic relations between them. Once constructed, this model can be used for many information retrieval tasks. In this process, multi-word terms are of great importance. On the one hand, they constitute domain-relevant candidate terms. On the other hand, the syntactic relations that link their constituents can be used to infer semantic relations between terms. In this paper, we propose to extract multi-word terms from Arabic specialized corpora. The proposed approach uses linguistic rules based on morphological features and part-of-speech (POS) tags to parse documents and retrieve candidate terms. Statistical measures are used to resolve ambiguities generated by the linguistic tools and to rank candidate terms according to their relevance. We present experiments on a corpus from the environment domain and report high-quality results that meet the targets set for the precision metric.
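The abstract does not specify the linguistic rules or the statistical measures used, so the following is only a minimal sketch of the general "linguistic filter + statistical ranking" pipeline it describes, under stated assumptions. The input is assumed to be already POS-tagged; the tag set (N, ADJ, PREP, V), the pattern set PATTERNS, and the function names extract_candidates and c_value are hypothetical. Candidates are ranked with the C-value measure, a common term-ranking statistic used here as a stand-in, not as the authors' method. The sketch does not model morphological features or the ambiguity resolution mentioned in the abstract.

```python
import math
from collections import Counter

# Hypothetical POS patterns for Arabic multi-word terms (an assumption, not the
# paper's rule set): noun-noun (idafa construct) and noun-adjective sequences.
PATTERNS = {
    ("N", "N"),
    ("N", "ADJ"),
    ("N", "N", "N"),
    ("N", "N", "ADJ"),
}


def extract_candidates(tagged_sentences, patterns=PATTERNS):
    """Count token n-grams whose POS-tag sequence matches one of the patterns."""
    counts = Counter()
    max_len = max(len(p) for p in patterns)
    for sentence in tagged_sentences:
        for i in range(len(sentence)):
            for n in range(2, max_len + 1):
                window = sentence[i:i + n]
                if len(window) < n:  # ran off the end of the sentence
                    break
                tags = tuple(tag for _, tag in window)
                if tags in patterns:
                    counts[tuple(token for token, _ in window)] += 1
    return counts


def _is_nested(shorter, longer):
    """True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(counts):
    """Rank candidates by C-value (a stand-in statistic; the abstract names none)."""
    scores = {}
    for term, freq in counts.items():
        # Frequencies of longer candidates that contain this term.
        containers = [
            counts[other]
            for other in counts
            if len(other) > len(term) and _is_nested(term, other)
        ]
        adjusted = freq
        if containers:
            # Discount occurrences already explained by the longer terms.
            adjusted = freq - sum(containers) / len(containers)
        scores[term] = math.log2(len(term)) * adjusted
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy POS-tagged corpus with hypothetical tags; English glosses in comments.
corpus = [
    [("تلوث", "N"), ("الهواء", "N"), ("في", "PREP"), ("المدن", "N")],  # air pollution in cities
    [("تلوث", "N"), ("الهواء", "N"), ("يزداد", "V")],  # air pollution increases
    [("الطاقة", "N"), ("الشمسية", "ADJ")],  # solar energy
    [("محطة", "N"), ("معالجة", "N"), ("المياه", "N")],  # water treatment plant
]

ranked = c_value(extract_candidates(corpus))
```

In this toy run, the two-word sub-sequences of محطة معالجة المياه score zero because every one of their occurrences is accounted for by the longer term, which is the nesting behavior C-value is designed to capture.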
Pages: 429-436
Page count: 8
Related papers
50 in total
  • [41] Highly concurrent multi-word synchronization
    Attiya, Hagit
    Hillel, Eshcar
    THEORETICAL COMPUTER SCIENCE, 2011, 412 (12-14) : 1243 - 1262
  • [42] A Multi-word Expression Dataset for Swedish
    Kurfali, Murathan
    Ostling, Robert
    Sjons, Johan
    Wiren, Mats
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4402 - 4409
  • [43] Automatic Translation of Multi-word Labels
    Protaziuk, Grzegorz
    Kaczynski, Marcin
    Bembenik, Robert
    MACHINE INTELLIGENCE AND BIG DATA IN INDUSTRY, 2016, 19 : 99 - 109
  • [44] MwTExt: automatic extraction of multi-word terms to generate compound concepts within ontology
    Thanawala, P.
    Pareek, J.
    International Journal of Information Technology, 2018, 10 (3) : 303 - 311
  • [45] Compositionality and lexical alignment of multi-word terms
    Morin, Emmanuel
    Daille, Béatrice
    Language Resources and Evaluation, 2010, 44 : 79 - 95
  • [46] Between-word junctures in early multi-word speech
    Newton, C
    Wells, B
    JOURNAL OF CHILD LANGUAGE, 2002, 29 (02) : 275 - 299
  • [47] Exploiting multi-word similarity for retrieval in medical document collections: The TSRM approach
    Drymonas, Euthymios
    Zervanou, Kalliopi
    Petrakis, Euripides G. M.
    Journal of Digital Information Management, 2010, 8 (05) : 315 - 321
  • [48] A Distributional Multi-word Thesaurus in Sketch Engine
    Jakubicek, Milos
    Rychly, Pavel
    RASLAN 2019: RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING, 2019, : 143 - 147
  • [49] Syntactic concordancing and multi-word expression detection
    Seretan, V.
    Wehrli, E.
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2013, 5 (02) : 158 - 181
  • [50] Text classification using multi-word features
    Zhang, Wen
    Yoshida, Taketoshi
    Tang, Xijin
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 3740 - +