Topic Detection and Multi-word Terms Extraction for Arabic Unvowelized Documents

Cited by: 0
Authors
Koulali, Rim [1]
Meziane, Abdelouafi [1]
Affiliation
[1] Mohammed 1 Univ, Coll Sci, LARI Lab, Hay Al Quods, Oujda, Morocco
Source
INFORMATION RETRIEVAL TECHNOLOGY | 2011 / Vol. 7097
Keywords
Topic Detection; Topic Oriented Vocabulary; Mutual Information; Jaccard Indicator; TF-IDF; Multi-Word Terms Extraction; C-value; LLR;
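DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract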
This paper focuses on Topic Detection (TD) for unvowelized Arabic documents. Our topic detection system was implemented using two different metrics: an adapted TF-IDF and the Jaccard indicator. The experiments studied the impact of working with word stems or roots, and with all words or nouns only. To enhance the TD system, we developed a multi-word terms (MWTs) extraction prototype to generate MWT vocabularies. To the best of our knowledge, MWT vocabularies have never been used for topic detection in Arabic documents. In this paper we investigate the impact of such use on the quality of topic detection. We used the standard measures of Recall, Precision and F-measure to evaluate the performance of the resulting systems on Wattan, an Arabic newspaper corpus.
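As a rough illustration of the measures named in the abstract, the sketch below computes the textbook Jaccard index between a document's terms and a topic vocabulary, together with Precision, Recall and F-measure for one topic. It is not the authors' system: the function names (`jaccard`, `detect_topic`, `precision_recall_f1`), the toy vocabularies, and the rule of assigning a document to the highest-scoring topic are all assumptions made for this example, and the paper's adapted TF-IDF variant is not reproduced.

```python
def jaccard(a, b):
    """Standard Jaccard index |A ∩ B| / |A ∪ B| between two term collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_topic(doc_terms, topic_vocabularies):
    """Assumed decision rule: pick the topic whose vocabulary is most similar."""
    return max(topic_vocabularies,
               key=lambda topic: jaccard(doc_terms, topic_vocabularies[topic]))


def precision_recall_f1(predicted, gold):
    """Precision, Recall and F-measure for one topic.

    predicted: ids of documents the system assigned to the topic
    gold:      ids of documents that actually belong to the topic
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy vocabularies (English placeholders for stems or roots).
vocabularies = {
    "sport": {"match", "team", "goal"},
    "economy": {"bank", "market", "price"},
}
topic = detect_topic(["team", "goal", "price"], vocabularies)
scores = precision_recall_f1(predicted={1, 2, 3, 4}, gold={2, 3, 5})
```

Here each topic vocabulary is treated as a plain set of terms; a multi-word term could be represented as a single string such as `"stock market"` so that it participates in the same set operations.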
Pages: 614-623
Number of pages: 10
Related papers
50 in total
  • [1] Word Embedding Approach for Synonym Extraction of Multi-Word Terms
    Hazem, Amir
    Daille, Beatrice
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 297 - 303
  • [2] A hybrid Approach for Arabic Multi-Word Term Extraction
    Bounhas, Ibrahim
    Slimani, Yahya
    IEEE NLP-KE 2009: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2009, : 429 - 436
  • [3] A multi-word term extraction program for Arabic language
    Boulaknadel, Siham
    Daille, Beatrice
    Aboutajdine, Driss
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1485 - 1488
  • [4] A study on multi-word extraction from chinese documents
    School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Ashahidai, Tatsunokuchi, Ishikawa
    923-1292, Japan
    Unknown
    100080, China
    Lect. Notes Comput. Sci., 2008, (42-53):
  • [5] A Study on Multi-word Extraction from Chinese Documents
    Zhang, Wen
    Yoshida, Taketoshi
    Tang, Xijin
    ADVANCED WEB AND NETWORK TECHNOLOGIES, AND APPLICATIONS, 2008, 4977 : 42 - +
  • [6] Semi-compositional Method for Synonym Extraction of Multi-Word Terms
    Hazem, Amir
    Daille, Beatrice
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 1202 - 1207
  • [7] On the Structural Disambiguation of Multi-word Terms
    Cabezas-Garcia, Melania
    Leon-Arauz, Pilar
    COMPUTATIONAL AND CORPUS-BASED PHRASEOLOGY, EUROPHRAS 2019, 2019, 11755 : 46 - 60
  • [8] Should one use term proximity or multi-word terms for Arabic information retrieval?
    El Mahdaouy, Abdelkader
    Gaussier, Eric
    El Alaoui, Said Ouatik
    COMPUTER SPEECH AND LANGUAGE, 2019, 58 : 76 - 97
  • [9] Compositionality and lexical alignment of multi-word terms
    Emmanuel Morin
    Béatrice Daille
    Language Resources and Evaluation, 2010, 44 : 79 - 95
  • [10] Multi-word terms selection for information retrieval
    Bechikh Ali, Chedi
    Haddad, Hatem
    Slimani, Yahya
    INFORMATION DISCOVERY AND DELIVERY, 2023, 51 (01) : 74 - 87