Indexing Arabic texts using association rule data mining

Cited by: 7
Authors
Haraty, Ramzi A. [1]
Nasrallah, Rouba [2]
Affiliations
[1] Lebanese American University, Department of Computer Science and Mathematics, Beirut, Lebanon
[2] Lebanese American University, Beirut, Lebanon
Keywords
Precision; Recall; Arabic text; Auto-indexing; Frequent sets; Rule-based data mining; FREQUENCY;
DOI
10.1108/LHT-07-2017-0147
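Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501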
Abstract
Purpose: The purpose of this paper is to propose a new model to enhance the auto-indexing of Arabic texts. The model extracts new relevant words by relating the words chosen by previous classical methods to new words using data mining rules.
Design/methodology/approach: The proposed model uses an association rule algorithm that extracts frequent sets of related items in order to find relationships between words in the texts to be indexed and words from texts that belong to the same category. The extracted word associations are represented as sets of words that frequently appear together.
Findings: The proposed methodology shows significant enhancement in terms of accuracy, efficiency and reliability when compared to previous works.
Research limitations/implications: The stemming algorithm can be further enhanced. The Arabic language has many grammatical rules, and the more of these rules are integrated into the stemming algorithm, the better the stemming will be. The stop-list can also be enhanced by adding more words that should not be taken into consideration in the indexing mechanism. Numbers should be added to the list as well, and a thesaurus system should be used, because it links different phrases or words with the same meaning to each other, which improves the indexing mechanism. The authors also invite researchers to add more pre-requisite texts to obtain better results.
Originality/value: In this paper, the authors present a full text-based auto-indexing method for Arabic text documents. The auto-indexing method extracts new relevant words by using data mining rules, which has not been investigated before. The method uses an association rule mining algorithm that extracts frequent sets of related items in order to find relationships between words in the texts to be indexed and words from texts that belong to the same category. The benefits of the method are demonstrated using empirical work involving several Arabic texts.
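The abstract describes the approach only at a high level: frequent word sets are mined from texts of the same category, and the associations are used to add relevant index words to those already chosen by a classical method. The following is a minimal sketch of that idea, not the authors' implementation. It assumes an Apriori-style level-wise search (the abstract does not name the algorithm), assumes the texts are already stemmed and stop-word filtered, and uses English placeholder tokens in place of Arabic stems. The function names `frequent_itemsets` and `expand_index_terms` and the thresholds `min_support` and `min_confidence` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support=0.5, max_size=2):
    """Return {frozenset(words): support} for word sets of up to max_size words.

    Support is the fraction of documents that contain every word in the set.
    Assumption: Apriori-style level-wise search; the source does not name one.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    counts = Counter(w for d in doc_sets for w in d)
    frequent = {frozenset([w]): c / n for w, c in counts.items() if c / n >= min_support}
    current = set(frequent)
    size = 1
    while current and size < max_size:
        size += 1
        items = sorted({w for s in current for w in s})
        # Keep a candidate only if all of its (size - 1)-subsets are frequent.
        candidates = {
            frozenset(c)
            for c in combinations(items, size)
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        current = set()
        for cand in candidates:
            support = sum(1 for d in doc_sets if cand <= d) / n
            if support >= min_support:
                frequent[cand] = support
                current.add(cand)
    return frequent


def expand_index_terms(seed_terms, frequent, min_confidence=0.6):
    """Return new words w such that the rule {seed} -> {w} meets min_confidence."""
    seeds = set(seed_terms)
    expanded = set()
    for itemset, support in frequent.items():
        if len(itemset) != 2:
            continue
        a, b = tuple(itemset)
        for antecedent, consequent in ((a, b), (b, a)):
            if antecedent in seeds and consequent not in seeds:
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[frozenset([antecedent])]
                if confidence >= min_confidence:
                    expanded.add(consequent)
    return expanded


# Hypothetical same-category texts, already tokenized and stemmed.
category_docs = [
    ["bank", "loan", "interest"],
    ["bank", "loan", "market"],
    ["bank", "interest"],
    ["market", "trade"],
]
itemsets = frequent_itemsets(category_docs, min_support=0.5)
# "loan" stands in for a word chosen by a classical indexing method.
print(expand_index_terms(["loan"], itemsets))
```

In this toy data, every text that contains "loan" also contains "bank", so the rule from "loan" to "bank" has confidence 1.0 and "bank" is proposed as an additional index word. Real use would depend on the quality of stemming and of the stop-list, both of which the abstract names as limitations.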
Pages: 101-117
Page count: 17