Hadith data mining and classification: a comparative analysis

Times cited: 43
Authors
Saloot, Mohammad Arshi [1]
Idris, Norisma [1]
Mahmud, Rohana [1]
Ja'afar, Salinah [1]
Thorleuchter, Dirk [2]
Gani, Abdullah [1]
Affiliations
[1] Univ Malaya, Kuala Lumpur 50603, Malaysia
[2] Inst Fraunhofer INT, Appelsgarten 2, D-53879 Euskirchen, Germany
Keywords
Review; Comparison; Islamic knowledge; Hadith; Classification; Data mining
DOI
10.1007/s10462-016-9458-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hadiths are important textual sources of law, tradition, and teaching in the Islamic world. The unique linguistic features of Hadiths (e.g., ancient Arabic language and story-like text) call for compiling and applying specific natural language processing methods. No study in the literature focuses solely on Hadith from an artificial intelligence perspective, and many new developments have been overlooked and need to be highlighted. Therefore, this review analyzes all academic journal and conference publications that apply two main artificial intelligence methods to Hadith text: Hadith classification and Hadith mining. All Hadith-relevant methods and algorithms from the literature are discussed and analyzed in terms of functionality, simplicity, F-score, and accuracy. Because the original studies use different Hadith datasets, their evaluation results cannot be compared directly. Therefore, we have re-implemented and evaluated the methods on a single dataset (i.e., 3150 Hadiths from the Sahih Al-Bukhari book). The evaluation of the classification methods shows that neural networks classify the Hadiths with 94% accuracy. This is because neural networks are capable of handling complex (high-dimensional) input data. The Hadith mining method that combines a vector space model, cosine similarity, and enriched queries achieves the best accuracy (88%) among the re-evaluated Hadith mining methods. The most important aspect of Hadith mining methods is query expansion, since the query must be adapted to the vocabulary of Hadith texts. The lack of knowledge-based methods is evident in Hadith classification and mining approaches, and this gap could be addressed in future work using knowledge graphs.
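To make the mining approach named in the abstract more concrete, the following is a minimal sketch of retrieval with a TF-IDF vector space model, cosine similarity, and a simple form of query expansion. It is not the authors' implementation: the function names, the smoothed IDF formula, the synonym map, and the short English placeholder texts are all assumptions chosen for illustration, and a real system would work on Arabic Hadith text with proper preprocessing.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real Arabic preprocessing)."""
    return text.lower().split()


def build_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the collection."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unknown to the collection are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_query(tokens, synonyms):
    """Append related terms from a hypothetical, hand-built synonym map."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(synonyms.get(token, []))
    return expanded


def rank(query, docs, synonyms=None):
    """Return (score, document index) pairs, best match first."""
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    doc_vectors = [tfidf(tokens, idf) for tokens in docs_tokens]
    query_tokens = tokenize(query)
    if synonyms:
        query_tokens = expand_query(query_tokens, synonyms)
    query_vector = tfidf(query_tokens, idf)
    scores = [(cosine(query_vector, dv), i) for i, dv in enumerate(doc_vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Toy placeholder texts and synonym map (illustrative only, not Hadith data).
DOCS = ["charity given in secret", "fasting during the month", "prayer at night"]
SYNONYMS = {"alms": ["charity"]}
```

In this toy setting, the query "alms" shares no term with any placeholder text, so `rank("alms", DOCS)` scores every document at zero; passing `SYNONYMS` adds "charity" to the query and lets the first text match. This illustrates why the abstract singles out query expansion: a user's wording has to be mapped onto the vocabulary that actually appears in the collection.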
Pages: 113-128
Number of pages: 16