A Data-Driven Knowledge Acquisition System: An End-to-End Knowledge Engineering Process for Generating Production Rules

Cited by: 18
Authors
Ali, Maqbool [1, 2]
Ali, Rahman [3]
Khan, Wajahat Ali [1]
Han, Soyeon Caren [4]
Bang, Jaehun [1]
Hur, Taeho [1]
Kim, Dohyeong [1]
Lee, Sungyoung [1]
Kang, Byeong Ho [2]
Affiliations
[1] Kyung Hee Univ, Dept Comp Sci & Engn, Yongin 446701, South Korea
[2] Univ Tasmania, Sch Engn & ICT, Hobart, Tas 7005, Australia
[3] Univ Peshawar, Quaid E Azam Coll Commerce, Peshawar 25120, Pakistan
[4] Univ Sydney, Sch Informat Technol, Sydney, NSW 2006, Australia
Keywords
Knowledge engineering; data mining; features ranking; algorithm selection; decision tree; production rule; user experience; FEATURE-SELECTION ALGORITHMS; USER EXPERIENCE; DATA SCIENCE; CONSTRUCTION;
DOI
10.1109/ACCESS.2018.2817022
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data-driven knowledge acquisition is one of the key research fields in data mining. Dealing with large amounts of data has received a lot of attention in the field recently, and a number of methodologies have been proposed to extract insights from data in an automated or semi-automated manner. However, these methodologies generally target a specific aspect of the data mining process, such as data acquisition, data preprocessing, or data classification, whereas a comprehensive knowledge acquisition method is crucial to support the end-to-end knowledge engineering process. In this paper, we introduce a knowledge acquisition system that covers all major phases of the cross-industry standard process for data mining. Acknowledging the importance of an end-to-end knowledge engineering process, we designed and developed an easy-to-use data-driven knowledge acquisition tool (DDKAT). The major features of the DDKAT are: (1) a novel unified features scoring approach for data selection; (2) a user-friendly data processing interface to improve the quality of the raw data; (3) an appropriate decision tree algorithm selection approach to build a classification model; and (4) the generation of production rules from various decision tree classification models in an automated manner. Furthermore, two diabetes studies were performed to assess the value of the DDKAT in terms of user experience. A total of 19 experts were involved in the first study and 102 students in the artificial intelligence domain were involved in the second study. The results showed that the overall user experience of the DDKAT was positive in terms of its attractiveness, as well as its pragmatic and hedonic quality factors.
Pages: 15587-15607
Page count: 21