A Data-Driven Knowledge Acquisition System: An End-to-End Knowledge Engineering Process for Generating Production Rules

Times cited: 18
Authors
Ali, Maqbool [1, 2]
Ali, Rahman [3]
Khan, Wajahat Ali [1]
Han, Soyeon Caren [4]
Bang, Jaehun [1]
Hur, Taeho [1]
Kim, Dohyeong [1]
Lee, Sungyoung [1]
Kang, Byeong Ho [2]
Affiliations
[1] Kyung Hee Univ, Dept Comp Sci & Engn, Yongin 446701, South Korea
[2] Univ Tasmania, Sch Engn & ICT, Hobart, Tas 7005, Australia
[3] Univ Peshawar, Quaid E Azam Coll Commerce, Peshawar 25120, Pakistan
[4] Univ Sydney, Sch Informat Technol, Sydney, NSW 2006, Australia
Keywords
Knowledge engineering; data mining; features ranking; algorithm selection; decision tree; production rule; user experience; FEATURE-SELECTION ALGORITHMS; USER EXPERIENCE; DATA SCIENCE; CONSTRUCTION
DOI
10.1109/ACCESS.2018.2817022
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Data-driven knowledge acquisition is one of the key research fields in data mining. Handling large amounts of data has recently received considerable attention in the field, and a number of methodologies have been proposed to extract insights from data in an automated or semi-automated manner. However, these methodologies generally target a specific aspect of the data mining process, such as data acquisition, data preprocessing, or data classification, whereas a comprehensive knowledge acquisition method is crucial to support the end-to-end knowledge engineering process. In this paper, we introduce a knowledge acquisition system that covers all major phases of the cross-industry standard process for data mining (CRISP-DM). Acknowledging the importance of an end-to-end knowledge engineering process, we designed and developed an easy-to-use data-driven knowledge acquisition tool (DDKAT). The major features of the DDKAT are: (1) a novel unified features scoring approach for data selection; (2) a user-friendly data processing interface to improve the quality of the raw data; (3) an approach for selecting an appropriate decision tree algorithm to build a classification model; and (4) the automated generation of production rules from various decision tree classification models. Furthermore, two diabetes studies were performed to assess the value of the DDKAT in terms of user experience: the first involved 19 experts, and the second involved 102 students in the artificial intelligence domain. The results showed that the overall user experience of the DDKAT was positive in terms of its attractiveness as well as its pragmatic and hedonic quality factors.
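Feature (4) in the abstract refers to turning trained decision trees into production rules. The record does not describe the DDKAT's own procedure, so the following is only a generic illustration of the standard path-to-rule conversion, in which each root-to-leaf path of a binary tree becomes one IF-THEN rule. The `Node` and `Leaf` classes, the `extract_rules` function, and the toy glucose/BMI thresholds are hypothetical assumptions chosen for illustration; they are not taken from the paper or its data.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf (hypothetical structure)


@dataclass
class Node:
    feature: str      # attribute tested at this internal node
    threshold: float  # binary split: "feature <= threshold" goes left
    left: "Tree"
    right: "Tree"


Tree = Union[Node, Leaf]


def extract_rules(tree: Tree, conditions: Optional[List[str]] = None) -> List[str]:
    """Convert every root-to-leaf path into one IF-THEN production rule."""
    conditions = conditions or []
    if isinstance(tree, Leaf):
        # The conjunction of tests along the path forms the rule antecedent.
        antecedent = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {antecedent} THEN class = {tree.label}"]
    left_rules = extract_rules(tree.left, conditions + [f"{tree.feature} <= {tree.threshold}"])
    right_rules = extract_rules(tree.right, conditions + [f"{tree.feature} > {tree.threshold}"])
    return left_rules + right_rules


# Illustrative toy tree; thresholds are made up, not results from the study.
toy_tree = Node("glucose", 127.5,
                Leaf("negative"),
                Node("bmi", 29.9, Leaf("negative"), Leaf("positive")))

for rule in extract_rules(toy_tree):
    print(rule)
```

Running this prints one rule per leaf, for example `IF glucose > 127.5 AND bmi > 29.9 THEN class = positive`. Because the rules are read directly off the tree, they are mutually exclusive and together cover the whole input space; rule pruning or simplification, if any, would be a separate step.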
Pages: 15587-15607
Number of pages: 21