S3Mining: A model-driven engineering approach for supporting novice data miners in selecting suitable classifiers

Cited by: 9
Authors
Espinosa, Roberto [1]
Garcia-Saiz, Diego [2]
Zorrilla, Marta [2]
Jacobo Zubcoff, Jose [3]
Mazon, Jose-Norberto [4]
Affiliations
[1] Univ Tecnol Chile INACAP, Wake Res Grp, Santiago, Chile
[2] Univ Cantabria, Dept Ingn Informat & Elect, Santander, Spain
[3] Univ Alicante, Wake Res Grp, Dept Ciencias Mar & Biol Aplicada, Alicante, Spain
[4] Univ Alicante, Wake Res Grp, Dept Lenguajes & Sistemas Informat, Inst Univ Invest Informat, Alicante, Spain
Keywords
Data mining; Knowledge base; Model-driven engineering; Meta-learning; Novice data miners; Model-driven; SCIENCE; ONTOLOGY; WORKFLOWS; SYSTEM; STATE
DOI
10.1016/j.csi.2019.03.004
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Data mining has proven very useful for extracting information from data in many different contexts. However, because data mining techniques are complex, selecting and using them requires the know-how of an expert in the field. In practice, applying data mining adequately is out of reach for novice users who have expertise in their own area of work but lack the skills to employ these techniques. In this paper, we use both model-driven engineering and scientific workflow standards and tools to develop a framework named S3Mining, which supports novice users in selecting the data mining classification algorithm that best fits their data and goal. To this end, the selection process draws on the past experiences of expert data miners who have applied classification techniques to their own datasets. The contributions of the S3Mining framework are as follows: (i) an approach to create a knowledge base that stores the past experiences of expert users, (ii) a process that provides expert users with utilities for building classifier recommenders based on the existing knowledge base, (iii) a system that allows novice data miners to use these recommenders to discover the classifiers that best fit the problem at hand, and (iv) a public implementation of the framework's workflows. Finally, an experimental evaluation was conducted to show the feasibility of the framework.
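The abstract describes recommending classifiers from a knowledge base of experts' past experiences. As a rough illustration only, the sketch below assumes a simple nearest-neighbour meta-learning scheme: each hypothetical knowledge-base entry records a dataset's meta-features, the classifier an expert applied, and the accuracy obtained, and a new dataset is served by averaging classifier accuracies over its k most similar past datasets. This is not necessarily the recommender S3Mining builds (the abstract says its recommenders are constructed through model-driven engineering and scientific workflows); the `Experience` record, `recommend` function, and meta-feature names are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Experience:
    """Hypothetical knowledge-base entry: an expert applied `classifier` to
    `dataset` (described by `meta_features`) and obtained `accuracy`."""
    dataset: str
    meta_features: dict[str, float]
    classifier: str
    accuracy: float


def euclidean(a: dict[str, float], b: dict[str, float]) -> float:
    """Distance over the meta-features both datasets share (assumed pre-scaled)."""
    shared = a.keys() & b.keys()
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in shared))


def recommend(kb: list[Experience], query: dict[str, float],
              k: int = 2) -> list[tuple[str, float]]:
    """Rank classifiers by mean accuracy on the k past datasets closest to `query`."""
    # Collapse the experiences into one meta-feature vector per past dataset.
    datasets: dict[str, dict[str, float]] = {}
    for e in kb:
        datasets[e.dataset] = e.meta_features
    # Select the k nearest past datasets in meta-feature space.
    nearest = set(sorted(datasets, key=lambda d: euclidean(datasets[d], query))[:k])
    # Average each classifier's accuracy over those neighbouring datasets.
    scores: dict[str, list[float]] = {}
    for e in kb:
        if e.dataset in nearest:
            scores.setdefault(e.classifier, []).append(e.accuracy)
    ranking = [(clf, sum(accs) / len(accs)) for clf, accs in scores.items()]
    # Highest mean accuracy first.
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)
```

Ranking by averaged neighbour accuracy keeps the novice-facing output simple (an ordered list of classifiers); a real system would also need to normalise meta-features and handle classifiers missing from some neighbours.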
Pages: 143-158
Number of pages: 16