Toward intelligent assistance for a data mining process: An ontology-based approach for cost-sensitive classification

Cited by: 87
Authors
Bernstein, A
Provost, F
Hill, S
Affiliations
[1] Univ Zurich, Dept Informat, CH-8057 Zurich, Switzerland
[2] NYU, Stern Sch Business, New York, NY 10012 USA
Keywords
cost-sensitive learning; data mining; data mining process; intelligent assistants; knowledge discovery; knowledge discovery process; machine learning; metalearning
DOI
10.1109/TKDE.2005.67
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and nontrivial interactions, both novices and data mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems, we present a prototype Intelligent Discovery Assistant (IDA), which provides users with 1) systematic enumerations of valid DM processes, so that important, potentially fruitful options are not overlooked, and 2) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a demonstration of cost-sensitive classification using a more complicated process and data from the 1998 KDDCUP competition.
Pages: 503-518
Page count: 16
Related papers
50 in total
[21]   An Ontology-Based Text-Mining Method to develop intelligent information system using cluster based approach [J].
Rajput, Komal ;
Kandoi, Narendra .
PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INVENTIVE SYSTEMS AND CONTROL (ICISC 2017), 2017, :537-542
[22]   Data Mining to Classify Fog Events by applying Cost-Sensitive Classifier [J].
Zazzaro, Gaetano ;
Pisano, Francesca Maria ;
Mercogliano, Paola .
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS (CISIS 2010), 2010, :1093-1098
[23]   Ontology-Based Workflow Generation for Intelligent Big Data Analytics [J].
Kumara, Banage T. G. S. ;
Paik, Incheon ;
Zhang, Jia ;
Siriweera, T. H. A. S. ;
Koswatte, R. C. Koswatte .
2015 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES (ICWS), 2015, :495-502
[24]   Cost-Sensitive Large margin Distribution Machine for classification of imbalanced data [J].
Cheng, Fanyong ;
Zhang, Jing ;
Wen, Cuihong .
PATTERN RECOGNITION LETTERS, 2016, 80 :107-112
[25]   Large cost-sensitive margin distribution machine for imbalanced data classification [J].
Cheng, Fanyong ;
Zhang, Jing ;
Wen, Cuihong ;
Liu, Zhaohua ;
Li, Zuoyong .
NEUROCOMPUTING, 2017, 224 :45-57
[26]   An experiment on an ontology-based support approach for process modeling [J].
Gassen, Jonas Bulegon ;
Mendling, Jan ;
Bouzeghoub, Amel ;
Thom, Lucineia Heloisa ;
de Oliveira, Jose Palazzo M. .
INFORMATION AND SOFTWARE TECHNOLOGY, 2017, 83 :94-115
[27]   An ontology-based framework for the management of machining information in a data mining perspective [J].
Ostermeyer, Emeric ;
Danjou, Christophe ;
Durupt, Alexandre ;
Le Duigou, Julien .
IFAC PAPERSONLINE, 2018, 51 (11) :302-307
[28]   A Researcher Expertise Search System using Ontology-Based Data Mining [J].
Punnarut, Ravikarn ;
Sriharee, Gridaphat .
CONCEPTUAL MODELLING 2010, 2010, :71-78
[29]   Doctor XAI: An ontology-based approach to black-box sequential data classification explanations [J].
Panigutti, Cecilia ;
Perotti, Alan ;
Pedreschi, Dino .
FAT* '20: PROCEEDINGS OF THE 2020 CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, 2020, :629-639
[30]   Cost-sensitive SVDD models based on a sample selection approach [J].
Zhao, Zhenchong ;
Wang, Xiaodan .
APPLIED INTELLIGENCE, 2018, 48 (11) :4247-4266