Toward intelligent assistance for a data mining process: An ontology-based approach for cost-sensitive classification

Cited by: 87
Authors
Bernstein, A
Provost, F
Hill, S
Affiliations
[1] Univ Zurich, Dept Informat, CH-8057 Zurich, Switzerland
[2] NYU, Stern Sch Business, New York, NY 10012 USA
Keywords
cost-sensitive learning; data mining; data mining process; intelligent assistants; knowledge discovery; knowledge discovery process; machine learning; metalearning
DOI
10.1109/TKDE.2005.67
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and nontrivial interactions, both novices and data mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems, we present a prototype Intelligent Discovery Assistant (IDA), which provides users with 1) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and 2) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a demonstration of cost-sensitive classification using a more complicated process and data from the 1998 KDDCUP competition.
Pages: 503-518
Page count: 16
Related papers
50 in total
[31]   Cost-sensitive SVDD models based on a sample selection approach [J].
Zhao, Zhenchong ;
Wang, Xiaodan .
Applied Intelligence, 2018, 48 :4247-4266
[32]   A Cost-sensitive Genetic Programming Approach for High-dimensional Unbalanced Classification [J].
Pei, Wenbin ;
Xue, Bing ;
Zhang, Mengjie ;
Shang, Lin .
2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, :1770-1777
[33]   Cost-Sensitive Techniques for Fuzzy Rule-Based Pattern Classification [J].
Nakashima, Tomoharu ;
Shoji, Yukio ;
Schaefer, Gerald .
2008 WORLD AUTOMATION CONGRESS PROCEEDINGS, VOLS 1-3, 2008, :191-+
[34]   Swarm-based Cost-sensitive Decision Tree Using Optimized Rules for Imbalanced Data Classification [J].
Mansouri, Mehdi ;
Nadimi-Shahraki, Mohammad H. ;
Beheshti, Zahra .
JOURNAL OF BIONIC ENGINEERING, 2025, 22 (03) :1434-1458
[35]   An interpretable data-driven approach for customer purchase prediction using cost-sensitive learning [J].
Xiao, Fei ;
Chen, Shui-xia ;
Chen, Zi-yu ;
Wang, Ya-nan ;
Wang, Jian-qiang .
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 138
[36]   Towards an Ontology-Based Approach to Safety Management in Cooperative Intelligent Transportation Systems [J].
Chen, DeJiu ;
Asplund, Fredrik ;
Ostberg, Kenneth ;
Brezhniev, Eugene ;
Kharchenko, Vyacheslav .
THEORY AND ENGINEERING OF COMPLEX SYSTEMS AND DEPENDABILITY, 2015, 365 :107-115
[37]   Toward an Ontology-based model of key performance indicators for business process improvement [J].
Amor, Emna Ammar El Hadj ;
Ghannouchi, Sonia Ayachi .
2017 IEEE/ACS 14TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2017, :148-153
[38]   A genetic algorithm-based approach to cost-sensitive bankruptcy prediction [J].
Chen, Ning ;
Ribeiro, Bernardete ;
Vieira, Armando S. ;
Duarte, Joao ;
Neves, Joao C. .
EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (10) :12939-12945
[39]   A Domain Generalization Approach Based on Cost-sensitive Learning for Gaze Estimation [J].
Yang, Guobo ;
Zhang, Dong .
2024 12TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS AND COMPUTING TECHNOLOGY, ISCTECH, 2024,
[40]   Reinforcement learning-based cost-sensitive classifier for imbalanced fault classification [J].
Zhang, Xinmin ;
Fan, Saite ;
Song, Zhihuan .
Science China Information Sciences, 2023, 66