EDM: A general framework for data mining based on evidence theory

Cited by: 51
Authors
Anand, SS
Bell, DA
Hughes, JG
Keywords
data mining; knowledge discovery in databases; uncertainty handling; evidence theory; parallel discovery;
DOI
10.1016/0169-023X(95)00038-T
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data Mining or Knowledge Discovery in Databases [1,15,23] is currently one of the most exciting and challenging areas where database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. A lot of research effort is being directed towards building tools for discovering interesting patterns which are hidden below the surface in databases. However, most of the work being done in this field has been problem-specific and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM (Evidence-based Data Mining), a general framework for Data Mining based on Evidence Theory. Having a general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge which allows prior knowledge from the user or knowledge discovered by another discovery process to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values. The framework presented in this paper has the following additional advantages. The framework is inherently parallel. Thus, algorithms developed within this framework will also be parallel and will therefore be expected to be efficient for large data sets - a necessity as most commercial data sets, relational or otherwise, are very large. This is compounded by the fact that the algorithms are complex. Also, the parallelism within the framework allows its use in parallel, distributed and heterogeneous databases.
The framework is easily updated and new discovery methods can be readily incorporated within the framework, making it 'general' in the functional sense in addition to the representational sense considered above. The framework provides an intuitive way of dealing with missing data during the discovery process using the concept of Ignorance borrowed from Evidence Theory. The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery. We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means to represent evidence of the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions. Each operation is carried out by an EDM operator. We provide a classification for the EDM operators based on the discovery functions performed by them and discuss aspects of the induction, domain and combination operator classes. The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining in general and, in particular, using one that is based on Evidence Theory.
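The abstract refers to mass functions, a combination operator class, and the use of Ignorance for missing data. As a rough illustration only, the sketch below shows the conventional Dempster-Shafer mass function and Dempster's rule of combination, not the extended mass-function definition or the EDM operators proposed in the paper. The frame of discernment, the example masses, and the names `combine`, `FRAME`, `m_a`, `m_b` and `m_missing` are all assumptions made for this example.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination (standard evidence theory).

    Each mass function is a dict mapping a frozenset (focal element)
    to its mass; the masses in each dict are assumed to sum to 1.
    """
    raw = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            raw[common] = raw.get(common, 0.0) + wa * wb
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the combined masses again sum to 1.
    return {k: v / (1.0 - conflict) for k, v in raw.items()}


# Hypothetical frame of discernment: does some rule hold or not?
FRAME = frozenset({"yes", "no"})

# Two illustrative sources of evidence (made-up numbers).
m_a = {frozenset({"yes"}): 0.6, FRAME: 0.4}
m_b = {frozenset({"yes"}): 0.5, frozenset({"no"}): 0.2, FRAME: 0.3}

# A missing value modelled as total ignorance: all mass on the whole frame.
m_missing = {FRAME: 1.0}

combined = combine(m_a, m_b)
unchanged = combine(m_a, m_missing)
```

In this sketch, combining with `m_missing` leaves `m_a` unchanged, which is the usual evidence-theoretic reading of ignorance: a source that knows nothing shifts no belief. Dempster's rule is also commutative and associative, so evidence gathered from separate data partitions can be combined in any order; whether EDM's parallelism relies on exactly this property is not stated in the abstract.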
Pages: 189-223
Page count: 35