Evaluating data mining procedures: techniques for generating artificial data sets

Cited by: 22
Authors
Scott, PD [1]
Wilkins, E [1]
Affiliation
[1] Univ Essex, Dept Comp Sci, Colchester CO4 3SQ, Essex, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); UK Economic and Social Research Council (ESRC)
Keywords
data mining; artificial data sets; pseudo-random generators;
DOI
10.1016/S0950-5849(99)00021-X
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this article, we discuss the need to evaluate the performance of data mining procedures and argue that tests done with real data sets cannot provide all the information needed for a thorough assessment of their performance characteristics. We argue that artificial data sets are therefore essential. After a discussion of the desirable characteristics of such artificial data, we describe two pseudo-random generators. The first is based on the multi-variate normal distribution and gives the investigator full control of the degree of correlation between the variables in the artificial data sets. The second is inspired by fractal techniques for synthesizing artificial landscapes and can produce data whose classification complexity can be controlled by a single parameter. We conclude with a discussion of the additional work necessary to achieve the ultimate goal of a method of matching data sets to the most appropriate data mining technique. (C) 1999 Elsevier Science B.V. All rights reserved.
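The first generator described in the abstract can be illustrated with a minimal sketch. It assumes the common textbook construction: draw independent standard normal deviates and multiply them by the Cholesky factor of a target correlation matrix. The record does not state that the authors used this exact construction, and the function names `cholesky`, `correlated_normal_samples`, and `sample_correlation` are hypothetical, chosen only for illustration.

```python
import math
import random


def cholesky(matrix):
    """Return lower-triangular L such that L * L^T equals `matrix`.

    `matrix` must be symmetric positive definite (e.g. a valid
    correlation matrix with no perfectly collinear variables).
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = matrix[i][i] - partial
                if diag <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def correlated_normal_samples(corr, n_samples, seed=0):
    """Draw rows from a zero-mean multivariate normal with correlation `corr`.

    Assumption: unit variances, so `corr` is also the covariance matrix.
    """
    rng = random.Random(seed)
    lower = cholesky(corr)
    dim = len(corr)
    samples = []
    for _ in range(n_samples):
        # Independent standard normal deviates.
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # x = L z has covariance L L^T = corr.
        x = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        samples.append(x)
    return samples


def sample_correlation(samples, a, b):
    """Pearson correlation between columns `a` and `b` of `samples`."""
    n = len(samples)
    mean_a = sum(row[a] for row in samples) / n
    mean_b = sum(row[b] for row in samples) / n
    cov = sum((row[a] - mean_a) * (row[b] - mean_b) for row in samples)
    var_a = sum((row[a] - mean_a) ** 2 for row in samples)
    var_b = sum((row[b] - mean_b) ** 2 for row in samples)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    target = [[1.0, 0.8], [0.8, 1.0]]
    data = correlated_normal_samples(target, n_samples=20000, seed=1)
    print(round(sample_correlation(data, 0, 1), 2))
```

Changing the off-diagonal entries of `target` is what gives the investigator direct control over the correlation between variables; the sample correlation approaches the requested value as the number of rows grows.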
Pages: 579-587
Page count: 9