On data mining, compression, and Kolmogorov complexity

被引:43
作者
Faloutsos, Christos
Megalooikonomou, Vasileios
机构
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Temple Univ, Dept Comp & Informat Sci, Philadelphia, PA 19122 USA
基金
美国国家卫生研究院; 美国国家科学基金会; 美国安德鲁·梅隆基金会;
关键词
data mining; compression; Kolmogorov complexity; clustering; classification; forecasting; outliers;
D O I
10.1007/s10618-006-0057-3
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Will we ever have a theory of data mining analogous to the relational algebra in databases? Why do we have so many clearly different clustering algorithms? Could data mining be automated? We show that the answer to all these questions is negative, because data mining is closely related to compression and Kolmogorov complexity; and the latter is undecidable. Therefore, data mining will always be an art, where our goal will be to find better models (patterns) that fit our datasets as best as possible.
引用
收藏
页码:3 / 20
页数:18
相关论文
共 36 条
[1]  
[Anonymous], 1997, Machine Learning
[2]  
Barnsley M. F., 1998, FRACTALS EVERYWHERE
[3]  
BARNSLEY MF, 1988, BYTE JAN, P215
[4]   A LOCALLY ADAPTIVE DATA-COMPRESSION SCHEME [J].
BENTLEY, JL ;
SLEATOR, DD ;
TARJAN, RE ;
WEI, VK .
COMMUNICATIONS OF THE ACM, 1986, 29 (04) :320-330
[5]  
CHAKRABARTI D, 2004, KDD C SEATTL WA, P78
[6]  
CODD EF, 1971, SIGFIDET WORKSH, P35
[7]  
Cover TM, 2006, Elements of Information Theory
[8]   A TECHNIQUE FOR COMPUTER DETECTION AND CORRECTION OF SPELLING ERRORS [J].
DAMERAU, FJ .
COMMUNICATIONS OF THE ACM, 1964, 7 (03) :171-176
[9]  
ELIAS P, 1975, IEEE T INFORM THEORY, V21, P194, DOI 10.1109/TIT.1975.1055349
[10]   A GRAPH DISTANCE MEASURE FOR IMAGE-ANALYSIS [J].
ESHERA, MA ;
FU, KS .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1984, 14 (03) :398-408