The normalized compression distance is resistant to noise

被引:46
作者
Cebrian, Manuel [1 ]
Alfonseca, Manuel [1 ]
Ortega, Alfonso [1 ]
机构
[1] Univ Autonoma Madrid, Escuela Politecn Super, E-28049 Madrid, Spain
关键词
clustering and noise resistance; datafile corruption; heterogeneous data analysis; Kolmogorov complexity; noisy channel; normalized compression distance; universal similarity distance;
D O I
10.1109/TIT.2007.894669
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This correspondence studies the influence of noise on the normalized compression distance (NCD), a measure based on the use of compressors to compute the degree of similarity of two files. This influence is approximated by a first order differential equation which gives rise to a complex effect, which explains the fact that the NCD may give values greater than 1, observed by other authors. The model is tested experimentally with good adjustment. Finally, the influence of noise on the clustering of files of different types is explored, finding that the NCD performs well even in the presence of quite high noise levels.
引用
收藏
页码:1895 / 1900
页数:6
相关论文
共 7 条
[1]   Information distance [J].
Bennett, CH ;
Gacs, P ;
Li, M ;
Vitanyi, FMB ;
Zurek, WH .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1998, 44 (04) :1407-1423
[2]   Clustering by compression [J].
Cilibrasi, R ;
Vitányi, PMB .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2005, 51 (04) :1523-1545
[3]  
CILIBRASI R, COMPLEARN TOOLKIT
[4]  
Cover TM, 2006, Elements of Information Theory
[5]  
HART M, GUTEMBERB PROJECT
[6]   The similarity metric [J].
Li, M ;
Chen, X ;
Li, X ;
Ma, B ;
Vitányi, PMB .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2004, 50 (12) :3250-3264
[7]  
Vit?nyi P., 2008, INTRO KOLMOGOROV COM