Normalized Compression Distance of Multisets with Applications

被引:29
作者
Cohen, Andrew R. [1 ]
Vitanyi, Paul M. B. [2 ,3 ]
机构
[1] Drexel Univ, Dept Elect & Comp Engn, Philadelphia, PA 19104 USA
[2] Natl Res Ctr Math & Comp Sci, Amsterdam, Netherlands
[3] Univ Amsterdam, NL-1098 XG Amsterdam, Netherlands
基金
美国国家卫生研究院;
关键词
Normalized compression distance; multisets or multiples; pattern recognition; data mining; similarity; classification; Kolmogorov complexity; retinal progenitor cells; synthetic data; organelle transport; handwritten character recognition; INFORMATION DISTANCE; TRANSPORT;
D O I
10.1109/TPAMI.2014.2375175
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Pairwise normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free, similarity metric based on compression. We propose an NCD of multisets that is also metric. Previously, attempts to obtain such an NCD failed. For classification purposes it is superior to the pairwise NCD in accuracy and implementation complexity. We cover the entire trajectory from theoretical underpinning to feasible practice. It is applied to biological (stem cell, organelle transport) and OCR classification questions that were earlier treated with the pairwise NCD. With the new method we achieved significantly better results. The theoretic foundation is Kolmogorov complexity.
引用
收藏
页码:1602 / 1614
页数:13
相关论文
共 33 条
  • [1] Missing the forest for the trees:: Phylogenetic compression and its implications for inferring complex evolutionary histories
    Ané, C
    Sanderson, MJ
    [J]. SYSTEMATIC BIOLOGY, 2005, 54 (01) : 146 - 157
  • [2] [Anonymous], 2005, DATA MINING
  • [3] Information distance
    Bennett, CH
    Gacs, P
    Li, M
    Vitanyi, FMB
    Zurek, WH
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1998, 44 (04) : 1407 - 1423
  • [4] Importance of intrinsic mechanisms in cell fate decisions in the developing rat retina
    Cayouette, M
    Barres, BA
    Raff, M
    [J]. NEURON, 2003, 40 (05) : 897 - 904
  • [5] Clustering by compression
    Cilibrasi, R
    Vitányi, PMB
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2005, 51 (04) : 1523 - 1545
  • [6] Ciresan D, 2012, PROC CVPR IEEE, P3642, DOI 10.1109/CVPR.2012.6248110
  • [7] Cohen AR, 2010, NAT METHODS, V7, P213, DOI [10.1038/NMETH.1424, 10.1038/nmeth.1424]
  • [8] Automatic Summarization of Changes in Biological Image Sequences Using Algorithmic Information Theory
    Cohen, Andrew R.
    Bjornsson, Christopher S.
    Temple, Sally
    Banker, Gary
    Roysam, Badrinath
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (08) : 1386 - 1403
  • [9] Gacs P., 1974, SOV MATH DOKLADY, V15, P1477
  • [10] Huntingtin controls neurotrophic support and survival of neurons by enhancing BDNF vesicular transport along microtubules
    Gauthier, LR
    Charrin, BC
    Borrell-Pagès, M
    Dompierre, JP
    Rangone, H
    Cordelières, FP
    De Mey, J
    MacDonald, ME
    Lessmann, V
    Humbert, S
    Saudou, F
    [J]. CELL, 2004, 118 (01) : 127 - 138