On entropy-based data mining

Cited by: 32
Authors
Holzinger, Andreas [1]
Hörtenhuber, Matthias [2]
Mayer, Christopher [2]
Bachler, Martin [2]
Wassertheurer, Siegfried [2]
Pinho, Armando J [3]
Koslicki, David [4]
Affiliations
[1] Institute for Medical Informatics, Statistics and Documentation, Research Unit Human-Computer Interaction, Medical University Graz, Graz, Austria
[2] AIT Austrian Institute of Technology GmbH, Health and Environment Department, Biomedical Systems, Donau-City-Str. 1, Vienna
[3] IEETA / Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro
[4] Oregon State University, Mathematics Department, Corvallis, OR
Source
Springer Verlag, vol. 8401, 2014
Keywords
Approximate entropy; Biomedical informatics; Data mining; Entropy; FiniteTopEn; Fuzzy entropy; Knowledge discovery; Sample entropy; Topological entropy;
DOI
10.1007/978-3-662-43968-5_12
Abstract
In the real world, we are confronted not only with complex and high-dimensional data sets but usually also with noisy, incomplete and uncertain data, where applying traditional methods of knowledge discovery and data mining always entails the danger of modeling artifacts. Information entropy was originally introduced by Shannon (1949) as a measure of uncertainty in data. Since then, many different types of entropy methods have emerged, serving a large number of purposes and possible application areas. In this paper, we briefly discuss the applicability of entropy methods for knowledge discovery and data mining, with particular emphasis on biomedical data. We present a very short overview of the state of the art, focusing on four methods: Approximate Entropy (ApEn), Sample Entropy (SampEn), Fuzzy Entropy (FuzzyEn), and Topological Entropy (FiniteTopEn). Finally, we discuss some open problems and future research challenges. © Springer-Verlag Berlin Heidelberg 2014.
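The abstract names Shannon entropy and the regularity statistics ApEn and SampEn without defining them. The following is a minimal illustrative sketch, not the implementation used in the paper, assuming the commonly cited definitions: ApEn(m, r) = Φ^m(r) − Φ^(m+1)(r) with self-matches counted, as in Pincus [4], and SampEn(m, r) = −ln(A/B) with self-matches excluded, both using the Chebyshev (maximum) distance between length-m templates. The function names (`shannon_entropy`, `approximate_entropy`, `sample_entropy`) and the brute-force O(N²) matching are our own choices; FuzzyEn and FiniteTopEn are not covered here.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def shannon_entropy(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical symbol distribution."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def _chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum absolute componentwise difference between two templates."""
    return max(abs(x - y) for x, y in zip(a, b))


def _templates(u: Sequence[float], k: int, count: int) -> List[Sequence[float]]:
    """The first `count` overlapping windows of length k."""
    return [u[i:i + k] for i in range(count)]


def approximate_entropy(u: Sequence[float], m: int, r: float) -> float:
    """ApEn(m, r) = Phi^m(r) - Phi^(m+1)(r); self-matches are counted."""
    n = len(u)

    def phi(k: int) -> float:
        windows = _templates(u, k, n - k + 1)
        total = 0.0
        for t in windows:
            # C_i^k(r): share of windows within tolerance r of t (t itself is
            # included, so the logarithm is always defined).
            c = sum(1 for s in windows if _chebyshev(t, s) <= r) / len(windows)
            total += math.log(c)
        return total / len(windows)

    return phi(m) - phi(m + 1)


def sample_entropy(u: Sequence[float], m: int, r: float) -> float:
    """SampEn(m, r) = -ln(A / B); self-matches are excluded."""
    n = len(u)

    def matching_pairs(k: int) -> int:
        # The same n - m starting points are used for lengths m and m + 1.
        windows = _templates(u, k, n - m)
        return sum(
            1
            for i in range(len(windows))
            for j in range(i + 1, len(windows))
            if _chebyshev(windows[i], windows[j]) <= r
        )

    b = matching_pairs(m)
    a = matching_pairs(m + 1)  # a > 0 implies b > 0, so no division by zero
    if a == 0:
        # No matches of length m + 1: SampEn is undefined; this sketch returns infinity.
        return math.inf
    return -math.log(a / b)
```

For example, `shannon_entropy("aabb")` is 1.0 bit, and a constant series has ApEn and SampEn of 0, while an irregular series yields larger values. The tolerance r is commonly chosen as a fraction of the series' standard deviation (e.g. 0.2 SD); the quadratic template matching is adequate for short biomedical recordings but would need a faster neighbour search for long ones.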
Pages: 209-226
References (72 in total)
  • [1] Holzinger A., On knowledge discovery and interactive intelligent visualization of biomedical data: challenges in human-computer interaction and biomedical informatics, DATA 2012, 1, pp. 9-20, (2012)
  • [2] Downarowicz T., Entropy in Dynamical Systems, 18, (2011)
  • [3] Shannon C.E., Weaver W., The Mathematical Theory of Communication, (1949)
  • [4] Pincus S.M., Approximate entropy as a measure of system complexity, Proceedings of the National Academy of Sciences, 88, 6, pp. 2297-2301, (1991)
  • [5] Pincus S., Approximate entropy (ApEn) as a complexity measure, Chaos: An Interdisciplinary Journal of Nonlinear Science, 5, 1, pp. 110-117, (1995)
  • [6] Chandola V., Banerjee A., Kumar V., Anomaly detection: A survey, ACM Computing Surveys, 41, 3, pp. 1-58, (2009)
  • [7] Batini C., Scannapieco M., Data Quality: Concepts, Methodologies and Techniques, (2006)
  • [8] Holzinger A., Simonic K.-M., Information Quality in e-Health, LNCS, 7058, (2011)
  • [9] Kim W., Choi B.J., Hong E.K., Kim S.K., Lee D., A taxonomy of dirty data, Data Mining and Knowledge Discovery, 7, 1, pp. 81-99, (2003)
  • [10] Gschwandtner T., Gärtner J., Aigner W., Miksch S., A taxonomy of dirty time-oriented data, CD-ARES 2012, LNCS, 7465, pp. 58-72, (2012)