Continuously Differentiable Sample-Spacing Entropy Estimation

Cited by: 10
Authors
Ozertem, Umut [1 ]
Uysal, Ismail [2 ]
Erdogmus, Deniz [3 ]
Affiliations
[1] Yahoo Inc, Sunnyvale, CA 95054 USA
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[3] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02215 USA
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2008, Vol. 19, No. 11
Funding
U.S. National Science Foundation
Keywords
Entropy estimation; minimum error entropy (MEE) criterion; supervised neural network training;
DOI
10.1109/TNN.2008.2006167
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The insufficiency of relying only on second-order statistics, and the promise of exploiting higher order statistics of the data, are now well understood, and more advanced objectives based on higher order statistics, especially those stemming from information theory such as error entropy minimization, are being studied and applied in many contexts of machine learning and signal processing. In the adaptive system training context, the main drawback of using the output error entropy, compared to correlation-estimation-based second-order statistics, is the computational load of the entropy estimation, which is usually obtained via a plug-in kernel estimator. Sample-spacing estimates offer computationally inexpensive entropy estimators; however, the resulting estimates are not differentiable and are therefore unsuitable for gradient-based adaptation. In this brief, we propose a nonparametric entropy estimator that captures the desirable properties of both approaches: it yields continuously differentiable estimates with a computational complexity on the order of that of sample-spacing techniques. The proposed estimator is compared with the kernel density estimation (KDE)-based entropy estimator in a supervised neural network training framework, with comparisons of computation time and performance.
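For orientation, here is a minimal Python/NumPy sketch of the classic m-spacing (Vasicek) entropy estimator that the abstract refers to as the cheap-but-non-differentiable baseline; the sort over the samples is what makes plain sample-spacing estimates non-differentiable in the sample values, which is the limitation the paper's continuously differentiable estimator removes. This illustrates the baseline technique only, not the estimator proposed in the paper; the function name and the sqrt(N) choice of spacing order m are this example's own conventions.

```python
import numpy as np

def vasicek_entropy(samples, m=None):
    """Classic m-spacing (Vasicek) entropy estimator (illustration only):
    H_hat = (1/N) * sum_i log( N * (x_(i+m) - x_(i-m)) / (2m) ),
    with order statistics clamped at the sample boundaries."""
    x = np.sort(np.asarray(samples, dtype=float))  # the non-differentiable step
    n = x.size
    if m is None:
        m = max(1, int(round(np.sqrt(n))))  # common heuristic for the spacing order
    idx = np.arange(n)
    hi = np.minimum(idx + m, n - 1)  # x_(i+m), clamped to x_(N)
    lo = np.maximum(idx - m, 0)      # x_(i-m), clamped to x_(1)
    spacings = np.maximum(x[hi] - x[lo], 1e-12)  # guard against tied samples
    return float(np.mean(np.log(n * spacings / (2.0 * m))))

# Sanity check: differential entropy of N(0,1) is 0.5*log(2*pi*e) ~= 1.4189 nats.
rng = np.random.default_rng(0)
print(vasicek_entropy(rng.standard_normal(10_000)))
```

Note the cost profile motivating the paper: the sort dominates at O(N log N), versus the O(N^2) pairwise kernel evaluations of a KDE plug-in entropy estimator, which is the computational gap the abstract describes.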
Pages: 1978-1984 (7 pages)