Estimation of KL Divergence: Optimal Minimax Rate

被引:53
作者
Bu, Yuheng [1 ]
Zou, Shaofeng [1 ]
Liang, Yingbin [2 ,3 ]
Veeravalli, Venugopal V. [1 ]
机构
[1] Univ Illinois, ECE Dept, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Syracuse Univ, Dept Elect Engn & Comp Sci, Syracuse, NY 13244 USA
[3] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH 43210 USA
基金
美国国家科学基金会;
关键词
Functional approximation; mean squared error; minimax lower bound; plug-in estimator; polynomial approximation; FUNCTIONALS; ENTROPY;
D O I
10.1109/TIT.2018.2805844
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The problem of estimating the Kullback-Leibler divergence D(P parallel to Q) between two unknown distributions P and Q is studied, under the assumption that the alphabet size k of the distributions can scale to infinity. The estimation is based on m independent samples drawn from P and n independent samples drawn from Q. It is first shown that there does not exist any consistent estimator that guarantees asymptotically small worst case quadratic risk over the set of all pairs of distributions. A restricted set that contains pairs of distributions, with density ratio bounded by a function f (k) is further considered. An augmented plug-in estimator is proposed, and its worst case quadratic risk is shown to be within a constant factor of ((k/m) + (k f(k)/n))(2) + (log(2)f (k)/m) + (f (k)/n), if m and n exceed a constant factor of k and k f (k), respectively. Moreover, the minimax quadratic risk is characterized to be within a constant factor of ((k/(mlog k))+(k f (k)/(n log k)))(2) + (log(2)f (k)/m)+(f (k)/n), if m and n exceed a constant factor of k/log(k) and k f (k)/log k, respectively. The lower bound on the minimax quadratic risk is characterized by employing a generalized Le Cam's method. A minimax optimal estimator is then constructed by employing both the polynomial approximation and the plug-in approaches.
引用
收藏
页码:2648 / 2674
页数:27
相关论文
共 31 条
[1]   2-SAMPLE TEST STATISTICS FOR MEASURING DISCREPANCIES BETWEEN 2 MULTIVARIATE PROBABILITY DENSITY-FUNCTIONS USING KERNEL-BASED DENSITY ESTIMATES [J].
ANDERSON, NH ;
HALL, P ;
TITTERINGTON, DM .
JOURNAL OF MULTIVARIATE ANALYSIS, 1994, 50 (01) :41-54
[2]  
[Anonymous], POLYNOMIAL APPROXIMA
[3]  
[Anonymous], 2017, Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis
[4]  
[Anonymous], NEAREST NEIGHBOR INF
[5]  
[Anonymous], 2008, INTRO NONPARAMETRIC
[6]  
[Anonymous], MINIMAX ESTIMATION K
[7]  
[Anonymous], 2017, IEEE T INFORM THEORY, DOI DOI 10.1109/TIT.2017.2733537
[8]   Convergence properties of functional estimates for discrete distributions [J].
Antos, A ;
Kontoyiannis, I .
RANDOM STRUCTURES & ALGORITHMS, 2001, 19 (3-4) :163-193
[9]  
Bu YH, 2016, INT CONF ACOUST SPEE, P4254, DOI 10.1109/ICASSP.2016.7472479
[10]  
Bu YH, 2016, IEEE INT SYMP INFO, P1118, DOI 10.1109/ISIT.2016.7541473