The minimum description length principle in coding and modeling

Cited by: 612
Authors
Barron, A [1]
Rissanen, J [2]
Yu, B [3]
Affiliations
[1] Yale Univ, Dept Stat, New Haven, CT 06520 USA
[2] IBM Corp, Almaden Res Ctr, Div Res, San Jose, CA 95120 USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
US National Science Foundation;
Keywords
complexity; compression; estimation; inference; universal modeling;
DOI
10.1109/18.720554
Chinese Library Classification
TP [automation technology; computer technology];
Discipline classification code
0812;
Abstract
We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples.
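The abstract's use of description length as a model-selection criterion can be illustrated with a minimal sketch. The example below uses a two-part (BIC-style) approximation to stochastic complexity for choosing a polynomial degree in Gaussian regression: the code length is the fit term (n/2) log(RSS/n) plus a parameter-cost term (k/2) log n. The data, noise level, and candidate degrees are all illustrative assumptions, not from the paper.

```python
import numpy as np

def mdl_score(x, y, degree):
    """Two-part description length (BIC-style approximation to
    stochastic complexity): (n/2) log(RSS/n) codes the data given
    the fitted model; (k/2) log n codes the k coefficients."""
    n = len(y)
    coeffs = np.polyfit(x, y, degree)           # least-squares fit
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = degree + 1                              # number of parameters
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)

# Hypothetical synthetic data: a quadratic signal plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.1, x.size)

# Score candidate degrees 0..5 and pick the shortest description.
scores = {d: mdl_score(x, y, d) for d in range(6)}
best_degree = min(scores, key=scores.get)
```

With enough data relative to the noise, the penalty term stops the criterion from rewarding extra coefficients, so the selected degree matches the generating model; the paper's NML, mixture, and predictive codes refine this crude two-part code length.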
Pages: 2743-2760
Page count: 18