A Unified Definition of Mutual Information with Applications in Machine Learning

Cited by: 27
Author
Zeng, Guoping [1 ]
Affiliation
[1] Elevate, Ft Worth, TX 76109 USA
Keywords
DOI
10.1155/2015/201874
Chinese Library Classification: T [Industrial Technology]
Discipline Code: 08
Abstract
There are various definitions of mutual information. Essentially, they fall into two classes: (1) definitions in terms of random variables and (2) definitions in terms of ensembles. Both classes, however, contain mathematical flaws. Class 1 definitions either neglect the underlying probability spaces or assume the two random variables share the same probability space. Class 2 definitions rederive the marginal probabilities from the joint probabilities, although the marginals are given by the ensembles and should not be redefined in this way. Definitions in both classes assume that a joint distribution exists, yet they ignore the important fact that the joint distribution, or joint probability measure, is not unique. In this paper, we first present a new unified definition of mutual information that covers the various existing definitions and fixes their mathematical flaws. Our idea is to define the joint distribution of two random variables while taking the marginal probabilities into account. Next, we establish some properties of the newly defined mutual information. We then propose a method for calculating mutual information in machine learning. Finally, we apply the newly defined mutual information to credit scoring.
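The record does not reproduce the paper's unified definition or its calculation method, but the standard discrete mutual information that the paper generalizes can be sketched as follows. Note that this sketch derives the marginals from the joint table, which is precisely the Class 2 practice the abstract criticizes; it is included only to illustrate the conventional formula I(X;Y) = Σ p(x,y) log₂[p(x,y) / (p(x)p(y))], not the author's proposal.

```python
import math

def mutual_information(joint):
    """Discrete mutual information I(X;Y) in bits, given a joint
    probability table joint[i][j] = P(X=i, Y=j)."""
    # Marginals computed from the joint table -- the conventional
    # (Class 2) approach, which the paper argues is flawed because
    # the marginals should come from the ensembles themselves.
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:  # 0 * log(0) is taken as 0 by convention
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Independent variables carry zero mutual information;
# perfectly correlated binary variables carry exactly 1 bit.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # → 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # → 1.0
```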
Pages: 12