Statistical ranking and combinatorial Hodge theory

被引:201
作者
Jiang, Xiaoye [2 ]
Lim, Lek-Heng [1 ]
Yao, Yuan [3 ,4 ]
Ye, Yinyu [5 ]
机构
[1] Univ Calif Berkeley, Dept Math, Berkeley, CA 94720 USA
[2] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[3] Peking Univ, Sch Math Sci, LMAM, Beijing 100871, Peoples R China
[4] Peking Univ, Key Lab Machine Percept MOE, Beijing 100871, Peoples R China
[5] Stanford Univ, Dept Management Sci & Engn, Stanford, CA 94305 USA
关键词
Statistical ranking; Rank aggregation; Combinatorial Hodge theory; Discrete exterior calculus; Combinatorial Laplacian; Hodge Laplacian; Graph Helmholtzian; HodgeRank; Kemeny optimization; Borda count; ALGORITHM;
D O I
10.1007/s10107-010-0419-x
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We propose a technique that we call HodgeRank for ranking data that may be incomplete and imbalanced, characteristics common in modern datasets coming from e-commerce and internet applications. We are primarily interested in cardinal data based on scores or ratings though our methods also give specific insights on ordinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our statistical ranking method exploits the graph Helmholtzian, which is the graph theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is an analogue of the Laplace operator or scalar Laplacian. We shall study the graph Helmholtzian using combinatorial Hodge theory, which provides a way to unravel ranking information from edge flows. In particular, we show that every edge flow representing pairwise ranking can be resolved into two orthogonal components, a gradient flow that represents the l(2)-optimal global ranking and a divergence-free flow (cyclic) that measures the validity of the global ranking obtained-if this is large, then it indicates that the data does not have a good global ranking. This divergence-free flow can be further decomposed orthogonally into a curl flow (locally cyclic) and a harmonic flow (locally acyclic but globally cyclic); these provides information on whether inconsistency in the ranking data arises locally or globally. When applied to statistical ranking problems, Hodge decomposition sheds light on whether a given dataset may be globally ranked in a meaningful way or if the data is inherently inconsistent and thus could not have any reasonable global ranking; in the latter case it provides information on the nature of the inconsistencies. An obvious advantage over the NP-hardness of Kemeny optimization is that HodgeRank may be easily computed via a linear least squares regression. We also discuss connections with well-known ordinal ranking techniques such as Kemeny optimization and Borda count from social choice theory.
引用
收藏
页码:203 / 244
页数:42
相关论文
共 46 条
[41]   A geometric examination of Kemeny's rule [J].
Saari, DG ;
Merlin, VR .
SOCIAL CHOICE AND WELFARE, 2000, 17 (03) :403-438
[42]   SCALING METHOD FOR PRIORITIES IN HIERARCHICAL STRUCTURES [J].
SAATY, TL .
JOURNAL OF MATHEMATICAL PSYCHOLOGY, 1977, 15 (03) :234-281
[43]   IMPOSSIBILITY OF A PARETIAN LIBERAL [J].
SEN, A .
JOURNAL OF POLITICAL ECONOMY, 1970, 78 (01) :152-157
[44]   CONDORCET THEORY OF VOTING [J].
YOUNG, HP .
AMERICAN POLITICAL SCIENCE REVIEW, 1988, 82 (04) :1231-1244
[45]   CONSISTENT EXTENSION OF CONDORCETS ELECTION PRINCIPLE [J].
YOUNG, HP ;
LEVENGLICK, A .
SIAM JOURNAL ON APPLIED MATHEMATICS, 1978, 35 (02) :285-300
[46]   Binary choice, subset choice, random utility, and ranking: A unified perspective using the permutahedron [J].
Zhang, J .
JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2004, 48 (02) :107-134