Feature Selection for Link Prediction

被引:0
作者
Xu, Ye [1 ]
Rockmore, Dan [1 ]
机构
[1] Dartmouth Coll, Dept Comp Sci, Hanover, NH 03755 USA
来源
PROCEEDINGS OF THE 5TH PH.D. WORKSHOP ON INFORMATION AND KNOWLEDGE | 2012年
关键词
Feature Selection; Link Prediction;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Networks that model relationships in the real world have attracted much attention in the past few years. Link prediction plays a central role in the network area. Supervised learning is an important class of algorithms used to address the link prediction problem. A big challenge in solving link prediction tasks is deciding how to choose relevant features. As an important machine learning technique to select relevant features, feature selection not only enhances classification accuracy, but also improves the efficiency of the training process. Thus, it is especially relevant for link prediction. However, to the best of our knowledge, feature selection under the link prediction scenario remains unstudied. In this paper, we propose FEature Selection for Link Prediction (FESLP), which contains a feature ranking algorithm and a feature weighting algorithm to address link prediction tasks. We measure the discriminative ability of each individual feature and select those features with greatest discriminative power. Simultaneously, we aim to minimize the correlations among features such that redundancy in the learned feature space is as small as possible. Thus, the feature space can accurately preserve the sketch of the original data. Feature weighting and feature ranking problems can be formalized as two quadratic optimization problems. The active set method is used to solve the feature weighting problem (via real number programming) while a greedy policy is applied to solve the feature ranking problem (via integer programming). In experiments, We evaluate FESLP on six large-scale email network datasets from a university. The results show the effectiveness of the FESLP.
引用
收藏
页码:25 / 32
页数:8
相关论文
共 42 条
[1]  
[Anonymous], 2002, P 8 ACM SIGKDD INT C
[2]  
[Anonymous], 2011, P 20 ACM INT C INF K
[3]  
[Anonymous], 2005, Algorithm Design
[4]  
[Anonymous], 2011, PROC 4 ACM INT C WEB, DOI DOI 10.1145/1935826.1935914
[5]  
[Anonymous], 1997, ICML
[6]  
[Anonymous], 2000, Numerical Optimization
[7]  
[Anonymous], ICDM
[8]  
[Anonymous], 2011, P 20 ACM INT C INF K
[9]   LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[10]   Valid inequalities for mixed integer linear programs [J].
Cornuejols, Gerard .
MATHEMATICAL PROGRAMMING, 2008, 112 (01) :3-44