AdaWIRL: A Novel Bayesian Ranking Approach for Personal Big-Hit Paper Prediction

Cited by: 3
Authors
Zhang, Chuxu [1, 2]
Yu, Lu [3]
Lu, Jie [4]
Zhou, Tao [5]
Zhang, Zi-Ke [1]
Affiliations
[1] Hangzhou Normal Univ, Alibaba Res Ctr Complex Sci, Hangzhou, Zhejiang, Peoples R China
[2] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ USA
[3] Alibaba Grp, Hangzhou, Zhejiang, Peoples R China
[4] IBM Thomas J Watson Res Ctr, Yorktown Hts, NY USA
[5] Univ Elect Sci & Technol China, Big Data Res Ctr, Chengdu, Peoples R China
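Source
WEB-AGE INFORMATION MANAGEMENT, PT II | 2016 / Volume 9659
Keywords
INFORMATION; INDEX
DOI
10.1007/978-3-319-39958-4_27
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract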
Predicting the most impactful (big-hit) paper among a researcher's publications, so that it can be well disseminated in advance, not only has a large impact on individual academic success but also provides useful guidance to the research community. In this work, we tackle the following problem: given the corpus of a researcher's publications from the previous few years, how can we effectively predict which paper will become the big-hit in the future? We explore a series of features that can drive a paper to become the big-hit, and design a novel Bayesian ranking algorithm, AdaWIRL (Adaptive Weighted Impact Ranking Learning), that leverages a weighted training schema and an adaptive timely false correction strategy to predict big-hit papers. Experimental results on the large ArnetMiner dataset, with over 1.7 million authors and 2 million papers, demonstrate the effectiveness of AdaWIRL. Specifically, it correctly predicts over 78.3% of all researchers' big-hit papers and outperforms the compared regression and ranking algorithms, with average improvements of 5.8% and 2.9%, respectively. Further analysis shows that temporal features are the best indicators of personal big-hit papers, while authorship and social features are less relevant. We also demonstrate that there is a high correlation between the impact of a researcher's future works and their similarity to the predicted big-hit paper.
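The abstract does not spell out the AdaWIRL objective, so the following is only a minimal sketch of the general idea of weighted pairwise Bayesian ranking, not the authors' algorithm. It assumes a linear scoring function, a logistic pairwise likelihood with a Gaussian prior (an L2 penalty), and a hypothetical per-pair weight; the adaptive timely false correction strategy is not modeled. All function names, feature choices, and weights below are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Linear impact score of a paper with feature vector x (an assumption).
    return sum(wi * xi for wi, xi in zip(w, x))


def train_weighted_pairwise(researchers, n_features, epochs=200, lr=0.1,
                            reg=0.01, seed=0):
    """Fit w by stochastic gradient ascent on a weighted pairwise objective.

    researchers: list of (papers, hit_index, pair_weights), where papers is a
    list of feature vectors, hit_index marks the known big-hit paper, and
    pair_weights[j] is a hypothetical importance weight for the pair
    (big-hit paper, paper j).
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    # One training pair per (big-hit, other paper) within each researcher.
    pairs = [
        (papers[hit], papers[j], weights[j])
        for papers, hit, weights in researchers
        for j in range(len(papers))
        if j != hit
    ]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x_hit, x_other, c in pairs:
            diff = [a - b for a, b in zip(x_hit, x_other)]
            # Gradient of c * log(sigmoid(w . diff)) is c * (1 - sigmoid) * diff.
            g = c * (1.0 - sigmoid(score(w, diff)))
            for k in range(n_features):
                # The -reg * w[k] term comes from the Gaussian prior on w.
                w[k] += lr * (g * diff[k] - reg * w[k])
    return w


def predict_big_hit(w, papers):
    # Index of the paper with the highest predicted impact score.
    return max(range(len(papers)), key=lambda i: score(w, papers[i]))


if __name__ == "__main__":
    # Toy data with two made-up features: [early citations, number of authors].
    toy = [
        ([[5.0, 2.0], [1.0, 3.0], [0.5, 1.0]], 0, [0.0, 1.0, 1.5]),
        ([[0.2, 4.0], [3.0, 1.0]], 1, [1.2, 0.0]),
    ]
    w = train_weighted_pairwise(toy, n_features=2)
    print(predict_big_hit(w, [[0.3, 5.0], [4.0, 1.0], [1.0, 2.0]]))
```

In this sketch, ranking is done per researcher: only papers by the same author are compared, which mirrors the "personal" framing of the task in the abstract. How the paper actually constructs pairs and weights is not stated here.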
Pages: 342-355
Page count: 14