AdaWIRL: A Novel Bayesian Ranking Approach for Personal Big-Hit Paper Prediction

Cited by: 3

Authors:

Zhang, Chuxu [1, 2]

Yu, Lu [3]

Lu, Jie [4]

Zhou, Tao [5]

Zhang, Zi-Ke [1]

Affiliations:

[1] Hangzhou Normal Univ, Alibaba Res Ctr Complex Sci, Hangzhou, Zhejiang, Peoples R China

[2] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ USA

[3] Alibaba Grp, Hangzhou, Zhejiang, Peoples R China

[4] IBM Thomas J Watson Res Ctr, Yorktown Hts, NY USA

[5] Univ Elect Sci & Technol China, Big Data Res Ctr, Chengdu, Peoples R China

Source:

WEB-AGE INFORMATION MANAGEMENT, PT II | 2016 / Vol. 9659

Keywords:

INFORMATION; INDEX;

DOI:

10.1007/978-3-319-39958-4_27

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Predicting the most impactful (big-hit) paper among a researcher's publications, so that it can be well disseminated in advance, not only has a large impact on individual academic success but also provides useful guidance to the research community. In this work, we tackle the following problem: given a researcher's publications from the previous few years, how can we effectively predict which paper will become the big-hit in the future? We explore a series of features that can drive a paper to become the big-hit, and design a novel Bayesian ranking algorithm, AdaWIRL (Adaptive Weighted Impact Ranking Learning), that leverages a weighted training schema and an adaptive timely false correction strategy to predict big-hit papers. Experimental results on the large ArnetMiner dataset, with over 1.7 million authors and 2 million papers, demonstrate the effectiveness of AdaWIRL. Specifically, it correctly predicts over 78.3% of all researchers' big-hit papers and outperforms the compared regression and ranking algorithms, with average improvements of 5.8% and 2.9%, respectively. Further analysis shows that temporal features are the best indicator for personal big-hit papers, while authorship and social features are less relevant. We also demonstrate that there is a high correlation between the impact of a researcher's future works and their similarity to the predicted big-hit paper.
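The abstract does not spell out AdaWIRL's update rules, so the following is only a minimal sketch of the general idea it names: a weighted pairwise (BPR-style) ranker that, for each researcher, learns to score the big-hit paper above that researcher's other papers, followed by the per-researcher hit rate used to report accuracy. Everything concrete here is an assumption made for illustration: the linear scorer, the two toy features (early citations, number of co-authors), the citation-gap pair weights, and the hyperparameters. The adaptive false-correction strategy is not modelled.

```python
import math
import random

def score(w, x):
    """Linear impact score for a paper with feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(authors, n_features, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Weighted BPR-style SGD over (big-hit, other paper) pairs per author.

    authors: list of (papers, hit_index); papers is a list of
    (features, future_citations). The pair weight is an ASSUMPTION:
    the citation gap normalised by the big-hit's citation count.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    pairs = []
    for papers, hit in authors:
        x_hit, c_hit = papers[hit]
        for j, (x_other, c_other) in enumerate(papers):
            if j != hit:
                weight = (c_hit - c_other) / max(c_hit, 1)
                pairs.append((x_hit, x_other, weight))
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x_pos, x_neg, weight in pairs:
            diff = [a - b for a, b in zip(x_pos, x_neg)]
            # Clamp the margin so math.exp cannot overflow.
            margin = max(-30.0, min(30.0, score(w, diff)))
            # d/dw [weight * log sigmoid(margin)] = weight * sigmoid(-margin) * diff
            g = weight / (1.0 + math.exp(margin))
            w = [wi + lr * (g * di - reg * wi) for wi, di in zip(w, diff)]
    return w

def hit_rate(w, authors):
    """Fraction of authors whose top-scored paper is their actual big-hit."""
    correct = 0
    for papers, hit in authors:
        predicted = max(range(len(papers)), key=lambda i: score(w, papers[i][0]))
        correct += int(predicted == hit)
    return correct / len(authors)

# Hypothetical toy data: features = [early_citations, n_coauthors].
toy_authors = [
    ([([5.0, 2.0], 40), ([1.0, 3.0], 4), ([2.0, 1.0], 9)], 0),
    ([([0.5, 4.0], 2), ([3.0, 1.0], 25)], 1),
    ([([1.0, 1.0], 3), ([4.0, 5.0], 30), ([2.0, 2.0], 6)], 1),
]

w = train_pairwise(toy_authors, n_features=2)
rate = hit_rate(w, toy_authors)
print("weights:", w)
print("hit rate:", rate)
```

The hit rate computed this way corresponds to the kind of figure quoted in the abstract (the share of researchers whose big-hit paper is ranked first); on real data it would be measured on held-out researchers rather than on the training set as in this toy run.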


Pages: 342-355

Page count: 14
