Early identification of breakthrough research from sleeping beauties using machine learning

Cited by: 8
Authors
Li, Xin [1]
Ma, Xiaodi [1]
Feng, Ye [1]
Affiliation
[1] Beijing University of Technology, College of Economics and Management, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Breakthrough research; Machine learning; Sleeping beauties; Delayed recognition; Early identification
Keywords Plus
DELAYED RECOGNITION; CITATION COUNTS; IMPACT; PREDICTION; SCIENCE; TECHNOLOGY; FEATURES; PRINCES; NUMBER; INDEX
DOI
10.1016/j.joi.2024.101517
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Breakthrough research is groundbreaking, transformative scientific research that can open new frontiers and even trigger substantial shifts in the scientific paradigm. Identifying breakthrough research early is crucial for scientists, R&D experts, and policymakers. "Sleeping Beauties in Science" are papers characterized by "delayed recognition" and are considered crucial carriers of breakthrough research. Machine learning methods can extract and learn high-quality information from prior knowledge to predict future trends. To address the shortcomings of existing studies on the early identification of breakthrough research, this paper proposes a framework that uses machine learning to identify breakthrough research from sleeping beauties. First, we build machine learning models that learn the relationship patterns between historical sleeping beauties and their citation trends, and we use these patterns to identify potential sleeping beauties. Second, we construct a breakthrough index based on the essential features of breakthrough research and apply it to the potential sleeping beauties to single out breakthrough research, enabling its early identification. Finally, an empirical study in the field of chemistry verifies the validity and flexibility of the framework. The results show that the framework can effectively identify breakthrough research from sleeping beauties. This paper contributes to the early identification of breakthrough research, the evaluation of academic output, and the exploration of research frontiers. It also provides methodological support for decision-making by R&D experts and policymakers.
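The framework learns from historical sleeping beauties, but the abstract does not say how those papers are labelled. One widely used criterion is the beauty coefficient B and awakening time t_a of Ke et al. (2015): B accumulates how far each year's citations fall below the straight line joining the publication-year count c_0 to the peak count c_{t_m}, and t_a is the year whose point lies farthest from that line. The Python sketch here is a minimal illustration under that assumption, not the authors' implementation; the function name `beauty_coefficient` and the choice to break ties at the earliest peak year are hypothetical.

```python
import math
from typing import Sequence, Tuple


def beauty_coefficient(citations: Sequence[int]) -> Tuple[float, int]:
    """Return (B, t_a) for a yearly citation history (hypothetical helper).

    citations[t] is the number of citations received t years after
    publication, with t = 0 being the publication year.
    """
    if not citations:
        raise ValueError("citation history must be non-empty")
    c = list(citations)
    # t_m: year of peak citations; max() keeps the first peak on ties (assumption).
    t_m = max(range(len(c)), key=lambda t: c[t])
    if t_m == 0:
        # Peak in the publication year: no "sleep", so B is taken as 0.
        return 0.0, 0
    c0, cm = c[0], c[t_m]
    slope = (cm - c0) / t_m
    # B: sum over t = 0..t_m of (line value - c_t) / max(1, c_t),
    # where the line runs from (0, c_0) to (t_m, c_{t_m}).
    b = sum((slope * t + c0 - c[t]) / max(1, c[t]) for t in range(t_m + 1))
    # t_a: year whose point (t, c_t) is farthest from that reference line.
    norm = math.hypot(cm - c0, t_m)
    t_a = max(
        range(t_m + 1),
        key=lambda t: abs((cm - c0) * t - t_m * c[t] + t_m * c0) / norm,
    )
    return b, t_a


if __name__ == "__main__":
    # A paper ignored for four years that then peaks at 10 citations.
    print(beauty_coefficient([0, 0, 0, 0, 10]))  # (15.0, 3)
```

Steadily growing citation curves score B = 0, while long dormancy followed by a sharp peak drives B up. How large B must be before a paper counts as a sleeping beauty is a modelling choice that the abstract leaves open.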
Pages: 15