Deep representation learning of scientific paper reveals its potential scholarly impact

Cited by: 5
Authors
Jiang, Zhuoren [1]
Lin, Tianqianjin [1]
Huang, Cui [1]
Affiliations
[1] Zhejiang Univ, Sch Publ Affairs, Dept Informat Resources Management, Hangzhou 310058, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scholarly impact; Deep representation learning; Topicality; Originality; CITATION ANALYSIS; PUBLICATIONS; CRITERIA;
DOI
10.1016/j.joi.2023.101376
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Citations and citation-based metrics are traditionally used to quantify the scholarly impact of scientific papers. However, for documents without citation data, i.e., newly published papers, citation-based metrics are not available. By leveraging deep representation techniques, we propose a text-content-based approach that may reveal the scholarly impact of papers without human domain-specific knowledge. Specifically, a large-scale Pre-Trained Model (PTM) with 110 million parameters is used to automatically encode each paper into a vector representation. Two indicators, tau (Topicality) and sigma (Originality), are then proposed based on the learned representations. These two indicators leverage the spatial relations of paper representations in the semantic space to capture the impact-related characteristics of a scientific paper. Extensive experiments have been conducted on a COVID-19 open research dataset with 1,056,660 papers. The experimental results demonstrate that the deep representation learning method can better capture the scientific content in the published literature, and that the proposed indicators are positively and significantly associated with a paper's potential scholarly impact. In the multivariate regression analysis for the potential impact of a paper, the coefficients of sigma and tau are 5.4915 (P < 0.001) and 6.6879 (P < 0.001) for the next-6-months prediction, and 12.9964 (P < 0.001) and 13.8678 (P < 0.001) for the next-12-months prediction. The proposed framework may facilitate the study of how scholarly impact is generated, from a textual representation perspective.
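The abstract does not give the formulas for tau and sigma, so the following is only a rough sketch of what "spatial relations of paper representations in the semantic space" could look like in code. Everything in it is an assumption made for illustration: the function names `toy_topicality` and `toy_originality`, the use of cosine similarity, and the 2-dimensional toy vectors standing in for PTM embeddings. It is not the paper's definition of either indicator.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def toy_topicality(paper_vec, earlier_vecs):
    # Hypothetical stand-in: closeness to the centroid of earlier papers.
    return cosine(paper_vec, centroid(earlier_vecs))

def toy_originality(paper_vec, earlier_vecs):
    # Hypothetical stand-in: distance from the most similar earlier paper.
    return 1.0 - max(cosine(paper_vec, v) for v in earlier_vecs)

# Toy 2-d vectors in place of real embeddings (assumed data, not from the paper).
earlier = [[1.0, 0.0], [0.0, 1.0]]
new_paper = [1.0, 1.0]

tau = toy_topicality(new_paper, earlier)
sigma = toy_originality(new_paper, earlier)
print(f"toy tau={tau:.4f}, toy sigma={sigma:.4f}")
```

In practice the vectors would come from the pre-trained encoder, and scores like these would then enter a regression as predictors of later citation counts; the actual indicator definitions should be taken from the paper itself.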
Pages: 16