Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods

Cited by: 0
Authors
Fahimifar S. [1]
Mousavi K. [2]
Mozaffari F. [3]
Ausloos M. [4, 5, 6]
Affiliations
[1] Department of Information Science and Knowledge Studies, University of Tehran, Tehran
[2] Faculty of Management, University of Tehran, Tehran
[3] Department of Information Technology Management, University of Tehran, Tehran
[4] School of Business, University of Leicester, Brookfield, Leicester
[5] Department of Statistics and Econometrics, Bucharest University of Economic Studies, Calea Dorobantilor 15-17 Sector 1, Bucharest
[6] GRAPES, Rue de La Belle Jardiniere, Liege Angleur, 483/0021
Keywords
Altmetrics; Boruta; Feature selections; Highly cited articles; Lasso; Ridge
DOI
10.1007/s11135-022-01480-z
Abstract
Highly cited papers are influenced by external factors that are not directly related to a document's intrinsic quality. In this study, 50 characteristics for measuring the performance of 68 highly cited papers from the Journal of the American Medical Informatics Association, indexed in Web of Science (WOS), from 2009 to 2019 are investigated. In the first step, a Pearson correlation analysis is performed to eliminate variables with zero or weak correlation with the target ("dependent") variable, the number of citations in WOS. Consequently, 32 variables are selected for the next step. When the Ridge technique is applied, 13 features show a positive effect on the number of citations. Across the three algorithms (Ridge, Lasso, and Boruta), 6 factors appear to be the most relevant. The "Number of citations by international researchers", "Journal self-citations in citing documents", and "Authors' self-citations in citing documents" are recognized as the most important features by all three methods. The "First author's scientific age", "Open-access paper", and "Number of first author's citations in WOS" are identified as important features of highly cited papers by only two methods, Ridge and Lasso. The machine learning algorithms used here as feature selection methods (Ridge, Lasso, and Boruta) had not previously been used to identify the most important features of highly cited papers. In conclusion, we re-emphasize the performance resulting from such algorithms. Moreover, we do not advise authors to seek to increase the citations of their articles by manipulating the identified performance features. Indeed, ethical rules regarding these characteristics must be strictly obeyed. © 2022, The Author(s).
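The two-stage procedure described in the abstract (a Pearson correlation filter followed by penalized regression) might be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic, the correlation threshold (0.2) and the penalty strengths are assumed values, and Ridge and Lasso are written directly in NumPy rather than taken from a specific library. Boruta is omitted because it depends on a third-party random-forest wrapper.

```python
import numpy as np


def pearson_filter(X, y, threshold=0.2):
    """Return indices of columns whose |Pearson r| with y is >= threshold."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.flatnonzero(np.abs(r) >= threshold)


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def ridge_coefficients(X, y, alpha=1.0):
    """Closed-form Ridge solution (X'X + alpha*I)^-1 X'y with a centered target."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ (y - y.mean()))


def lasso_coefficients(X, y, alpha=0.1, n_iter=200):
    """Lasso via cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 with a centered target.
    """
    n, p = X.shape
    yc = y - y.mean()
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            residual = yc - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            # Soft-thresholding sets weak coefficients exactly to zero.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w


# Synthetic stand-in for the study's data: 68 papers, 50 candidate features.
# Only features 0, 1, and 2 actually drive the (hypothetical) citation count.
rng = np.random.default_rng(0)
X = rng.normal(size=(68, 50))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + 3.0 * X[:, 2] + rng.normal(scale=0.5, size=68)

# Step 1: drop features with zero or weak correlation with the target.
kept = pearson_filter(X, y, threshold=0.2)
Xs = standardize(X[:, kept])

# Step 2: fit Ridge and Lasso on the surviving features.
ridge_w = ridge_coefficients(Xs, y, alpha=1.0)
lasso_w = lasso_coefficients(Xs, y, alpha=0.3)

# Features with a positive Ridge effect, and features Lasso keeps (non-zero weight).
positive_in_ridge = kept[ridge_w > 0]
selected_by_lasso = kept[lasso_w != 0]

print("kept after Pearson filter:", kept.tolist())
print("positive effect under Ridge:", positive_in_ridge.tolist())
print("selected by Lasso:", selected_by_lasso.tolist())
```

Ridge shrinks coefficients without zeroing them, so the sketch reads its result through coefficient sign, whereas Lasso performs selection directly by driving weak coefficients to exactly zero. Features that survive under several methods are the ones the abstract treats as most relevant.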
Pages: 3685-3712
Number of pages: 27