An artificial intelligence-based framework for data-driven categorization of computer scientists: a case study of world's Top 10 computing departments

Cited by: 8
Authors
Ali, Nisar [1, 2]
Halim, Zahid [1]
Hussain, Syed Fawad [3]
Affiliations
[1] Ghulam Ishaq Khan Inst Engn Sci & Technol, Fac Comp Sci & Engn, Machine Intelligence Res Grp MInG, Topi, Pakistan
[2] Univ Regina, Fac Engn & Appl Sci, Regina, SK, Canada
[3] Univ Birmingham, Sch Comp Sci, Birmingham, England
Keywords
Scientists ranking; Data driven decision-making; Artificial intelligence; Clustering; Classification; Research output measurement; RANKING AUTHORS; CITATION ANALYSIS; INDEX;
DOI
10.1007/s11192-022-04627-9
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The total number of published articles and the resulting citations are generally accepted as suitable criteria for evaluating a scientist. However, ranking scientists is challenging because the value of their scientific work is, at times, not directly reflected by these two measures. Multiple other elements therefore need to be examined in combination to better evaluate an individual's scientific worth. This work presents a learning-based technique, i.e., an Artificial Intelligence (AI)-based solution, for categorizing scientists using multifaceted criteria. In this context, a novel ranking metric is proposed that is grounded in authorship, experience, publication count, total citations, i10-index, and h-index. To assess the proposed framework's performance, a dataset is collected from the world's top ten computing departments and ten domestic ones, yielding data on 1000 computer scientists. The dataset is preprocessed, and three feature selection techniques, namely Mutual Information (MI), Chi-Square (χ²), and Fisher-Test (F-Test), are then employed to rank the features in the data. To validate the collected data, the framework also includes three clustering techniques, namely k-medoids, k-means, and spectral clustering, to identify the optimum number of heterogeneous groups. Three cluster validity indices are used to evaluate the clustering outcomes: the Calinski-Harabasz Index (CHI), the Davies-Bouldin Index (DBI), and the Silhouette Coefficient (SC). Once the optimum clusters are obtained, five classification procedures are used to predict the category of a previously unknown scientist: Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), Decision Tree (DT), Gaussian Naive Bayes (GNB), and Linear Regression Classifier (LRC). Among all classifiers, the ANN achieves an average accuracy of 94.44% in predicting the category of an unknown/new scientist. The current proposal is also compared with closely related past works. The proposed framework offers the possibility to independently classify scientists based on AI techniques.
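Two of the indicators named in the abstract, the h-index and the i10-index, have standard definitions that can be computed directly from a scientist's per-paper citation counts. The following is a minimal sketch of those two definitions only; the function names and the sample citation list are hypothetical, and the paper's own combined ranking metric is not specified in the abstract and is not reproduced here.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have >= `rank` citations
        else:
            break      # counts only decrease from here, so h cannot grow
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for cites in citations if cites >= 10)


# Hypothetical per-paper citation counts for one scientist (not from the paper).
sample_citations = [25, 12, 10, 8, 5, 3, 0]

features = {
    "publications": len(sample_citations),
    "total_citations": sum(sample_citations),
    "h_index": h_index(sample_citations),
    "i10_index": i10_index(sample_citations),
}
print(features)
```

A feature dictionary of this kind, extended with authorship and experience fields, is one plausible input format for the feature selection, clustering, and classification steps the abstract describes.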
Pages: 1513-1545
Number of pages: 33