Author disambiguation using multi-aspect similarity indicators

Cited by: 40
Authors
Gurney, Thomas [1]
Horlings, Edwin [1]
van den Besselaar, Peter [2]
Affiliations
[1] Rathenau Inst, NL-2593 HW The Hague, Netherlands
[2] Vrije Univ Amsterdam, NL-1081 HV Amsterdam, Netherlands
Keywords
Author disambiguation; Precision and recall; Homonyms; Community detection; Data discarding; SCIENCE; DISCLOSURE
DOI
10.1007/s11192-011-0589-1
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Key to accurate bibliometric analyses is the ability to correctly link individuals to their corpus of work, with an optimal balance between precision and recall. We have developed an algorithm that does this disambiguation task with a very high recall and precision. The method addresses the issues of discarded records due to null data fields and their resultant effect on recall, precision and F-measure results. We have implemented a dynamic approach to similarity calculations based on all available data fields. We have also included differences in author contribution and age difference between publications, both of which have meaningful effects on overall similarity measurements, resulting in significantly higher recall and precision of returned records. The results are presented from a test dataset of heterogeneous catalysis publications. Results demonstrate significantly high average F-measure scores and substantial improvements on previous and stand-alone techniques.
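The abstract mentions two ideas that a small example can make concrete: computing similarity over whatever data fields are available instead of discarding records with null fields, and scoring the result with precision, recall and F-measure. The sketch below is illustrative only and is not the authors' algorithm. The field names, the use of Jaccard overlap, the plain averaging, and the function names are all assumptions, and the author-contribution and publication-age adjustments described in the abstract are not modelled.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def record_similarity(rec_a, rec_b, fields):
    """Average Jaccard similarity over the fields populated in both records.

    A field that is missing or empty in either record is skipped, so the
    pair is still compared on the remaining fields instead of being
    discarded. Returns None when no field is usable.
    (Hypothetical scheme, not the weighting used in the paper.)
    """
    scores = []
    for field in fields:
        a, b = rec_a.get(field), rec_b.get(field)
        if a and b:
            scores.append(jaccard(set(a), set(b)))
    return sum(scores) / len(scores) if scores else None


def precision_recall_f1(retrieved, relevant):
    """Precision, recall and F-measure (F1) for a set of retrieved records."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical records: rec_a has no address, so that field is skipped.
rec_a = {"coauthors": ["Horlings E", "Besselaar P"],
         "keywords": ["catalysis", "zeolite"],
         "address": None}
rec_b = {"coauthors": ["Horlings E"],
         "keywords": ["catalysis", "zeolite"],
         "address": ["The Hague"]}

similarity = record_similarity(rec_a, rec_b, ["coauthors", "keywords", "address"])
precision, recall, f1 = precision_recall_f1({1, 2, 3, 4}, {2, 3, 4, 5, 6})
```

Averaging only over the fields present in both records keeps sparse records in play, which is the recall concern the abstract raises. A real implementation would likely weight fields differently and apply a threshold or clustering step on top of the pairwise scores.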
Pages: 435-449
Number of pages: 15