How to Normalize Cooccurrence Data? An Analysis of Some Well-Known Similarity Measures

Cited by: 602
Authors
van Eck, Nees Jan [1]
Waltman, Ludo
Affiliations
[1] Erasmus Univ, Erasmus Sch Econ, Inst Econometr, NL-3000 DR Rotterdam, Netherlands
Source
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY | 2009, Vol. 60, No. 8
Keywords
INTERNATIONAL SCIENTIFIC COOPERATION; AUTHOR COCITATION ANALYSIS; NEURAL-NETWORK RESEARCH; INFORMATION-SCIENCE; WORD ANALYSIS; PROXIMITY-MEASURES; ORDERED SETS; MAPS; COLLABORATION; RESEMBLANCE
DOI
10.1002/asi.21075
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In scientometric research, the use of cooccurrence data is very common. In many cases, a similarity measure is employed to normalize the data. However, there is no consensus among researchers on which similarity measure is most appropriate for normalization purposes. In this article, we theoretically analyze the properties of similarity measures for cooccurrence data, focusing in particular on four well-known measures: the association strength, the cosine, the inclusion index, and the Jaccard index. We also study the behavior of these measures empirically. Our analysis reveals that there exist two fundamentally different types of similarity measures, namely, set-theoretic measures and probabilistic measures. The association strength is a probabilistic measure, while the cosine, the inclusion index, and the Jaccard index are set-theoretic measures. Both our theoretical and our empirical results indicate that cooccurrence data can best be normalized using a probabilistic measure. This provides strong support for the use of the association strength in scientometric research.
Pages: 1635-1651
Number of pages: 17
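The four measures named in the abstract can be illustrated with a minimal Python sketch. The formulas below are the commonly used definitions of these measures and are an assumption here, since this record does not reproduce them; the function name `similarity_measures` and the symbols `c_ij` (number of cooccurrences of items i and j) and `s_i`, `s_j` (total occurrences of each item) are hypothetical names chosen for illustration. The association strength is sometimes multiplied by a constant, which this sketch omits.

```python
import math


def similarity_measures(c_ij, s_i, s_j):
    """Return four normalized similarities for one pair of items.

    c_ij: number of cooccurrences of items i and j (assumed notation)
    s_i, s_j: total number of occurrences of item i and of item j
    """
    if s_i <= 0 or s_j <= 0:
        raise ValueError("occurrence counts must be positive")
    if c_ij < 0 or c_ij > min(s_i, s_j):
        raise ValueError("cooccurrences must lie between 0 and min(s_i, s_j)")
    return {
        # Probabilistic measure: cooccurrences relative to the product
        # of the occurrence counts (expected value under independence,
        # up to a constant factor).
        "association_strength": c_ij / (s_i * s_j),
        # Set-theoretic measures: overlap relative to a size of the sets.
        "cosine": c_ij / math.sqrt(s_i * s_j),
        "inclusion": c_ij / min(s_i, s_j),
        "jaccard": c_ij / (s_i + s_j - c_ij),
    }


if __name__ == "__main__":
    for name, value in similarity_measures(10, 20, 50).items():
        print(f"{name}: {value:.4f}")
```

For 10 cooccurrences between items that occur 20 and 50 times, this gives an association strength of 0.01, a cosine of about 0.3162, an inclusion index of 0.5, and a Jaccard index of about 0.1667.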