Regression for citation data: An evaluation of different methods

Cited by: 112
Authors
Thelwall, Mike [1]
Wilson, Paul [1]
Affiliations
[1] Wolverhampton Univ, Sch Math & Comp Sci, Stat Cybermetr Res Grp, Wolverhampton WV1 1LY, W Midlands, England
Keywords
Informetrics; Altmetrics; Citation distributions; Lognormal; Powerlaw; Regression; INTERNATIONAL COLLABORATION; SELF-CITATION; IMPACT; ARTICLES; DISTRIBUTIONS; METRICS; COUNTS; NUMBER; CLASSIFICATION; DETERMINANTS;
DOI
10.1016/j.joi.2014.09.011
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have been investigated with negative binomial regression. Using simulated discrete lognormal data (continuous lognormal data rounded to the nearest integer) this article shows that a better strategy is to add one to the citations, take their log and then use the general linear (ordinary least squares) model for regression (e.g., multiple linear regression, ANOVA), or to use the generalised linear model without the log. Reasonable results can also be obtained if all the zero citations are discarded, the log is taken of the remaining citation counts and then the general linear model is used, or if the generalised linear model is used with the continuous lognormal distribution. Similar approaches are recommended for altmetric data, if it proves to be lognormally distributed. (C) 2014 Elsevier Ltd. All rights reserved.
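
The first strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the single binary predictor, and the parameter values (intercept 2.0, slope 0.5, sigma 1.0) are all assumptions chosen for the example. It simulates discrete lognormal counts by rounding continuous lognormal draws to the nearest integer, then adds one, takes the log, and fits ordinary least squares.

```python
import numpy as np


def simulate_discrete_lognormal(n, intercept, slope, sigma, seed=0):
    """Simulate citation-like counts (hypothetical setup, not from the paper).

    A binary factor x shifts the mean of log(citations); continuous
    lognormal draws are rounded to the nearest integer.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)            # assumed binary predictor
    log_mean = intercept + slope * x          # mean on the log scale
    continuous = rng.lognormal(mean=log_mean, sigma=sigma)
    citations = np.rint(continuous).astype(int)
    return x, citations


def fit_log_plus_one(x, citations):
    """OLS of log(citations + 1) on x; returns (intercept, slope)."""
    y = np.log(citations + 1.0)
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]


if __name__ == "__main__":
    x, citations = simulate_discrete_lognormal(20000, 2.0, 0.5, 1.0, seed=1)
    b0, b1 = fit_log_plus_one(x, citations)
    print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

Because of the rounding and the added one, the fitted slope is on the log(citations + 1) scale and is not expected to equal the simulated slope exactly. The sketch only shows the mechanics of the transformation and fit.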
Pages: 963-971
Number of pages: 9