The Number of Scholarly Documents on the Public Web

Times cited: 189
Authors
Khabsa, Madian [1]
Giles, C. Lee [1]
Affiliation
[1] Pennsylvania State University, University Park, PA 16802, USA
Source
PLOS ONE | 2014 / Vol. 9 / No. 5
Funding
US National Science Foundation
DOI
10.1371/journal.pone.0093949
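Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09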
Abstract
The number of scholarly documents available on the web is estimated using capture/recapture methods applied to the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar indexes nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available, meaning they require no subscription or payment of any kind. At a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields, as defined by Microsoft Academic Search: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary. Finally, we show that the percentage of freely available documents varies significantly across these fields, ranging from 12% to 50%.
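The capture/recapture idea behind these estimates can be illustrated with the classic two-sample Lincoln-Petersen estimator, which treats each search engine as one independent "capture" of the document population: if source A finds n_A documents, source B finds n_B, and m are found by both, the total is estimated as N ≈ n_A · n_B / m. This record does not give the paper's exact estimator or sampling procedure, so the sketch below is an assumption rather than the authors' method. The function names and the counts in the usage lines are hypothetical; Chapman's bias-corrected variant is included as a standard alternative, not as something the paper is known to use.

```python
"""Two-source capture/recapture sketch (hypothetical counts, not the paper's data)."""


def lincoln_petersen(n_a: int, n_b: int, overlap: int) -> float:
    """Classic Lincoln-Petersen estimate of total population size.

    n_a:     documents found by source A (e.g. one search engine)
    n_b:     documents found by source B (e.g. another search engine)
    overlap: documents found by both sources
    Assumes the two sources 'capture' documents independently.
    """
    if overlap <= 0:
        raise ValueError("overlap must be positive to form an estimate")
    return n_a * n_b / overlap


def chapman(n_a: int, n_b: int, overlap: int) -> float:
    """Chapman's bias-corrected variant; still defined when overlap == 0."""
    return (n_a + 1) * (n_b + 1) / (overlap + 1) - 1


if __name__ == "__main__":
    # Hypothetical sample: A finds 1000 documents, B finds 600, 500 are shared.
    print(lincoln_petersen(1000, 600, 500))    # 1200.0
    print(round(chapman(1000, 600, 500), 1))   # 1199.8
    # Freely available share, using the lower-bound figures from the abstract.
    print(f"{27 / 114:.0%}")                   # 24%
```

The independence assumption matters: if both engines tend to index the same documents, the overlap m is inflated and N is underestimated, which is one reason figures of this kind are naturally reported as lower bounds.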
Pages: 6