Investigating the effect of global data on topic detection

Cited by: 27
Author
Boyack, Kevin W. [1]
Affiliation
[1] SciTech Strategies Inc, 8421 Manuel Cia Pl NE, Albuquerque, NM 87122 USA
Keywords
Global data; Direct citation; Clustering; Cluster characterization; Science; Maps
DOI
10.1007/s11192-017-2297-y
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
A dataset containing 111,616 documents in astronomy and astrophysics (the Astro-set) has been created and is being partitioned by several research groups using different algorithms. For this paper, rather than partitioning the dataset directly, we locate the data in a previously created model of the full Scopus database. This allows local and global data to be compared for community detection; that comparison is made in an accompanying paper. We can thus begin to answer the question of the extent to which the rest of a large database (a global solution) affects the partitioning of a smaller, journal-based set of documents (a local solution). We find that the Astro-set, while spread across hundreds of partitions in the Scopus map, is concentrated in only a few regions of the map. From this perspective, there seems to be some correspondence between local information and the global cluster solution. However, we also show that within-Astro-set links make up only one-third of the total links available to these papers in the full Scopus database. The non-Astro-set links are significant in two ways: (1) in areas where the Astro-set papers are concentrated, related papers from non-astronomy journals are included in clusters with the Astro-set papers, and (2) Astro-set papers that have a very low fraction of within-set links tend to end up in clusters that are not astronomy-based. Overall, this work highlights the limitations of using journal-based document sets to identify the structure of scientific fields.
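The within-set link share described in the abstract can be illustrated with a minimal sketch. It assumes that direct-citation links are supplied as undirected pairs of document IDs, that the pairs are distinct and contain no self-loops, and that a document's "within-set share" is the fraction of its links whose other end is also in the set. The overall share is taken here as the number of links with both ends in the set divided by the number of links touching the set. This is one plausible reading, not necessarily the paper's exact definition. The function names, toy IDs, and the 0.25 cutoff for flagging low-share documents are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: each direct-citation link is an undirected
# pair of document IDs; pairs are assumed distinct and free of self-loops.
Edge = Tuple[str, str]


def link_fractions(
    edges: Iterable[Edge], doc_set: Set[str]
) -> Tuple[Dict[str, float], float]:
    """Per-document and overall share of links that stay inside doc_set."""
    per_total: Counter = Counter()   # all links of each set document
    per_within: Counter = Counter()  # links whose other end is also in the set
    touching = 0                     # distinct links with at least one end in the set
    within = 0                       # distinct links with both ends in the set
    for a, b in edges:
        a_in, b_in = a in doc_set, b in doc_set
        if not (a_in or b_in):
            continue  # link lies entirely outside the set; ignore it
        touching += 1
        if a_in and b_in:
            within += 1
        # Credit the link to each endpoint that belongs to the set.
        for node, node_in, other_in in ((a, a_in, b_in), (b, b_in, a_in)):
            if node_in:
                per_total[node] += 1
                if other_in:
                    per_within[node] += 1
    per_doc = {d: per_within[d] / per_total[d] for d in per_total}
    overall = within / touching if touching else 0.0
    return per_doc, overall


def low_link_docs(per_doc: Dict[str, float], cutoff: float) -> List[str]:
    """Set documents whose within-set share falls below a (hypothetical) cutoff."""
    return sorted(d for d, share in per_doc.items() if share < cutoff)


if __name__ == "__main__":
    # Toy data: A* are set members, X* are papers from outside the set.
    edges = [("A1", "A2"), ("A1", "X1"), ("A2", "X2"),
             ("A3", "X3"), ("A3", "X4"), ("X1", "X2")]
    per_doc, overall = link_fractions(edges, {"A1", "A2", "A3"})
    print(per_doc)   # {'A1': 0.5, 'A2': 0.5, 'A3': 0.0}
    print(overall)   # 0.2
    print(low_link_docs(per_doc, cutoff=0.25))  # ['A3']
```

In this toy case, only one of the five links touching the set stays inside it. A3, whose links all point outside the set, is the kind of paper that, per the abstract, tends to end up in clusters that are not astronomy-based when a global model is used.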
Pages: 999-1015
Page count: 17
Related papers
(20 in total)
  • [1] Archambault, É. PRO INT CONF SCI INF, 2011, p. 66.
  • [2] Blondel, Vincent D.; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008.
  • [3] Börner, Katy; Klavans, Richard; Patek, Michael; Zoss, Angela M.; Biberstine, Joseph R.; Light, Robert P.; Larivière, Vincent; Boyack, Kevin W. Design and Update of a Classification System: The UCSD Map of Science. PLoS ONE, 2012, 7(7).
  • [4] Boyack, Kevin W.; Klavans, Richard. Including cited non-source items in a large-scale map of science: What difference does it make? Journal of Informetrics, 2014, 8(3): 569-580.
  • [5] Boyack, Kevin W.; Klavans, Richard; Small, Henry; Ungar, Lyle. Characterizing the emergence of two nanotechnology topics using a contemporaneous global micro-model of science. Journal of Engineering and Technology Management, 2014, 32: 147-159.
  • [6] Boyack, Kevin W.; Klavans, Richard. Creation of a Highly Detailed, Dynamic, Global Model and Map of Science. Journal of the Association for Information Science and Technology, 2014, 65(4): 670-685.
  • [7] Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale. PLoS ONE, 2016, 11(7).
  • [8] Klavans, Richard; Boyack, Kevin W. Using Global Mapping to Create More Accurate Document-Level Maps of Research Fields. Journal of the American Society for Information Science and Technology, 2011, 62(1): 1-18.
  • [9] Martin, Shawn; Brown, W. Michael; Klavans, Richard; Boyack, Kevin W. OpenOrd: An Open-Source Toolbox for Large Graph Layout. Visualization and Data Analysis 2011, 2011, 7868.
  • [10] Newman, M. E. J. Physical Review E, 2004, 69. DOI 10.1103/PhysRevE.69.066133.