Investigating the effect of global data on topic detection

Cited by: 27
Author
Boyack, Kevin W. [1 ]
Affiliation
[1] SciTech Strategies Inc, 8421 Manuel Cia Pl NE, Albuquerque, NM 87122 USA
Keywords
Global data; Direct citation; Clustering; Cluster characterization; SCIENCE; MAPS;
DOI
10.1007/s11192-017-2297-y
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
A dataset containing 111,616 documents in astronomy and astrophysics (Astro-set) has been created and is being partitioned by several research groups using different algorithms. For this paper, rather than partitioning the dataset directly, we locate the data in a previously created model of the full Scopus database. This allows comparisons between using local and global data for community detection, which is done in an accompanying paper. We can begin to answer the question of the extent to which the rest of a large database (a global solution) affects the partitioning of a smaller journal-based set of documents (a local solution). We find that the Astro-set, while spread across hundreds of partitions in the Scopus map, is concentrated in only a few regions of the map. From this perspective there seems to be some correspondence between local information and the global cluster solution. However, we also show that the within-Astro-set links are only one-third of the total links that are available to these papers in the full Scopus database. The non-Astro-set links are significant in two ways: (1) in areas where the Astro-set papers are concentrated, related papers from non-astronomy journals are included in clusters with the Astro-set papers, and (2) Astro-set papers that have a very low fraction of within-set links tend to end up in clusters that are not astronomy-based. Overall, this work highlights limitations of the use of journal-based document sets to identify the structure of scientific fields.
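The within-set link fraction mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's actual procedure: the function name `within_set_fraction`, the toy edge list, and the choice to treat direct-citation links as undirected are all assumptions made for illustration only.

```python
from collections import defaultdict


def within_set_fraction(edges, local_ids):
    """For each paper in the local set, return the share of its links
    whose other endpoint is also in the local set.

    edges     -- iterable of (paper_a, paper_b) pairs, treated as undirected
                 (an assumption for this sketch)
    local_ids -- identifiers of the journal-based (local) document set
    """
    local = set(local_ids)
    total = defaultdict(int)   # all links touching a local paper
    inside = defaultdict(int)  # links that stay inside the local set
    for a, b in edges:
        # Count the link once from each endpoint's point of view.
        for paper, other in ((a, b), (b, a)):
            if paper in local:
                total[paper] += 1
                if other in local:
                    inside[paper] += 1
    # Papers with no links at all are simply absent from the result.
    return {paper: inside[paper] / total[paper] for paper in total}


# Hypothetical toy data: "a*" papers are in the local set, "x*" are not.
edges = [
    ("a1", "a2"), ("a1", "x1"), ("a1", "x2"),
    ("a2", "x3"),
    ("a3", "x1"), ("a3", "x4"),
]
local_ids = {"a1", "a2", "a3"}
fractions = within_set_fraction(edges, local_ids)
```

In this toy example, a paper such as `a3`, whose links all leave the local set, is the kind of document the abstract describes as tending to end up in clusters that are not astronomy-based once the global data are taken into account.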
Pages: 999-1015
Page count: 17
Related papers
20 in total
[1]   Archambault É, 2011, PRO INT CONF SCI INF, P66
[2]   Fast unfolding of communities in large networks [J].
Blondel, Vincent D. ;
Guillaume, Jean-Loup ;
Lambiotte, Renaud ;
Lefebvre, Etienne .
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2008,
[3]   Design and Update of a Classification System: The UCSD Map of Science [J].
Boerner, Katy ;
Klavans, Richard ;
Patek, Michael ;
Zoss, Angela M. ;
Biberstine, Joseph R. ;
Light, Robert P. ;
Lariviere, Vincent ;
Boyack, Kevin W. .
PLOS ONE, 2012, 7 (07)
[4]   Including cited non-source items in a large-scale map of science: What difference does it make? [J].
Boyack, Kevin W. ;
Klavans, Richard .
JOURNAL OF INFORMETRICS, 2014, 8 (03) :569-580
[5]   Characterizing the emergence of two nanotechnology topics using a contemporaneous global micro-model of science [J].
Boyack, Kevin W. ;
Klavans, Richard ;
Small, Henry ;
Ungar, Lyle .
JOURNAL OF ENGINEERING AND TECHNOLOGY MANAGEMENT, 2014, 32 :147-159
[6]   Creation of a Highly Detailed, Dynamic, Global Model and Map of Science [J].
Boyack, Kevin W. ;
Klavans, Richard .
JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2014, 65 (04) :670-685
[7]   Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale [J].
Emmons, Scott ;
Kobourov, Stephen ;
Gallant, Mike ;
Borner, Katy .
PLOS ONE, 2016, 11 (07)
[8]   Using Global Mapping to Create More Accurate Document-Level Maps of Research Fields [J].
Klavans, Richard ;
Boyack, Kevin W. .
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2011, 62 (01) :1-18
[9]   OpenOrd: An Open-Source Toolbox for Large Graph Layout [J].
Martin, Shawn ;
Brown, W. Michael ;
Klavans, Richard ;
Boyack, Kevin W. .
VISUALIZATION AND DATA ANALYSIS 2011, 2011, 7868
[10]   Newman MEJ, 2004, PHYS REV E, V69, DOI 10.1103/PhysRevE.69.066133