Ground truth? Concept-based communities versus the external classification of physics manuscripts

Cited by: 10
Authors
Palchykov, Vasyl [1 ,2 ]
Gemmetto, Valerio [1 ]
Boyarsky, Alexey [1 ]
Garlaschelli, Diego [1 ]
Affiliations
[1] Leiden Univ, Lorentz Inst Theoret Phys, Niels Bohrweg 2, NL-2333 CA Leiden, Netherlands
[2] Inst Condensed Matter Phys, Svientsitskii Str 1, UA-79011 Lvov, Ukraine
Keywords
science of science; community detection; bipartite networks; INFORMATION; CITATION;
DOI
10.1140/epjds/s13688-016-0090-4
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Community detection techniques are widely used to infer hidden structures within interconnected systems. Despite demonstrating high accuracy on benchmarks, they reproduce the external classification of many real-world systems only with a significant level of discrepancy. A widely accepted explanation for this outcome is the unavoidable loss of non-topological information (such as node attributes) that occurs when the original complex system is converted to a network. In this article we systematically show that the observed discrepancies may also have a different cause: the external classification itself. To this end we use scientific publication data which (i) exhibit a well-defined modular structure and (ii) carry an expert-made classification of research articles. Having represented the articles and the extracted scientific concepts both as a bipartite network and as its unipartite projection, we applied modularity optimization to uncover the inner thematic structure. The resulting clusters are shown to partly reflect the author-made classification, although some significant discrepancies are observed. A detailed analysis of these discrepancies shows that they may carry essential information about the system, mainly related to the use of similar techniques and methods across different (sub)disciplines, that is otherwise omitted when only the external classification is considered.
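The pipeline described in the abstract (bipartite article-concept network, unipartite projection, modularity optimization, comparison with an external classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six articles, their concepts and the external labels are invented toy data, and the naive greedy merging stands in for whatever modularity optimizer the authors actually used.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: article -> set of extracted concepts.
article_concepts = {
    "a1": {"ising", "phase_transition"},
    "a2": {"ising", "phase_transition", "monte_carlo"},
    "a3": {"ising", "monte_carlo"},
    "a4": {"dark_matter", "galaxy", "monte_carlo"},
    "a5": {"dark_matter", "galaxy"},
    "a6": {"dark_matter", "galaxy"},
}
# Hypothetical external (author-made) classification.
external = {"a1": "cond-mat", "a2": "cond-mat", "a3": "comp-ph",
            "a4": "astro-ph", "a5": "astro-ph", "a6": "astro-ph"}

def project_onto_articles(article_concepts):
    """Unipartite projection: edge weight = number of shared concepts."""
    concept_articles = defaultdict(set)
    for article, concepts in article_concepts.items():
        for concept in concepts:
            concept_articles[concept].add(article)
    weights = defaultdict(int)
    for articles in concept_articles.values():
        for a, b in combinations(sorted(articles), 2):
            weights[(a, b)] += 1
    return dict(weights)

def modularity(weights, community_of):
    """Modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ] for a weighted graph."""
    m = sum(weights.values())
    internal = defaultdict(float)   # L_c: edge weight inside community c
    strength = defaultdict(float)   # d_c: total node strength of community c
    for (a, b), w in weights.items():
        strength[community_of[a]] += w
        strength[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)

def greedy_communities(weights):
    """Naive agglomerative optimization: merge the best pair until Q stops rising."""
    nodes = sorted({n for edge in weights for n in edge})
    community_of = {n: n for n in nodes}  # start from singletons
    best_q = modularity(weights, community_of)
    while True:
        best_merge = None
        for c1, c2 in combinations(sorted(set(community_of.values())), 2):
            trial = {n: (c1 if c == c2 else c) for n, c in community_of.items()}
            q = modularity(weights, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            return community_of, best_q
        community_of = best_merge

weights = project_onto_articles(article_concepts)
detected, q_detected = greedy_communities(weights)
q_external = modularity(weights, external)
print("detected communities:", detected)
print("Q detected = %.4f, Q external = %.4f" % (q_detected, q_external))
```

In this toy case the detected partition groups `a3` with the other Ising-model articles, whereas the hypothetical external label places it in a separate class; the example only shows how such a discrepancy can be measured, not what the paper found.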
Pages: 11
Related papers
34 in total
[1]   [Anonymous], 2006, P 12 ACM SIGKDD INT
[2]   Modularity and community detection in bipartite networks [J].
Barber, Michael J. .
PHYSICAL REVIEW E, 2007, 76 (06)
[3]   Latent Dirichlet allocation [J].
Blei, DM ;
Ng, AY ;
Jordan, MI .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) :993-1022
[4]   Fast unfolding of communities in large networks [J].
Blondel, Vincent D. ;
Guillaume, Jean-Loup ;
Lambiotte, Renaud ;
Lefebvre, Etienne .
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2008,
[5]   Clustering attributed graphs: Models, measures and methods [J].
Bothorel, Cecile ;
Cruz, Juan David ;
Magnani, Matteo ;
Micenkova, Barbora .
NETWORK SCIENCE, 2015, 3 (03) :408-444
[6]   Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches [J].
Boyack, Kevin W. ;
Newman, David ;
Duhon, Russell J. ;
Klavans, Richard ;
Patek, Michael ;
Biberstine, Joseph R. ;
Schijvenaars, Bob ;
Skupin, Andre ;
Ma, Nianli ;
Boerner, Katy .
PLOS ONE, 2011, 6 (03)
[7]   Co-Citation Analysis, Bibliographic Coupling, and Direct Citation: Which Citation Approach Represents the Research Front Most Accurately? [J].
Boyack, Kevin W. ;
Klavans, Richard .
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2010, 61 (12) :2389-2404
[8]   Complex brain networks: graph theoretical analysis of structural and functional systems [J].
Bullmore, Edward T. ;
Sporns, Olaf .
NATURE REVIEWS NEUROSCIENCE, 2009, 10 (03) :186-198
[9]   Community structure of the physical review citation network [J].
Chen, P. ;
Redner, S. .
JOURNAL OF INFORMETRICS, 2010, 4 (03) :278-290
[10]   Uncovering space-independent communities in spatial networks [J].
Expert, Paul ;
Evans, Tim S. ;
Blondel, Vincent D. ;
Lambiotte, Renaud .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2011, 108 (19) :7663-7668