DICO: A Graph-DB Framework for Community Detection on Big Scholarly Data

Cited by: 27
Authors
Mercorio, Fabio [1]
Mezzanzanica, Mario [1]
Moscato, Vincenzo [2]
Picariello, Antonio [2]
Sperli, Giancarlo [2]
Affiliations
[1] Univ Milano Bicocca, Dept Stat & Quantitat Methods DISMEQ, I-20126 Milan, Italy
[2] Univ Naples Federico II, Dept Informat Technol & Elect Engn DIETI, I-80138 Naples, Italy
Keywords
Big scholarly data; knowledge graphs; semantic network mining; community mining
DOI
10.1109/TETC.2019.2952765
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The widespread use of online social networks has also reached the scientific community, in which researchers interact with each other by publishing or citing papers. The huge amount of information about scientific research documents is described by the term Big Scholarly Data. In this article we propose a framework, namely Discovery Information using COmmunity detection (DICO), for identifying overlapping communities of authors from Big Scholarly Data by modeling authors' interactions through a novel graph-based data model that combines document metadata with semantic information. In particular, DICO has three distinctive characteristics: i) the coauthorship network is built from publication records using a novel approach for estimating the weights of relationships between users; ii) a new community detection algorithm based on Node Location Analysis has been developed to identify overlapping communities; iii) some built-in queries are provided to browse the generated network, though any graph-traversal query can be implemented through the Cypher query language. An experimental evaluation has been carried out to assess the efficacy of the proposed community detection algorithm on benchmark networks. Finally, to show its usefulness, DICO has been tested on a real-world Big Scholarly Dataset, the DBLP+AMiner dataset, which contains 1.7M+ distinct authors, 3M+ papers, and 25M+ citation relationships.
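As a rough illustration of the first characteristic, the sketch below builds a weighted coauthorship network from publication records. The abstract does not specify DICO's weighting approach, so this example substitutes a plain count of co-authored papers as the edge weight; the function name `coauthorship_weights` and the sample records are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def coauthorship_weights(papers):
    """Count co-authored papers for each unordered author pair.

    Assumption: edge weight = number of shared papers. This is a
    stand-in for illustration, not the weighting scheme used by DICO.
    """
    weights = Counter()
    for authors in papers:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical publication records: one author list per paper.
papers = [
    ["Alice", "Bob", "Carol"],
    ["Alice", "Bob"],
    ["Carol", "Dave"],
]
weights = coauthorship_weights(papers)
```

Each key in `weights` is one undirected edge of the network, so the resulting mapping could be loaded into a graph database as weighted relationships between author nodes.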
Pages: 1987-2003
Page count: 17