Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs

被引:1
作者
Chen, Ziyang [1 ,2 ]
Zhuang, Junhao [1 ]
Wang, Xuan [1 ]
Tang, Xian [3 ]
Yang, Kun [1 ]
Du, Ming [1 ]
Zhou, Junfeng [1 ]
机构
[1] Donghua Univ, Sch Comp Sci & Technol, Shanghai 201620, Peoples R China
[2] Shanghai Lixin Univ Accounting & Finance, Sch Informat Management, Shanghai 201620, Peoples R China
[3] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai 201620, Peoples R China
关键词
chi-square statistics; probabilistic graph; top-k query; TOOL; ALIGNMENT;
D O I
10.3390/electronics13010192
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Top-k graph similarity search on probabilistic graphs is widely used in various scenarios, such as symptom-disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses the chi-square statistics to speed up the process. The effectiveness of the chi-square statistics solution depends on the effectiveness of the sample observation and expectation. The existing method assumes that the labels in the data graphs are subject to uniform distribution and calculate the chi-square value based on this. In fact, however, the actual distribution of the labels does not meet the requirement of uniform distribution, resulting in a low quality of the returned results. To solve this problem, we propose a top-k similar subgraph search algorithm ChiSSA based on chi-square statistics. We propose two ways to calculate the expectation vector according to the actual distribution of labels in the graph, including the local expectation calculation method based on the vertex neighbors and the global expectation calculation method based on the label distribution of the whole graph. Furthermore, we propose two optimization strategies to improve the accuracy of query results and the efficiency of our algorithm. We conduct rich experiments on real datasets. The experimental results on real datasets show that our algorithm improves the quality and accuracy by an average of 1.66x and 1.68x in terms of time overhead, it improves by an average of 3.41x.
引用
收藏
页数:25
相关论文
共 31 条
[1]   ChiSeL: Graph Similarity Search using Chi-Squared Statistics in Large Probabilistic Graphs [J].
Agarwal, Shubhangi ;
Dutta, Sourav ;
Bhattacharya, Arnab .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (10) :1654-1668
[2]  
[Anonymous], 2016, International Workshop of Algorithmic Aspects of Cloud Computing
[3]   Efficient Subgraph Matching by Postponing Cartesian Products [J].
Bi, Fei ;
Chang, Lijun ;
Lin, Xuemin ;
Qin, Lu ;
Zhang, Wenjie .
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :1199-1214
[4]   AntiBenford Subgraphs: Unsupervised Anomaly Detection in Financial Networks [J].
Chen, Tianyi ;
Tsourakakis, Charalampos .
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, :2762-2770
[5]   Efficient Mining of Frequent Patterns on Uncertain Graphs [J].
Chen, Yifan ;
Zhao, Xiang ;
Lin, Xuemin ;
Wang, Yang ;
Guo, Deke .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (02) :287-300
[6]   A (sub)graph isomorphism algorithm for matching large graphs [J].
Cordella, LP ;
Foggia, P ;
Sansone, C ;
Vento, M .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004, 26 (10) :1367-1372
[7]   Approximate querying of RDF graphs via path alignment [J].
De Virgilio, Roberto ;
Maccioni, Antonio ;
Torlone, Riccardo .
DISTRIBUTED AND PARALLEL DATABASES, 2015, 33 (04) :555-581
[8]   SCALING SUBGRAPH MATCHING BY IMPROVING ULLMANN ALGORITHM [J].
Gouda, Karam ;
Bujdoso, Gyongyi ;
Hassaan, Mosab .
COMPUTING AND INFORMATICS, 2022, 41 (04) :1002-1024
[9]  
Graph PDD, 2017, PDD Graph: Bridging Electronic Medical Records and Biomedical Knowledge Graphs via Entity Linking
[10]   Subgraph similarity maximal all-matching over a large uncertain graph [J].
Gu, Yu ;
Gao, Chunpeng ;
Wang, Lulu ;
Yu, Ge .
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2016, 19 (05) :755-782