Fraud detection in the distributed graph database

Cited by: 0
Authors
Sakshi Srivastava
Anil Kumar Singh
Affiliation
[1] MNNIT Allahabad
Source
Cluster Computing | 2023 / Vol. 26
Keywords
Neo4j; Graph database; Distributed system; Community fraud detection; Node rank; Minimized graph; Data profiling; NRFD;
DOI
Not available
Abstract
Over the last few decades, graphs have become increasingly important for managing big data across many applications and domains. Big data analysis in a graph database is the analysis of massive, interconnected data that grows exponentially over time. However, analyzing large connected datasets is challenging in areas such as social networks and synthetic identity detection. Previous approaches run fraud detection on the complete graph, which is time-consuming and creates bottlenecks during query execution. To overcome this issue, this paper proposes a new fraud detection technique to unveil synthetic identities in the Panama Papers leak dataset (an unprecedented leak of 11.5 million documents from the database of the world’s fourth-biggest offshore law firm, Mossack Fonseca). The technique uses a node rank-based fraud detection algorithm that integrates distributed data profiling techniques and runs on a minimized graph obtained by removing the least influential nodes. The proposed model is verified on a three-node cluster, where it improves data scalability, reduces query execution time by an average of 30–36%, and reduces fraud detection time by 18.2%.
Pages: 515-537
Number of pages: 22