A General Suspiciousness Metric for Dense Blocks in Multimodal Data

被引:51
作者
Jiang, Meng [1 ]
Beutel, Alex [2 ]
Cui, Peng [1 ]
Hooi, Bryan [2 ]
Yang, Shiqiang [1 ]
Faloutsos, Christos [2 ]
机构
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
来源
2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) | 2015年
关键词
D O I
10.1109/ICDM.2015.61
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Which seems more suspicious: 5,000 tweets from 200 users on 5 IP addresses, or 10,000 tweets from 500 users on 500 IP addresses but all with the same trending topic and all in 10 minutes? The literature has many methods that try to find dense blocks in matrices, and, recently, tensors, but no method gives a principled way to score the suspiciouness of dense blocks with different numbers of modes and rank them to draw human attention accordingly. Dense blocks are worth inspecting, typically indicating fraud, emerging trends, or some other noteworthy deviation from the usual. Our main contribution is that we show how to unify these methods and how to give a principled answer to questions like the above. Specifically, (a) we give a list of axioms that any metric of suspicousness should satisfy; (b) we propose an intuitive, principled metric that satisfies the axioms, and is fast to compute; (c) we propose CROSSSPOT, an algorithm to spot dense regions, and sort them in importance ("suspiciousness") order. Finally, we apply CROSSSPOT to real data, where it improves the F1 score over previous techniques by 68% and finds retweet-boosting in a real social dataset spanning 0.3 billion posts.
引用
收藏
页码:781 / 786
页数:6
相关论文
共 16 条
  • [1] A Local Algorithm for Finding Dense Subgraphs
    Andersen, Reid
    [J]. ACM TRANSACTIONS ON ALGORITHMS, 2010, 6 (04)
  • [2] [Anonymous], 2004, VLDB
  • [3] [Anonymous], 2007, Proceedings of the 16th international conference on World Wide Web
  • [4] Greedily finding a dense subgraph
    Asahiro, Y
    Iwama, K
    Tamaki, H
    Tokuyama, T
    [J]. JOURNAL OF ALGORITHMS, 2000, 34 (02) : 203 - 221
  • [5] Beutel A., 2013, P 22 INT C WORLD WID, P119, DOI DOI 10.1145/2488388.2488400
  • [6] Charikar M., 2000, Approximation Algorithms for Combinatorial Optimization. Third International Workshop, APPROX 2000. Proceedings (Lecture Notes in Computer Science Vol.1913), P84
  • [7] Dense Subgraph Extraction with Application to Community Detection
    Chen, Jie
    Saad, Yousef
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (07) : 1216 - 1230
  • [8] A multilinear singular value decomposition
    De Lathauwer, L
    De Moor, B
    Vandewalle, J
    [J]. SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2000, 21 (04) : 1253 - 1278
  • [9] Huang H., 2008, P 14 ACM SIGKDD INT, P327, DOI [10.1145/1401890.1401933, DOI 10.1145/1401890.1401933]
  • [10] Inah J., 2015, ICDE