NodeMerge: Template Based Efficient Data Reduction For Big-Data Causality Analysis

Cited by: 57
Authors
Tang, Yutao [2]
Li, Ding [1]
Li, Zhichun [1]
Zhang, Mu [3]
Jee, Kangkook [1]
Xiao, Xusheng [4]
Wu, Zhenyu [1]
Rhee, Junghwan [1]
Xu, Fengyuan [5]
Li, Qun [2]
Affiliations
[1] NEC Labs Amer Inc, Princeton, NJ 08540 USA
[2] Coll William & Mary, Williamsburg, VA 23187 USA
[3] Cornell Univ, Ithaca, NY 14853 USA
[4] Case Western Reserve Univ, Cleveland, OH 44106 USA
[5] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
Keywords
Security; Data Reduction
DOI
10.1145/3243734.3243763
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Today's enterprises are exposed to sophisticated attacks, such as Advanced Persistent Threat (APT) attacks, which usually consist of multiple stealthy steps. To counter these attacks, enterprises often rely on causality analysis of the system activity data collected by ubiquitous system monitoring to discover the initial penetration point and, from there, to identify previously unknown attack steps. However, one major challenge for causality analysis is that ubiquitous system monitoring generates a colossal amount of data, and hosting that data is prohibitively expensive. Thus, there is a strong demand for techniques that reduce the storage required for causality analysis while preserving its quality. To address this problem, we propose NodeMerge, a template-based data reduction system for online system event storage. Specifically, our approach works directly on the stream of system dependency data and reduces read-only file events based on their access patterns. It can either reduce the storage cost or improve the performance of causality analysis under the same budget. With only a reasonable amount of resources for online data reduction, it almost completely preserves the accuracy of causality analysis. The reduced form of the data can be used directly with little overhead. To evaluate our approach, we conducted a set of comprehensive evaluations, which show that, for different categories of workloads, our system can reduce the storage required for raw system dependency data by as much as 75.7 times, and the storage required by the state-of-the-art approach by as much as 32.6 times. Furthermore, the results demonstrate that our approach keeps all the causality analysis information and has a reasonably small overhead in memory and hard disk.
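The idea of reducing read-only file events through a template can be illustrated with a toy sketch. It is not the NodeMerge implementation: the abstract does not describe how templates are learned, so the template here is hand-picked, and the names `Event`, `read_only_files`, `merge_with_template`, and the template identifier `T0` are all hypothetical. The sketch only shows how several per-file read events could collapse into a single read of a template node.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    process: str  # process instance identifier (hypothetical format)
    op: str       # "read" or "write"
    path: str     # file path, or a template identifier after merging


def read_only_files(events):
    """Return files that are read at least once and never written in the trace."""
    written = {e.path for e in events if e.op == "write"}
    return {e.path for e in events if e.op == "read"} - written


def merge_with_template(events, template, template_id="T0"):
    """Replace a process's reads of every template file with one template read.

    Assumption: a process qualifies only if it reads all files in the
    template, so the merged event never implies a dependency that the
    original trace did not contain.
    """
    if not template <= read_only_files(events):
        raise ValueError("template must contain only read-only files")

    reads_by_proc = defaultdict(set)
    for e in events:
        if e.op == "read":
            reads_by_proc[e.process].add(e.path)
    qualifying = {p for p, files in reads_by_proc.items() if template <= files}

    reduced, emitted = [], set()
    for e in events:
        if e.op == "read" and e.process in qualifying and e.path in template:
            if e.process not in emitted:  # emit one merged event per process
                emitted.add(e.process)
                reduced.append(Event(e.process, "read", template_id))
            continue
        reduced.append(e)
    return reduced


trace = [
    Event("p1", "read", "/lib/libc.so"),
    Event("p1", "read", "/lib/libm.so"),
    Event("p1", "write", "/tmp/out1"),
    Event("p2", "read", "/lib/libc.so"),
    Event("p2", "read", "/lib/libm.so"),
    Event("p2", "read", "/tmp/out1"),
    Event("p3", "read", "/lib/libc.so"),
]
template = frozenset({"/lib/libc.so", "/lib/libm.so"})
reduced = merge_with_template(trace, template)
print(len(trace), "->", len(reduced))  # 7 -> 5
```

In this toy trace, `p1` and `p2` each read both library files, so each pair of reads becomes one read of `T0`. The read of `/tmp/out1` by `p2` is untouched because that file is also written, and `p3` keeps its original event because it reads only part of the template.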
Pages: 1324-1337
Number of pages: 14