Decaying Telco Big Data with Data Postdiction

Cited by: 7
Authors
Costa, Constantinos [1]
Charalampous, Andreas [1]
Konstantinidis, Andreas [1,2]
Zeinalipour-Yazti, Demetrios [1]
Mokbel, Mohamed F. [3,4]
Affiliations
[1] Univ Cyprus, Dept Comp Sci, CY-1678 Nicosia, Cyprus
[2] Frederick Univ, Dept Comp Sci & Engn, CY-1036 Nicosia, Cyprus
[3] HBKU, Qatar Comp Res Inst, Ar Rayyan, Qatar
[4] Univ Minnesota, Minneapolis, MN 55455 USA
Source
2018 19TH IEEE INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2018) | 2018
Keywords
telco; big data; spatio-temporal analytics; data decaying; data reduction; machine learning;
DOI
10.1109/MDM.2018.00027
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we present a novel decaying operator for Telco Big Data (TBD), coined TBD-DP (Data Postdiction). Unlike data prediction, which aims to make a statement about the future value of some tuple, our formulated data postdiction term aims to make a statement about the past value of some tuple, which no longer exists because it had to be deleted to free up disk space. TBD-DP relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary. Our proposed TBD-DP operator has the following two conceptual phases: (i) in an offline phase, it utilizes an LSTM-based hierarchical ML algorithm to learn a tree of models (coined TBD-DP tree) over time and space; (ii) in an online phase, it uses the TBD-DP tree to recover data within a certain accuracy. In our experimental setup, we measure the efficiency of the proposed operator using a ~10GB anonymized real telco network trace. Our experimental results in TensorFlow over HDFS are extremely encouraging, as they show that TBD-DP saves an order of magnitude of storage space while maintaining high accuracy on the recovered data.
Pages: 106-115
Number of pages: 10