Data summarization: a survey

Cited by: 48
Authors
Ahmed, Mohiuddin [1]
Affiliations
[1] Canberra Inst Technol, Dept ICT & Lib Studies, Reid, Australia
Keywords
Summarization; Structured data; Unstructured data; Machine learning; Statistics; Semantics; Natural language processing; Cyber security; OUTLIERS; SUPPORT
DOI
10.1007/s10115-018-1183-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Summarization has been proven to be a useful and effective technique for supporting the analysis of large amounts of data. Knowledge discovery from data (KDD) is time-consuming, and summarization is an important step that expedites KDD tasks by intelligently reducing the size of the processed data. In this paper, different summarization techniques for structured and unstructured data are discussed. The key finding of this survey is that not all summarization techniques create a summary suitable for further analysis. It is highlighted that sampling techniques are a viable way of creating a summary for further knowledge discovery, such as anomaly detection from the summary. Different summary evaluation metrics are also discussed.
Pages: 249-273
Page count: 25