Data summarization: a survey

Cited by: 48
Authors
Ahmed, Mohiuddin [1]
Affiliations
[1] Canberra Inst Technol, Dept ICT & Lib Studies, Reid, Australia
Keywords
Summarization; Structured data; Unstructured data; Machine learning; Statistics; Semantics; Natural language processing; Cyber security; Outliers; Support
DOI
10.1007/s10115-018-1183-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Summarization has proven to be a useful and effective technique for supporting the analysis of large amounts of data. Knowledge discovery from data (KDD) is time-consuming, and summarization is an important step that expedites KDD tasks by intelligently reducing the size of the processed data. In this paper, different summarization techniques for structured and unstructured data are discussed. The key finding of this survey is that not all summarization techniques create a summary suitable for further analysis. It is highlighted that sampling techniques are a viable way of creating a summary for further knowledge discovery, such as anomaly detection from the summary. Different summary evaluation metrics are also discussed.
Pages: 249-273
Page count: 25