Data summarization: a survey

Cited by: 44
Authors
Ahmed, Mohiuddin [1 ]
Affiliation
[1] Canberra Inst Technol, Dept ICT & Lib Studies, Reid, Australia
Keywords
Summarization; Structured data; Unstructured data; Machine learning; Statistics; Semantics; Natural language processing; Cyber security; OUTLIERS; SUPPORT
DOI
10.1007/s10115-018-1183-0
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Summarization has been proven to be a useful and effective technique supporting data analysis of large amounts of data. Knowledge discovery from data (KDD) is time consuming, and summarization is an important step to expedite KDD tasks by intelligently reducing the size of processed data. In this paper, different summarization techniques for structured and unstructured data are discussed. The key finding of this survey is that not all summarization techniques create a summary suitable for further analysis. It is highlighted that sampling techniques are a viable way of creating a summary for further knowledge discovery such as anomaly detection from summary. Also different summary evaluation metrics are discussed.
Pages: 249 - 273
Page count: 25
Related papers
50 in total
  • [41] Trainable Framework for Information Extraction, Structuring and Summarization of Unstructured Data, Using Modified NER
    Banerjee, Partha Sarathy
    Chakraborty, Baisakhi
    Anand, Utkarsh
    Upadhyay, Harsh
    WIRELESS PERSONAL COMMUNICATIONS, 2021, 117 (02) : 769 - 807
  • [42] Trainable Framework for Information Extraction, Structuring and Summarization of Unstructured Data, Using Modified NER
    Partha Sarathy Banerjee
    Baisakhi Chakraborty
    Utkarsh Anand
    Harsh Upadhyay
    Wireless Personal Communications, 2021, 117 : 769 - 807
  • [43] Extractive Summarization Data Sets Generated with Measurable Analyses
    Demir, Irem
    Kupcu, Emel
    Kupcu, Alptekin
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024
  • [44] Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint
    Han, Kai
    Cui, Shuang
    Zhu, Tianshuai
    Zhang, Enpei
    Wu, Benwei
    Yin, Zhizhuo
    Xu, Tong
    Tang, Shaojie
    Huang, He
    PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2021, 5 (01)
  • [45] apricot: Submodular selection for data summarization in Python
    Schreiber, Jacob
    Bilmes, Jeffrey
    Noble, William Stafford
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [46] Mining Eye-Tracking Data for Text Summarization
    Taieb-Maimon, Meirav
    Romanovski-Chernik, Aleksandr
    Last, Mark
    Litvak, Marina
    Elhadad, Michael
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2024, 40 (17) : 4887 - 4905
  • [47] Multi-document Summarization via Deep Learning Techniques: A Survey
    Ma, Congbo
    Zhang, Wei Emma
    Guo, Mingyu
    Wang, Hu
    Sheng, Quan Z.
    ACM COMPUTING SURVEYS, 2023, 55 (05)
  • [48] A Survey of the State-of-the-Art Models in Neural Abstractive Text Summarization
    Syed, Ayesha Ayub
    Gaol, Ford Lumban
    Matsuo, Tokuro
    IEEE ACCESS, 2021, 9 : 13248 - 13265
  • [49] Data Summarization in the Node by Parameters (DSNP): Local Data Fusion in an IoT Environment
    Maschi, Luis F. C.
    Pinto, Alex S. R.
    Meneguette, Rodolfo I.
    Baldassin, Alexandro
    SENSORS, 2018, 18 (03)
  • [50] Analysis and summarization of correlations in data cubes and its application in microarray data analysis
    Chen, Chien-Yu
    Hwang, Shien-Ching
    Oyang, Yen-Jen
    INTELLIGENT DATA ANALYSIS, 2005, 9 (01) : 43 - 57