Big data analysis and data velocity

Cited by: 0
Authors
Chen, Shimin [1]
Affiliations
[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing
Source
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2015 / Vol. 52 / No. 2
Keywords
Big data analysis; Data organization and distribution; Data updates; Data warehouse; Event log processing system
DOI
10.7544/issn1000-1239.2015.20140302
Abstract
Big data poses three main challenges to the underlying data management systems: volume (a huge amount of data), velocity (high speed of data generation, data acquisition, and data updates), and variety (a large number of data types and data formats). In this paper, we focus on understanding the significance of velocity and discussing how to address the velocity challenge in big data analysis systems. We compare the velocity requirements of transaction processing, data stream, and data analysis systems. We then describe two of our recent research studies, with an emphasis on the role of data velocity in big data analysis systems: 1) MaSM, which supports online data updates in data warehouse systems; and 2) LogKV, which supports high-throughput data ingestion and efficient time-window-based joins in an event log processing system. Comparing the two studies, we find that storing incoming data updates is only the minimum requirement; velocity should be considered an integral part of the data acquisition and analysis life cycle. It is important to analyze the characteristics of the desired big data analysis operations and then to optimize the data organization and data distribution schemes for incoming data updates, so as to maintain or even improve the efficiency of big data analysis. © 2015, Science Press. All rights reserved.
Pages: 333-342
Page count: 9
Related papers
18 in total
[1] Abadi D., Agrawal R., Ailamaki A., et al., The Beckman report on database research
[2] Li G., Cheng X., Research status and scientific thinking of big data, Strategy & Policy Decision Research, 27, 6, pp. 647-657, (2012)
[3] EMC digital universe study with research and analysis by IDC, (2014)
[4] Athanassoulis M., Chen S., Ailamaki A., et al., MaSM: Efficient online updates in data warehouses, Proc of the SIGMOD Int Conf on Management of Data, pp. 865-876, (2011)
[5] Cao Z., Chen S., Li F., et al., LogKV: Exploiting key-value stores for event log processing, Proc of the 6th Biennial Conf on Innovative Data Systems Research, (2013)
[6] Chang F., Dean J., Ghemawat S., et al., Bigtable: A distributed storage system for structured data, Proc of USENIX Symp on Operating System Design and Implementation, pp. 205-218, (2006)
[7] DeCandia G., Hastorun D., Jampani M., et al., Dynamo: Amazon's highly available key-value store, Proc of ACM Symp on Operating Systems Principles, pp. 205-220, (2007)
[8] Lakshman A., Malik P., Cassandra: A decentralized structured storage system, Operating Systems Review, 44, 2, pp. 35-40, (2010)
[9] Baker J., Bond C., Corbett J., et al., Megastore: Providing scalable, highly available storage for interactive services, Proc of the 5th Biennial Conf on Innovative Data Systems Research, pp. 223-234, (2011)
[10] Mahmoud H., Arora V., Nawab F., et al., MaaT: Effective and scalable coordination of distributed transactions in the cloud, PVLDB, 7, 5, pp. 329-340, (2014)