Scalable Queries For Large Datasets Using Cloud Computing: A Case Study

Cited by: 0
Authors
McGlothlin, James P. [1]
Khan, Latifur [1]
Affiliation
[1] Univ Texas Dallas, Richardson, TX 75083 USA
Keywords
Cloud Computing
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cloud computing is rapidly growing in popularity as a solution for processing and retrieving huge amounts of data over clusters of inexpensive commodity hardware. The most common data model used by cloud computing software is the NoSQL data model. While this data model is extremely scalable, it is far more efficient for simple retrievals and scans than for the complex analytical queries typical of the relational database model. In this paper, we evaluate emerging cloud computing technologies using a representative use case: analyzing telecommunications logs for performance monitoring and quality assurance. The size of such logs is growing exponentially as more devices communicate more frequently and the volume of data transferred steadily increases. We analyze potential solutions for a scalable database that supports both retrieval and analysis. We investigate and analyze all the major open source cloud computing solutions and designs, and then select the most applicable subset of these technologies for experimentation. We evaluate the performance of these products, analyze the results, and make recommendations. This paper provides a comprehensive survey of technologies for scalable data processing and an in-depth performance evaluation of these technologies.
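To make the contrast between simple retrievals and analytical queries concrete, here is a minimal Python sketch. It assumes a hypothetical set of telecom log records held in an in-memory dict that stands in for a NoSQL key-value table; the keys, the fields `latency_ms` and `bytes`, and the function names are illustrative assumptions, not the paper's schema or any product's API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical telecom log records keyed by (device_id, timestamp).
# A plain dict stands in for a NoSQL key-value table (assumption, not a real product API).
store = {
    ("dev-1", 1000): {"latency_ms": 40, "bytes": 1200},
    ("dev-1", 1060): {"latency_ms": 55, "bytes": 800},
    ("dev-2", 1000): {"latency_ms": 120, "bytes": 5000},
    ("dev-2", 1060): {"latency_ms": 80, "bytes": 3000},
}


def get_record(device_id, ts):
    """Simple retrieval: one key lookup, the access pattern key-value stores favor."""
    return store.get((device_id, ts))


def scan_device(device_id):
    """Scan: stands in for a range scan over a sorted key prefix.
    Here the dict keys are sorted and filtered by device_id."""
    return [store[k] for k in sorted(store) if k[0] == device_id]


def avg_latency_by_device(min_bytes):
    """Analytical query (filter + group-by + aggregate).
    With no secondary index or query engine, every record must be visited."""
    groups = defaultdict(list)
    for (device_id, _ts), rec in store.items():
        if rec["bytes"] >= min_bytes:
            groups[device_id].append(rec["latency_ms"])
    return {d: mean(v) for d, v in groups.items()}


if __name__ == "__main__":
    print(get_record("dev-1", 1000))
    print([r["latency_ms"] for r in scan_device("dev-2")])
    print(avg_latency_by_device(1000))
```

The lookup touches a single key, while the group-by query has to visit every record in the table. In a store without secondary indexes or a query layer, that asymmetry grows with the data, which is the gap between retrieval and analysis that the use case above sets out to address.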
Pages: 8-16
Number of pages: 9