Fast correlation coefficient estimation algorithm for HBase-based massive time series data

Times cited: 1
Authors
Liu, Wen [1, 2]
Zhang, Tuqian [2]
Shen, Yanming [2]
Wang, Peng [3]
Affiliations
[1] Xinjiang Inst Engn, Dept Elect & Informat Engn, Urumqi 830091, Peoples R China
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China
[3] Fudan Univ, Sch Comp Sci, Shanghai 201203, Peoples R China
Keywords
time series; HBase; correlation coefficient; fast estimation
DOI
10.1007/s11704-018-6308-9
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In recent years, the rapid development of the Internet of Things and sensor networks has led to explosive growth in time series data. OpenTSDB and other emerging systems have begun to use Hadoop and HBase to store massive time series data, and how to query and mine time series data on these platforms has become an active research topic. The correlation coefficient, a typical time series distance measure, is widely used in many applications. However, computing the correlation coefficient of long time series on HBase in real time requires a large amount of I/O and network transfer, which makes it unsuitable for interactive queries. To address this problem, we present two methods for estimating the correlation coefficient of two sequences stored on HBase. We first propose DCE, a fast algorithm for estimating the upper and lower bounds of the correlation coefficient. To further reduce I/O cost, we extend DCE into the ADCE algorithm, which estimates the correlation coefficient quickly in an iterative manner. Experiments show that the proposed algorithms can quickly compute the correlation coefficient of long time series.
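The abstract does not say how DCE derives its bounds or how ADCE iterates. As an illustration only, the Python sketch below shows one standard way to bound a Pearson correlation coefficient from per-segment summaries (count, mean, population standard deviation), such as might be precomputed and stored next to the raw data, and then to tighten the interval iteratively by reading raw values for the segment that contributes the most uncertainty. The names `SegmentStats`, `summarize`, `correlation_bounds`, and `refine_bounds`, the fixed-length segmentation, and the use of in-memory lists as a stand-in for HBase scans are all hypothetical assumptions; this is not the paper's DCE or ADCE algorithm.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class SegmentStats:
    """Hypothetical per-segment summary of two aligned series x and y."""
    n: int          # number of points in the segment
    mean_x: float
    mean_y: float
    std_x: float    # population standard deviation of x within the segment
    std_y: float


def summarize(x: Sequence[float], y: Sequence[float]) -> SegmentStats:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    return SegmentStats(n, mx, my, sx, sy)


def correlation_bounds(segs: List[SegmentStats],
                       exact_cross: Optional[Dict[int, float]] = None
                       ) -> Tuple[float, float]:
    """Bound Pearson r from segment summaries only.

    The within-segment cross term sum((x - mx_k) * (y - my_k)) is unknown,
    but by Cauchy-Schwarz it lies in [-n_k*sx_k*sy_k, +n_k*sx_k*sy_k].
    Segments in exact_cross (index -> exact cross term) use the exact value.
    """
    exact_cross = exact_cross or {}
    n = sum(s.n for s in segs)
    mx = sum(s.n * s.mean_x for s in segs) / n   # exact global means
    my = sum(s.n * s.mean_y for s in segs) / n
    # Total sums of squares are exact: within-segment + between-segment parts.
    vx = sum(s.n * (s.std_x ** 2 + (s.mean_x - mx) ** 2) for s in segs)
    vy = sum(s.n * (s.std_y ** 2 + (s.mean_y - my) ** 2) for s in segs)
    denom = math.sqrt(vx * vy)
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    known = sum(s.n * (s.mean_x - mx) * (s.mean_y - my) for s in segs)
    slack = 0.0
    for k, s in enumerate(segs):
        if k in exact_cross:
            known += exact_cross[k]
        else:
            slack += s.n * s.std_x * s.std_y
    lo = max(-1.0, (known - slack) / denom)
    hi = min(1.0, (known + slack) / denom)
    return lo, hi


def refine_bounds(x: Sequence[float], y: Sequence[float],
                  seg_len: int, eps: float) -> Tuple[float, float, int]:
    """Tighten the bounds until hi - lo <= eps by reading raw data for the
    unresolved segment with the largest slack (a stand-in for a scan)."""
    chunks = [(x[i:i + seg_len], y[i:i + seg_len])
              for i in range(0, len(x), seg_len)]
    segs = [summarize(cx, cy) for cx, cy in chunks]
    exact: Dict[int, float] = {}
    lo, hi = correlation_bounds(segs, exact)
    while hi - lo > eps and len(exact) < len(segs):
        k = max((i for i in range(len(segs)) if i not in exact),
                key=lambda i: segs[i].n * segs[i].std_x * segs[i].std_y)
        cx, cy = chunks[k]
        s = segs[k]
        exact[k] = sum((a - s.mean_x) * (b - s.mean_y) for a, b in zip(cx, cy))
        lo, hi = correlation_bounds(segs, exact)
    return lo, hi, len(exact)


if __name__ == "__main__":
    import random
    random.seed(0)
    x = [math.sin(i / 50) + random.gauss(0, 0.1) for i in range(10_000)]
    y = [math.sin(i / 50 + 0.3) + random.gauss(0, 0.1) for i in range(10_000)]
    lo, hi, reads = refine_bounds(x, y, seg_len=100, eps=0.05)
    print(f"r in [{lo:.3f}, {hi:.3f}] after reading {reads} segments")
```

The trade-off this sketch illustrates is I/O against precision: the initial bounds need only five numbers per segment, and each refinement step touches the raw data of a single segment. Once every segment has been read, the interval collapses to the exact coefficient.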
Pages: 864-878
Page count: 15
Source: Frontiers of Computer Science, 2019, vol. 13, pp. 864-878