Efficient Pattern-Based Aggregation on Sequence Data

Cited by: 1
Authors
He, Zhian [1]
Wong, Petrie [2]
Kao, Ben [2]
Lo, Eric [3]
Cheng, Reynold [2]
Feng, Ziqiang [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
[2] Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Sha Tin, Hong Kong, Peoples R China
Keywords
OLAP; iceberg query; sequence databases; probabilistic algorithm
DOI
10.1109/TKDE.2016.2618856
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A Sequence OLAP (S-OLAP) system provides a platform on which pattern-based aggregate (PBA) queries on a sequence database are evaluated. In its simplest form, a PBA query consists of a pattern template T and an aggregate function F. A pattern template is a sequence of variables, each defined over a domain. Each variable is instantiated with all possible values in its corresponding domain to derive all possible patterns of the template. Sequences are grouped based on the patterns they possess. The answer to a PBA query is a sequence cuboid (s-cuboid), which is a multidimensional array of cells. Each cell is associated with a pattern instantiated from the query's pattern template. The value of each s-cuboid cell is obtained by applying the aggregate function F to the set of data sequences that belong to that cell. Since a pattern template can involve many variables and can be arbitrarily long, the induced s-cuboid for a PBA query can be huge. For most analytical tasks, however, only iceberg cells, that is, cells with very large aggregate values, are of interest. This paper proposes an efficient approach to identifying and evaluating the iceberg cells of s-cuboids. Experimental results show that the proposed algorithms are orders of magnitude faster than existing approaches.
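To make the terminology in the abstract concrete, the following is a minimal brute-force sketch of a PBA query. It is not the algorithm proposed in the paper, which is designed precisely to avoid this kind of exhaustive enumeration. Several things here are assumptions made for illustration only: a sequence is taken to "possess" a pattern when the pattern occurs as a contiguous run, every template position is an independent variable, the aggregate function F defaults to COUNT, and the names `pattern_occurs`, `naive_s_cuboid`, and `iceberg_cells` are hypothetical.

```python
from itertools import product


def pattern_occurs(sequence, pattern):
    """Return True if `pattern` appears as a contiguous run in `sequence`.

    Contiguous matching is an assumption of this sketch; the abstract does
    not specify the matching semantics.
    """
    n, m = len(sequence), len(pattern)
    return any(tuple(sequence[i:i + m]) == pattern for i in range(n - m + 1))


def naive_s_cuboid(sequences, domains, aggregate=len):
    """Materialize every s-cuboid cell by brute force.

    `domains` holds one list of values per template variable. Each
    instantiated pattern is one cell; its value is `aggregate` applied to
    the sequences that possess the pattern (COUNT when `aggregate` is len).
    """
    cells = {}
    for pattern in product(*domains):  # instantiate the template
        members = [s for s in sequences if pattern_occurs(s, pattern)]
        cells[pattern] = aggregate(members)
    return cells


def iceberg_cells(cuboid, threshold):
    """Keep only the cells whose aggregate value reaches `threshold`."""
    return {p: v for p, v in cuboid.items() if v >= threshold}


# Toy sequence database and a two-variable template (X, Y).
sequences = [
    ["A", "B", "C"],
    ["A", "B", "A"],
    ["B", "C", "A"],
    ["A", "B"],
]
domains = [["A", "B"], ["A", "B", "C"]]  # domain of X, domain of Y

cuboid = naive_s_cuboid(sequences, domains)
print(iceberg_cells(cuboid, threshold=2))
# prints {('A', 'B'): 3, ('B', 'C'): 2}
```

The cuboid in this toy case has 2 x 3 = 6 cells, of which only two pass the threshold. The number of cells grows as the product of the domain sizes, which is why materializing the full s-cuboid becomes impractical for long templates and why only the iceberg cells are targeted.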
Pages: 286-299
Number of pages: 14