Efficient Pattern-Based Aggregation on Sequence Data

Cited by: 1
Authors
He, Zhian [1 ]
Wong, Petrie [2 ]
Kao, Ben [2 ]
Lo, Eric [3 ]
Cheng, Reynold [2 ]
Feng, Ziqiang [1 ]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
[2] Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Sha Tin, Hong Kong, Peoples R China
Keywords
OLAP; iceberg query; sequence databases; probabilistic algorithm
DOI
10.1109/TKDE.2016.2618856
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A Sequence OLAP (S-OLAP) system provides a platform on which pattern-based aggregate (PBA) queries on a sequence database are evaluated. In its simplest form, a PBA query consists of a pattern template T and an aggregate function F. A pattern template is a sequence of variables, each defined over a domain. Each variable is instantiated with all possible values in its corresponding domain to derive all possible patterns of the template. Sequences are grouped based on the patterns they possess. The answer to a PBA query is a sequence cuboid (s-cuboid), which is a multidimensional array of cells. Each cell is associated with a pattern instantiated from the query's pattern template. The value of each s-cuboid cell is obtained by applying the aggregate function F to the set of data sequences that belong to that cell. Since a pattern template can involve many variables and can be arbitrarily long, the induced s-cuboid for a PBA query can be huge. For most analytical tasks, however, only iceberg cells with very large aggregate values are of interest. This paper proposes an efficient approach to identifying and evaluating iceberg cells of s-cuboids. Experimental results show that our algorithms are orders of magnitude faster than existing approaches.
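To make the query model in the abstract concrete, here is a minimal Python sketch of a naive PBA evaluation. It is not the paper's algorithm: it enumerates every cell of the s-cuboid, which is exactly the cost the paper aims to avoid. Several details are assumptions made for illustration only: the aggregate function F is COUNT, a sequence "possesses" a pattern when the pattern occurs as a contiguous run, a repeated variable in the template binds to the same value, and the names `contains`, `iceberg_cells`, and `min_count` are hypothetical.

```python
from itertools import product


def contains(sequence, pattern):
    """Return True if `pattern` occurs as a contiguous run in `sequence`.

    Contiguous matching is an assumed rule; the abstract does not define it.
    """
    n, m = len(sequence), len(pattern)
    return any(tuple(sequence[i:i + m]) == pattern for i in range(n - m + 1))


def iceberg_cells(sequences, template, domains, min_count):
    """Naively evaluate a COUNT s-cuboid and keep only its iceberg cells.

    template  : list of variable names, e.g. ["X", "Y"]
    domains   : dict mapping each variable name to its list of values
    min_count : assumed iceberg threshold on the COUNT aggregate
    """
    variables = list(dict.fromkeys(template))  # distinct variables, in order
    cells = {}
    # Instantiate every variable with every value in its domain.
    for values in product(*(domains[v] for v in variables)):
        binding = dict(zip(variables, values))
        pattern = tuple(binding[v] for v in template)
        # Group sequences by the pattern they possess and apply COUNT.
        count = sum(1 for s in sequences if contains(s, pattern))
        if count >= min_count:
            cells[pattern] = count
    return cells


# Toy sequence database (made-up data for illustration).
sequences = [
    ["A", "B", "A"],
    ["A", "B", "C"],
    ["B", "A", "B"],
    ["A", "B", "A", "B"],
]
domains = {"X": ["A", "B", "C"], "Y": ["A", "B", "C"]}

result = iceberg_cells(sequences, ["X", "Y"], domains, min_count=3)
print(result)  # {('A', 'B'): 4, ('B', 'A'): 3}
```

With two variables over three values each, the full s-cuboid has nine cells, but only two pass the threshold. The number of cells grows multiplicatively with each added variable, which is why exhaustive enumeration of this kind becomes impractical for long templates.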
Pages: 286-299
Number of pages: 14
Related papers
  • [1] Inference Algorithms for Pattern-Based CRFs on Sequence Data
    Kolmogorov, Vladimir
    Takhanov, Rustem
    ALGORITHMICA, 2016, 76 (01) : 17 - 46
  • [2] Inference Algorithms for Pattern-Based CRFs on Sequence Data
    Vladimir Kolmogorov
    Rustem Takhanov
    Algorithmica, 2016, 76 : 17 - 46
  • [3] Pattern-based data compression
    Kuri, A
    Galaviz, J
    MICAI 2004: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2004, 2972 : 1 - 10
  • [4] A Multi-granularity Pattern-based Sequence Classification Framework for Educational Data
    Jaber, Mohammad
    Wood, Peter T.
    Papapetrou, Panagiotis
    Gonzalez-Marcos, Ana
    PROCEEDINGS OF 3RD IEEE/ACM INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS, (DSAA 2016), 2016, : 370 - 378
  • [5] Improved pattern-based encrypted data aggregation scheme for clustered wireless sensor networks
    Wu, Jian-Qiang
    Guo, Jiang-Hong
    DESIGN, MANUFACTURING AND MECHATRONICS (ICDMM 2015), 2016, : 475 - 481
  • [6] Pattern-Based Conceptual Data Modelling
    Albdaiwi, Bader
    Noack, Rene
    Thalheim, Bernhard
    INFORMATION MODELLING AND KNOWLEDGE BASES XXVI, 2014, 272 : 1 - 20
  • [7] Efficient analysis of pattern-based constraint specifications
    Wahler, Michael
    Basin, David
    Brucker, Achim D.
    Koehler, Jana
    SOFTWARE AND SYSTEMS MODELING, 2010, 9 (02): : 225 - 255
  • [8] Efficient analysis of pattern-based constraint specifications
    Michael Wahler
    David Basin
    Achim D. Brucker
    Jana Koehler
    Software & Systems Modeling, 2010, 9 : 225 - 255
  • [9] Efficient Pattern-Based Conceptual Image Retrieval
    Su, Ja-Hwung
    Kuo, Chun-Yi
    Tseng, Vincent S.
    2012 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC 2012), 2012, : 441 - 446
  • [10] Pattern-Based Transformation of Sequence Diagrams Using QVT
    Kim, Dae-Kyoo
    Lee, Byunghun
    30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 1492 - 1497