A survey of high utility sequential patterns mining methods

Cited by: 0
Authors
Zhang, Ruihua [1]
Han, Meng [1, 2]
He, Feifei [1]
Meng, Fanxing [1]
Li, Chunpeng [1]
Affiliations
[1] North Minzu Univ, Sch Comp Sci & Engn, Yinchuan, Ningxia, Peoples R China
[2] State Ethn Affairs Commiss, Key Lab Intelligent Proc Image & Graph, Yinchuan, Ningxia, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Survey; high utility sequential patterns; incremental data; data streams; hidden patterns; EFFICIENT ALGORITHM; PREFIXSPAN; INTERNET;
DOI
10.3233/JIFS-232107
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, there has been an increasing demand for high utility sequential pattern (HUSP) mining. Unlike high utility itemset mining, HUSP mining must contend with the "combinatorial explosion" of sequence data, which makes it more challenging. This survey aims to provide a general, comprehensive, and structured overview of state-of-the-art HUSP mining methods from a novel perspective. First, the data structures used by serial and parallel mining methods are illustrated, and the pros and cons of the algorithms are summarized. To protect data privacy, many HUSP hiding algorithms have been proposed; these are classified into array-based, chain-based, and matrix-based algorithms according to their key technologies, and the hiding strategies and evaluation metrics they adopt are summarized. Next, a taxonomy of the most common and state-of-the-art incremental mining algorithms is presented, covering tree-based and projection-based approaches. To handle the latest sequences in a data stream, existing algorithms often use a window model to update dynamically, and they are divided into sliding-window-based and landmark-window-based methods for analysis. Afterwards, a summary of derived high utility sequential patterns is presented. Finally, in light of the deficiencies of existing HUSP research, the authors outline their planned future work.
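The abstract does not define utility, so the following is only a minimal sketch of the notion commonly used in the HUSP literature. It assumes that each item has a unit profit, that each sequence records purchased quantities, and that the utility of a pattern in a sequence is the maximum over all of its occurrences. The names `pattern_utility_in_sequence`, `pattern_utility`, `profit`, and `min_util` are hypothetical and are not taken from the surveyed paper.

```python
from typing import Dict, List, Optional, Tuple

QItemset = Dict[str, int]        # item -> quantity in one itemset (assumed layout)
QSequence = List[QItemset]       # ordered list of quantitative itemsets
Pattern = List[Tuple[str, ...]]  # ordered list of itemsets, items only


def pattern_utility_in_sequence(pattern: Pattern, qseq: QSequence,
                                profit: Dict[str, int]) -> int:
    """Maximum utility over all occurrences of `pattern` in `qseq` (0 if none)."""
    # best[k] = best utility found so far for matching the first k pattern itemsets
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in qseq:
        # Go from the last pattern itemset to the first, so that one sequence
        # itemset is matched to at most one pattern itemset.
        for k in range(len(pattern), 0, -1):
            prev = best[k - 1]
            if prev is None:
                continue
            wanted = pattern[k - 1]
            if all(item in itemset for item in wanted):
                gain = sum(itemset[item] * profit[item] for item in wanted)
                if best[k] is None or prev + gain > best[k]:
                    best[k] = prev + gain
    return best[-1] if best[-1] is not None else 0


def pattern_utility(pattern: Pattern, db: List[QSequence],
                    profit: Dict[str, int]) -> int:
    """Utility of `pattern` in the database: sum of its per-sequence utilities."""
    return sum(pattern_utility_in_sequence(pattern, s, profit) for s in db)


# Toy data, invented for illustration only.
profit = {"a": 2, "b": 5, "c": 1}
db = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"a": 2}, {"b": 1, "c": 1}, {"a": 1}],
]
min_util = 12  # hypothetical threshold
is_high_utility = pattern_utility([("a",), ("a",)], db, profit) >= min_util
```

Under these assumptions, a pattern counts as high utility when its database utility reaches `min_util`. Because utility is neither monotone nor anti-monotone in pattern length, a plain support-style pruning rule does not apply directly, which is one reason the search space is hard to reduce.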
Pages: 8049-8077
Page count: 29