Dynamic optimisation based fuzzy association rule mining method

Times cited: 17
Authors
Zheng, Hui [1, 2, 3]
He, Jing [4, 5]
Huang, Guangyan [6]
Zhang, Yanchun [3, 7]
Wang, Hua [7]
Affiliations
[1] Univ Chinese Acad Sci, CAS Res Ctr Fictitious Econ & Data Sci, Beijing 100190, Peoples R China
[2] Victoria Univ, Melbourne, Vic, Australia
[3] Fudan Univ, Shanghai, Peoples R China
[4] Nanjing Univ Finance & Econ, Inst Informat Technol, Nanjing, Jiangsu, Peoples R China
[5] Victoria Univ, Coll Engn & Sci, Melbourne, Vic, Australia
[6] Deakin Univ, Sch Informat Technol, Melbourne, Vic, Australia
[7] Victoria Univ, Ctr Appl Informat, Melbourne, Vic, Australia
Funding
Australian Research Council;
Keywords
Association rule; Optimised parameters; Multiple objective function; Data mining; PERFORMANCE; SENTIMENT; FUZZINESS;
DOI
10.1007/s13042-018-0806-9
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Techniques of performance analysis, comprising metrics such as accuracy, efficiency and time consumption, have been used to evaluate the properties and interestingness measures of association rule mining methods. These metrics, combined with different parameters (partitioning points, fuzzy sets), should therefore be analysed thoroughly and balanced simultaneously to enhance the overall performance (effectiveness, accuracy and efficiency) of an algorithm. In practice, most current algorithms struggle with the tradeoff among these metrics and parameters, and the tradeoff becomes even harder when they are applied to different types of data (discrete, categorical and continuous data). Specifically, serial data (i.e., sequences or transactions of floating-point numbers), such as sensor, financial, medical and sentiment streaming data, differ from discrete variables such as boolean data (e.g., negative and positive sentiment represented as '0' and '1', respectively) and categorical data (e.g., 'young age', 'middle age', 'old age'). The main difference is that serial data face the sharp boundary problem: it is hard to decide the boundary values (i.e., the single points that partition data into different value groups), a problem that few association rule mining methods address. This paper aims to resolve the sharp boundary problem and simultaneously balance multiple performance measures of our algorithm by developing a novel dynamic optimisation (of parameters and metrics) based fuzzy association rule mining (DOFARM) method. The proposed method can be applied to a wide range of classification problems, such as classifying sentiment strength (negative and positive).
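To illustrate the sharp boundary problem, the following minimal Python sketch partitions a continuous "age" attribute using single boundary points. The boundary values 35 and 60, the labels, and the helper names crisp_partition and crisp_support are hypothetical illustrations; the abstract does not specify any boundaries.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical single partitioning points for an "age" attribute;
# the paper does not specify these numbers.
AGE_BOUNDARIES = (35.0, 60.0)
AGE_LABELS = ("young age", "middle age", "old age")


def crisp_partition(value: float, boundaries: Sequence[float] = AGE_BOUNDARIES) -> str:
    """Map a continuous value to exactly one label using single boundary points.

    A value equal to a boundary goes to the upper group (bisect_right semantics).
    """
    return AGE_LABELS[bisect_right(boundaries, value)]


def crisp_support(values: Sequence[float], label: str,
                  boundaries: Sequence[float] = AGE_BOUNDARIES) -> float:
    """Fraction of records whose crisp label equals `label`."""
    hits = sum(1 for v in values if crisp_partition(v, boundaries) == label)
    return hits / len(values)


if __name__ == "__main__":
    ages = [34.9, 35.0, 35.1, 59.9]
    # 34.9 and 35.0 differ by 0.1 years but fall into different groups.
    print(crisp_partition(34.9), crisp_partition(35.0))
    # Nudging one boundary from 35.0 to 35.05 changes the support of "middle age".
    print(crisp_support(ages, "middle age"))                 # 0.75
    print(crisp_support(ages, "middle age", (35.05, 60.0)))  # 0.5
```

Two ages only 0.1 years apart receive different labels, and shifting one boundary by 0.05 drops the support of "middle age" from 0.75 to 0.5. This sensitivity to a single point is what the sharp boundary problem refers to.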
In our DOFARM method, instead of single partitioning points, we use a range of values to smoothly separate two consecutive partitions, and we develop a corresponding membership function to generate fuzzy sets for the original data sets of physical and emotional diseases. In particular, we design a dual compromise scheme: the first tradeoff balances better performance of the output association rules against a more widely applicable fuzzy membership function, while the second tradeoff reduces the time parameter and enhances the overall performance of the DOFARM method. The feasibility and accuracy of the proposed DOFARM method have been verified theoretically and experimentally. In addition, we demonstrate the accuracy, effectiveness and efficiency of DOFARM through experiments on both synthetic and real datasets.
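The abstract does not give the DOFARM membership function itself. As a minimal sketch, assuming a linear (trapezoidal) hand-over inside each transition range and the common min t-norm for fuzzy support, the Python below shows how a range of values, rather than a single partitioning point, spreads a value across two neighbouring fuzzy sets. The ranges, labels, and the helpers memberships, fuzzy_support and fuzzy_confidence are hypothetical illustrations, not the paper's code.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical transition ranges for an "age" attribute. Each (lo, hi) pair
# replaces one single partitioning point; the numbers are illustrative only.
AGE_RANGES: Tuple[Tuple[float, float], ...] = ((30.0, 40.0), (55.0, 65.0))
AGE_LABELS: Tuple[str, ...] = ("young age", "middle age", "old age")


def memberships(value: float,
                ranges: Sequence[Tuple[float, float]] = AGE_RANGES,
                labels: Sequence[str] = AGE_LABELS) -> Dict[str, float]:
    """Trapezoidal fuzzy partition (an assumption, not DOFARM's function).

    Degrees lie in [0, 1] and sum to 1. Assumes len(labels) == len(ranges) + 1
    and sorted, non-overlapping ranges with lo < hi.
    """
    degrees = {label: 0.0 for label in labels}
    for i, (lo, hi) in enumerate(ranges):
        if value < lo:
            # Left of this transition range: fully in fuzzy set i.
            degrees[labels[i]] = 1.0
            return degrees
        if value <= hi:
            # Inside the range: linear hand-over from set i to set i + 1.
            upper = (value - lo) / (hi - lo)
            degrees[labels[i]] = 1.0 - upper
            degrees[labels[i + 1]] = upper
            return degrees
    # Right of the last transition range: fully in the last fuzzy set.
    degrees[labels[-1]] = 1.0
    return degrees


def fuzzy_support(records: List[Dict[str, float]], items: Sequence[str]) -> float:
    """Mean over records of the minimum membership across `items` (min t-norm, assumed)."""
    return sum(min(r.get(item, 0.0) for item in items) for r in records) / len(records)


def fuzzy_confidence(records: List[Dict[str, float]],
                     antecedent: Sequence[str], consequent: Sequence[str]) -> float:
    """Fuzzy support of antecedent and consequent together over that of the antecedent."""
    denom = fuzzy_support(records, antecedent)
    if denom == 0.0:
        return 0.0
    return fuzzy_support(records, list(antecedent) + list(consequent)) / denom


if __name__ == "__main__":
    # 34.9 and 35.0 now receive almost the same degrees instead of different labels.
    print(memberships(34.9), memberships(35.0))
    # Each record mixes fuzzy age degrees with a crisp sentiment item (1.0 / 0.0).
    data = [(35.0, 1.0), (50.0, 1.0), (58.0, 0.0), (20.0, 0.0)]
    records = [{**memberships(age), "positive": s} for age, s in data]
    conf = fuzzy_confidence(records, ["middle age"], ["positive"])
    print(round(conf, 2))  # 0.68
```

With a linear hand-over, the degrees of neighbouring sets always sum to 1, so a small change in a value produces only a small change in support and confidence. How DOFARM actually chooses and optimises the ranges is not described in this abstract.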
Pages: 2187-2198
Page count: 12