Applying mutual information for discretization to support the discovery of rare-unusual association rule in cerebrovascular examination dataset

Cited by: 15
Authors
Wulandari, Chandrawati Putri [1, 2]
Ou-Yang, Chao [1]
Wang, Han-Cheng [3, 4, 5]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Ind Management, Taipei, Taiwan
[2] Univ Brawijaya, Dept Informat Syst, Malang, Indonesia
[3] Shin Kong Wu Ho Su Mem Hosp, Dept Neurol, Taipei, Taiwan
[4] Natl Taiwan Univ, Coll Med, Taipei, Taiwan
[5] Taipei Med Univ, Coll Med, Taipei, Taiwan
Keywords
Rare-unusual association rules; Discretization; Apriori-Rare; Data mining; Cerebrovascular disease; ISCHEMIC-STROKE; HEART-DISEASE; RISK PATTERNS; CLASSIFICATION; CHOLESTEROL; ALGORITHM;
DOI
10.1016/j.eswa.2018.09.044
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In knowledge discovery studies, association rule mining has been extensively studied as a way to discover hidden knowledge and relationships among sets of items in a transactional dataset. Most research on association rule mining focuses on discovering frequent patterns based on the most frequent items occurring in the dataset, whereas the extraction of rare rules has received less attention. In medical datasets, the discovery of rare association rules (RARs) is more challenging, yet it is likely to yield rare and unusual knowledge for physicians that complements frequent association rules. Hence, the aim of this paper is to discover non-frequent, or rare-unusual, association rules (RUARs) from a stroke medical dataset to provide potentially meaningful knowledge to domain users. Discretization needs to be performed as a data preprocessing step before generating rules. To the best of our knowledge, few studies have focused on how discretization results can support the extraction of more and better-quality RUARs, particularly for medical datasets. In addition, the extracted RUARs are expected to provide potentially new and unusual insights into stroke risk patterns. This paper applies a mutual information measure to discretize a stroke examination dataset collected from a medical center in Taiwan. An interval merging method is proposed to simplify the discrete form and enrich the quality of the generated rules. Rare association rules, with relatively low support, were then generated using the Apriori-Rare method. In addition, a filtering process was applied to the content of the rule itemsets to discover the expected set of RUARs for physicians. Furthermore, the extracted RUARs were analyzed based on their relative risk values for the occurrence of stroke.
Results indicated that mutual information discretization outperformed the traditional discretization methods in terms of how well the discretization scheme supports the extraction of RUARs, with better quantity and quality measures for further analysis from a medical point of view. Moreover, the proposed method produced a relatively higher number of RUARs. Knowledge of unusual rule patterns from rare association rules might provide potentially new and unusual insights for medical practitioners and increase awareness of stroke examination results. (C) 2018 Elsevier Ltd. All rights reserved.
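The abstract names two quantities, mutual information for discretization and relative risk for rule analysis, without giving formulas. The sketch below is only an illustration of the textbook definitions and is not the authors' procedure: the function names (`mutual_information`, `best_cut_point`, `relative_risk`), the single binary cut point, and the toy values are all assumptions, and the paper's interval merging and Apriori-Rare steps are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def best_cut_point(values, labels):
    """Hypothetical helper: pick the single cut that maximizes MI with the label.

    Candidate cuts are midpoints between consecutive distinct values.
    Returns (None, -1.0) if there are fewer than two distinct values.
    """
    distinct = sorted(set(values))
    best_cut, best_mi = None, -1.0
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        bins = [v > cut for v in values]  # two intervals: <= cut and > cut
        mi = mutual_information(bins, labels)
        if mi > best_mi:
            best_cut, best_mi = cut, mi
    return best_cut, best_mi


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of the outcome among the exposed divided by risk among the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


if __name__ == "__main__":
    # Toy, made-up measurements and outcome labels (1 = stroke), for illustration only.
    values = [110, 120, 125, 150, 160, 170]
    labels = [0, 0, 0, 1, 1, 1]
    print(best_cut_point(values, labels))
    print(relative_risk(20, 100, 10, 100))
```

A relative risk above 1 means the outcome is more common among records matching the rule's antecedent than among the rest, which is one way a low-support rule can still be clinically interesting.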
Pages: 52-64
Number of pages: 13