Query-constraint-based mining of association rules for exploratory analysis of clinical datasets in the National Sleep Research Resource

Cited by: 10
Authors
Abeysinghe, Rashmie [1]
Cui, Licong [1, 2]
Affiliations
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[2] Univ Kentucky, Inst Biomed Informat, Lexington, KY 40506 USA
Funding
National Science Foundation (USA)
Keywords
Query-constraint-based association rule mining; National Sleep Research Resource; Exploratory data analysis; MYOCARDIAL-INFARCTION; RISK-FACTOR; LOOP DIURETICS; HYPERTENSION; HEART; DEPRESSION; ANXIETY; HEALTH; HYPERCHOLESTEROLEMIA; DISCOVERY
DOI
10.1186/s12911-018-0633-7
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Association Rule Mining (ARM) is widely used by biomedical researchers for exploratory data analysis and for uncovering potential relationships among variables in biomedical datasets. However, ARM on high-dimensional biomedical datasets yields a large number of rules, many of which may be uninteresting. For imbalanced datasets in particular, applying ARM directly produces uninteresting rules dominated by a few variables that capture general characteristics.
Methods: We introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on the subset of data items that satisfy a query constraint. We first perform a series of data-preprocessing steps: variable selection, merging of semantically similar variables, combining of multiple-visit data, and data transformation. We then use the Top-k Non-Redundant (TNR) ARM algorithm to generate association rules. Finally, we remove general and subsumed rules so that only unique, non-redundant rules remain for a given query constraint.
Results: Applying QARM to five NSRR datasets yielded a total of 2517 association rules with a minimum confidence of 60% (using the top 100 rules for each query constraint). The results show that merging similar variables can avoid uninteresting rules, and that removing general and subsumed rules produces a more concise and interesting rule set.
Conclusions: QARM shows potential to support exploratory analysis of large biomedical datasets. It also proves useful for reducing the number of uninteresting association rules generated from imbalanced datasets. A preliminary literature-based analysis showed that some association rules have supporting evidence in the biomedical literature, while others without such evidence may serve as candidate hypotheses for further investigation. Together with literature-based evidence, the association rules mined from the NSRR clinical datasets may be used to support clinical decisions for sleep-related problems.
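The two ideas named in the Methods paragraph, mining on a query-constrained subset and pruning subsumed rules, can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the TNR algorithm. The function names (`select`, `support_confidence`, `drop_subsumed`), the toy records, and the variable names `group`, `x`, `y` are all hypothetical. The abstract does not define "subsumed", so the sketch assumes one plausible reading: a rule is subsumed when another rule with the same consequent has a strictly smaller antecedent and at least the same confidence.

```python
from typing import Dict, FrozenSet, List, Tuple

Record = Dict[str, str]
Item = Tuple[str, str]                      # (variable, value) pair
Rule = Tuple[FrozenSet[Item], FrozenSet[Item], float]  # antecedent, consequent, confidence


def select(records: List[Record], constraint: Record) -> List[Record]:
    """Keep only records that match every variable=value pair in the query constraint."""
    return [r for r in records if all(r.get(k) == v for k, v in constraint.items())]


def support_confidence(records: List[Record],
                       antecedent: FrozenSet[Item],
                       consequent: FrozenSet[Item]) -> Tuple[float, float]:
    """Support and confidence of antecedent -> consequent over the given records."""
    itemsets = [frozenset(r.items()) for r in records]
    n_ante = sum(1 for s in itemsets if antecedent <= s)
    n_both = sum(1 for s in itemsets if (antecedent | consequent) <= s)
    support = n_both / len(itemsets) if itemsets else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


def drop_subsumed(rules: List[Rule]) -> List[Rule]:
    """Assumed definition: drop a rule if another rule has the same consequent,
    a proper-subset antecedent, and confidence at least as high."""
    kept = []
    for ante, cons, conf in rules:
        subsumed = any(c2 == cons and a2 < ante and conf2 >= conf
                       for a2, c2, conf2 in rules)
        if not subsumed:
            kept.append((ante, cons, conf))
    return kept


# Toy data (hypothetical, not from NSRR).
records = [
    {"group": "A", "x": "1", "y": "1"},
    {"group": "A", "x": "1", "y": "1"},
    {"group": "A", "x": "1", "y": "0"},
    {"group": "A", "x": "0", "y": "0"},
    {"group": "B", "x": "1", "y": "0"},
]
subset = select(records, {"group": "A"})    # query constraint: group = A
sup, conf = support_confidence(subset,
                               frozenset({("x", "1")}),
                               frozenset({("y", "1")}))
passes_threshold = conf >= 0.60             # 60% minimum confidence, as in the abstract
```

Restricting the records before counting means support and confidence are relative to the constrained subset rather than the full dataset, which is how a rule that is rare overall can still surface for a specific query.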
Pages: 12