Neighborhood Information-Based Method for Multivariate Association Mining

Cited by: 2
Authors
Cheng, Honghong [1 ,2 ]
Qian, Yuhua [3 ]
Guo, Yingjie [3 ]
Zheng, Keyin [3 ]
Zhang, Qingfu [4 ,5 ]
Affiliations
[1] Shanxi Univ Finance & Econ, Sch Informat, Taiyuan 030012, Shanxi, Peoples R China
[2] Shanxi Univ, Inst Big Data Sci & Ind, Taiyuan 030006, Shanxi, Peoples R China
[3] Shanxi Univ, Inst Big Data Sci & Ind, Sch Comp & Informat Technol, Key Lab Comp Intelligence & China Informat Proc,Mi, Taiyuan 030006, Shanxi, Peoples R China
[4] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[5] City Univ Hong Kong, Shenzhen Res Inst, Shenzhen 518057, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Entropy; Spirals; Noise measurement; Mutual information; Knowledge engineering; Data mining; Data engineering; Association mining; multivariate association measure; distribution-free; nonparametric; neighborhood information; ATTRIBUTE REDUCTION;
DOI
10.1109/TKDE.2022.3178090
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most current data is multivariate, and exploring and identifying valuable information in such datasets has far-reaching impact. In particular, discovering meaningful hidden association patterns in multivariate data plays an important role. Many measures of multivariate association have been proposed, yet effectively capturing association patterns among three or more variables remains an open research challenge, especially when no prior knowledge about those relationships is available. To do so, we desire a distribution-free, association-type-independent, and nonparametric measure. For practical applications, such a measure should be comparable, interpretable, scalable, intuitive, reliable, and robust. However, no existing measure fulfills all of these desiderata. In this paper, taking advantage of the neighborhood information of a sample, we propose MNA, a maximal neighborhood multivariate association measure that satisfies all of the above criteria. Extensive experiments on synthetic and real data show that it outperforms state-of-the-art multivariate association measures.
Pages: 6126-6135
Page count: 10