Neighborhood Information-Based Method for Multivariate Association Mining

Cited by: 2

Authors:

Cheng, Honghong ^{[1,2]}

Qian, Yuhua ^{[3]}

Guo, Yingjie ^{[3]}

Zheng, Keyin ^{[3]}

Zhang, Qingfu ^{[4,5]}

Affiliations:

[1] Shanxi Univ Finance & Econ, Sch Informat, Taiyuan 030012, Shanxi, Peoples R China

[2] Shanxi Univ, Inst Big Data Sci & Ind, Taiyuan 030006, Shanxi, Peoples R China

[3] Shanxi Univ, Inst Big Data Sci & Ind, Sch Comp & Informat Technol, Key Lab Comp Intelligence & China Informat Proc,Mi, Taiyuan 030006, Shanxi, Peoples R China

[4] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

[5] City Univ Hong Kong, Shenzhen Res Inst, Shenzhen 518057, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2023 / Vol. 35 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

Entropy; Spirals; Noise measurement; Mutual information; Knowledge engineering; Data mining; Data engineering; Association mining; multivariate association measure; distribution-free; nonparametric; neighborhood information; ATTRIBUTE REDUCTION;

DOI:

10.1109/TKDE.2022.3178090

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most current data are multivariate, and exploring and identifying valuable information in such datasets has far-reaching impact. In particular, discovering meaningful hidden association patterns in multivariate data plays an important role. Many measures of multivariate association have been proposed, yet effectively capturing association patterns among three or more variables remains an open research challenge, especially when no prior knowledge about those relationships is available. To address this, we need a distribution-free, association-type-independent, nonparametric measure. For practical applications, such a measure should also be comparable, interpretable, scalable, intuitive, reliable, and robust. However, no existing measure fulfills all of these desiderata. In this paper, taking advantage of the neighborhood information of a sample, we propose MNA, a maximal neighborhood multivariate association measure that satisfies all the above criteria. Extensive experiments on synthetic and real data show that it outperforms state-of-the-art multivariate association measures.

Citation

Pages: 6126-6135

Page count: 10

40 references in total
