Exploring attribute correspondences across heterogeneous databases by mutual information

Cited by: 4
Authors
Zhao, HM [1]
Soofi, ES [1]
Affiliations
[1] Univ Wisconsin, Sch Business Adm, Milwaukee, WI 53201 USA
Keywords
attribute correspondence; attribute matching; composite information systems; database interoperability; heterogeneous databases; information theory; interorganizational systems; mutual information
DOI
10.2753/MIS0742-1222220411
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Identifying attribute correspondences across heterogeneous databases is a critical and time-consuming step in integrating the databases. Past research has applied correlation analysis techniques to explore correspondences between attributes. These techniques, however, are appropriate for numeric attributes that are linearly related. This paper proposes an information-theoretic approach to exploring correspondences between attributes in heterogeneous databases. The proposed approach is applicable to character attributes, as well as to numeric attributes, regardless of whether they are linearly related. It overcomes some serious shortcomings of previous approaches based on correlation analysis and has much broader applicability. The proposed procedure samples both matching and nonmatching pairs of records from the databases under consideration, applies matching functions to compare pairs of attributes, and then uses the mutual information to measure the dependency between a matching function as applied to a pair of attributes and the class (i.e., matching or nonmatching) of a pair of records. A high mutual information index implies a potential attribute correspondence, which is presented to the analyst for further evaluation. The paper also presents some empirical results demonstrating the utility of the proposed approach.
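The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the matching function `exact_match`, the attribute names, and the four toy record pairs are all hypothetical, and the score is the plain empirical mutual information (in bits) between the matching function's output and the record-pair class.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(x,y) * log2(p(x,y) / (p(x) * p(y))) over observed (x, y) cells.
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def exact_match(a, b):
    """Hypothetical matching function: 1 if the values agree ignoring case."""
    return int(str(a).strip().lower() == str(b).strip().lower())


# Toy sample of record pairs (assumed data): class 1 = matching, 0 = nonmatching.
pairs = [
    ({"name": "Ann Lee", "city": "Milwaukee"},
     {"full_name": "ann lee", "town": "Milwaukee"}, 1),
    ({"name": "Bob Ray", "city": "Madison"},
     {"full_name": "Bob Ray", "town": "Madison"}, 1),
    ({"name": "Ann Lee", "city": "Milwaukee"},
     {"full_name": "Cy Dunn", "town": "Madison"}, 0),
    ({"name": "Bob Ray", "city": "Madison"},
     {"full_name": "Dee Fox", "town": "Madison"}, 0),
]

labels = [cls for _, _, cls in pairs]
scores = {}
for attr_a in ("name", "city"):
    for attr_b in ("full_name", "town"):
        # Apply the matching function to this attribute pair on every record pair.
        outcomes = [exact_match(a[attr_a], b[attr_b]) for a, b, _ in pairs]
        scores[(attr_a, attr_b)] = mutual_information(outcomes, labels)

# Highest-scoring attribute pairs are candidates to show to the analyst.
for attr_pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(attr_pair, round(score, 3))
```

In this toy sample the name and full_name pair scores highest because its match outcome tracks the class exactly, while attribute pairs whose outcome never varies score zero. A real application would need a much larger sample and matching functions suited to each data type.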
Pages: 305-336
Page count: 32
Related papers
36 in total
  • [1] Matching Attributes Across Overlapping Heterogeneous Data Sources Using Mutual Information
    Zhao, Huimin
    JOURNAL OF DATABASE MANAGEMENT, 2010, 21 (04) : 91 - 110
  • [2] SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks
    Li, WS
    Clifton, C
    DATA & KNOWLEDGE ENGINEERING, 2000, 33 (01) : 49 - 84
  • [3] Identifying Corresponding Entities Based on Attribute Entropy in Heterogeneous Databases
    Qiang, Bao-hua
    Xi, Jian-qing
    Qiang, Bao-hua
    Wu, Chun-ming
    2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 10980 - +
  • [4] Research on Entities Matching across Heterogeneous Databases
    Qiang, Bao-hua
    Zhang, Long
    Xi, Jian-qing
    2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 10988 - +
  • [5] Cost Minimization Attribute Reduction Based on Mutual Information
    Xu, Feifei
    Bi, Zhongqin
    Lei, Jingsheng
    2015 12TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2015, : 215 - 219
  • [6] Integrating Strategies for Keyword Querying across Heterogeneous Databases
    Zhu, Qing
    PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND COMPUTER SCIENCE, VOL III, 2009, : 505 - 509
  • [7] A Mean Mutual Information Based Approach for Selecting Clustering Attribute
    Qin, Hongwu
    Ma, Xiuqin
    Zain, Jasni Mohamad
    Sulaiman, Norrozila
    Herawan, Tutut
    SOFTWARE ENGINEERING AND COMPUTER SYSTEMS, PT 2, 2011, 180 : 1 - 15
  • [8] Mutual information-based algorithm for fuzzy-rough attribute reduction
    Xu, Fei-Fei
    Miao, Duo-Qian
    Wei, Lai
    Feng, Qin-Rong
    Bi, Yu-Sheng
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2008, 30 (06): : 1372 - 1375
  • [9] A Rough Set Algorithm for Attribute Reduction via Mutual Information and Conditional Entropy
    Tian, Jing
    Wang, Quan
    Yu, Bing
    Yu, Dan
    2013 10TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2013, : 567 - 571
  • [10] Mutual Information-Based Supervised Attribute Clustering for Microarray Sample Classification
    Maji, Pradipta
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (01) : 127 - 140