Kernel canonical correlation analysis for data combination of multiple-source datasets

Cited by: 1
Authors
Mitsuhiro, Masaki [1, 2]
Hoshino, Takahiro [2, 3]
Affiliations
[1] Nikkei Res Inc, Chiyoda Ku, 2-2-1 Uchikanda, Tokyo 1010047, Japan
[2] Keio Univ, Minato Ku, 2-15-45 Mita, Tokyo 1088345, Japan
[3] RIKEN, Ctr Adv Intelligence Project, Chuo Ku, Nihonbashi 1 Chome Mitsui Bldg,15th Floor, Tokyo 1030027, Japan
Keywords
Missing data; Multivariate data analysis; Kernelization; Statistical matching; Statistical data fusion; BIAS; SETS;
DOI
10.1007/s42081-020-00074-z
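Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes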
020208 ; 070103 ; 0714 ;
Abstract
To investigate the relationship between variables that are not observed simultaneously in the same dataset, "multiple-source datasets" obtained from different individuals or units must be integrated into a "(quasi) single-source dataset" in which all the relevant variables are observed for the same units. Among the various data combination methods, statistical matching, which is frequently used in practice in marketing and the social sciences, matches units from one dataset with similar units from another dataset, where similarity is measured by the distance between the units' values of covariates related to the variables of interest. However, when multiple-source datasets have a large number of covariates, it is difficult to obtain an accurate quasi single-source dataset using matching methods, because the combinations of covariate values become complicated and/or it is difficult to deal with nonlinear relationships between the variables of interest. In this study, we propose a data combination method that combines an extension of kernel canonical correlation analysis with statistical matching. The proposed method can estimate canonical variables in a common low-dimensional space that preserves the relationship between covariates and outcome variables. Using a simulation study and real-world data analysis, we compare our method with existing methods and demonstrate its utility.
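The distance-based matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' proposed method (it contains no kernel canonical correlation analysis), only plain nearest-neighbor statistical matching on raw covariates. The function name `nearest_neighbor_match`, the toy arrays, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def nearest_neighbor_match(x_recipient, x_donor, z_donor):
    """Hypothetical helper: for each recipient unit, borrow z from the
    donor unit whose covariates are closest in Euclidean distance."""
    # Pairwise squared distances, shape (n_recipient, n_donor).
    diff = x_recipient[:, None, :] - x_donor[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    # Index of the closest donor for every recipient.
    donor_idx = np.argmin(dist2, axis=1)
    return z_donor[donor_idx], donor_idx

# Toy dataset A: covariates x and outcome y (z is never observed here).
x_a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_a = np.array([10.0, 20.0, 30.0])

# Toy dataset B: the same covariates x and outcome z (y is never observed here).
x_b = np.array([[0.9, 1.1], [4.8, 5.2], [0.1, -0.1]])
z_b = np.array([200.0, 300.0, 100.0])

# Build a quasi single-source dataset with columns (x1, x2, y, matched z).
z_imputed, donor_idx = nearest_neighbor_match(x_a, x_b, z_b)
fused = np.column_stack([x_a, y_a, z_imputed])
print(fused)
```

With many covariates, distances on the raw covariates become less informative, which is the difficulty the abstract raises. A method such as the one proposed would compute distances in an estimated low-dimensional space instead.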
Pages: 651-668
Page count: 18