Improving the Accuracy of Principal Component Analysis by the Maximum Entropy Method

Cited by: 0

Authors:

Wan, Guihong [1]

Maung, Crystal [2]

Schweitzer, Haim [1]

Affiliations:

[1] Univ Texas Dallas, Dept Comp Sci, Richardson, TX 75083 USA

[2] 7 Eleven Inc, 7 Next, Irving, TX 75063 USA

Source:

2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019) | 2019

Keywords:

Principal Component Analysis (PCA); Dimension Reduction; Low Rank Matrix Representation; Maximum Entropy Method; Euclidean distance; Rayleigh Quotient;

DOI:

10.1109/ICTAI.2019.00116

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Classical Principal Component Analysis (PCA) approximates data in terms of projections on a small number of orthogonal vectors. There are simple procedures to efficiently compute various functions of the data from the PCA approximation. The most important function is arguably the Euclidean distance between data items. This can be used, for example, to solve the approximate nearest neighbor problem. We use random variables to model the inherent uncertainty in such approximations, and apply the Maximum Entropy Method to infer the underlying probability distribution. We propose using the expected values of distances between these random variables as improved estimates of the distance. We show experimentally that in most cases the results obtained by our method are more accurate than those obtained by the classical approach. This improves the accuracy of a classical technique that has been used with little change for over 100 years.
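
As a rough illustration of the classical baseline the abstract refers to, the NumPy sketch below projects data onto a few principal directions and uses distances between the projections as estimates of the true Euclidean distances. It does not implement the maximum-entropy estimator proposed in the paper, whose details are not given in this record. The function names (`pca_fit`, `pca_project`, `pairwise_dist`), the synthetic data, and the choice of k = 5 are assumptions made for illustration only.

```python
import numpy as np


def pca_fit(X, k):
    """Return the data mean and the top-k principal directions (as columns)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are orthonormal principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T


def pca_project(X, mean, V):
    """Coordinates of each row of X in the k-dimensional PCA subspace."""
    return (X - mean) @ V


def pairwise_dist(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical synthetic data: 200 points in 50 dimensions with decaying variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)) * np.linspace(3.0, 0.1, 50)

mean, V = pca_fit(X, k=5)
Z = pca_project(X, mean, V)

true_d = pairwise_dist(X)    # exact distances in the original space
approx_d = pairwise_dist(Z)  # classical PCA estimates from 5 coordinates

# Projecting onto orthonormal directions cannot increase a distance,
# so the classical estimate never exceeds the true distance.
print("never overestimates:", bool(np.all(approx_d <= true_d + 1e-9)))

# Mean relative error over distinct pairs (diagonal excluded to avoid 0/0).
off_diag = ~np.eye(len(X), dtype=bool)
rel_err = (true_d[off_diag] - approx_d[off_diag]) / true_d[off_diag]
print("mean relative error:", rel_err.mean())
```

Because the columns of `V` are orthonormal, the projected distance is always a lower bound on the true distance, so the error of the classical estimate is one-sided. The sketch only makes that baseline behavior visible; how the paper corrects for it is not reproduced here.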


Pages: 808-815

Page count: 8
