Real Time Principal Component Analysis

Cited by: 0
Authors
Chowdhury, Ranak Roy [1]
Adnan, Muhammad Abdullah [1]
Gupta, Rajesh K. [2]
Affiliations
[1] BUET, Dhaka, Bangladesh
[2] Univ Calif San Diego, San Diego, CA 92103 USA
Source
2019 IEEE 35th International Conference on Data Engineering (ICDE 2019) | 2019
Keywords
Big Data; Real Time; Dimensionality Reduction; PCA;
DOI
10.1109/ICDE.2019.00171
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
By processing data in motion, real-time data processing lets us extract instantaneous results from online input data, ensuring timely responsiveness to events as well as a much enhanced capacity to process large data sets. This is especially important when decision loops include querying and processing data on the web, where size and latency considerations make it impossible to process raw data in real time. This makes dimensionality reduction techniques such as principal component analysis (PCA) an important data preprocessing tool for gaining insight into data. In this paper, we propose a variant of PCA that is suited to real-time applications. In the real-time version of the PCA problem, we maintain a window over the most recent data and project every incoming row of data into a lower-dimensional subspace, which we generate as the output of the model. The goal is to minimize the reconstruction error of the output with respect to the input. We use the reconstruction error as the termination criterion for updating the eigenspace as new data arrives. To verify whether our proposed model can capture the essence of the changing distribution of large datasets in real time, we have implemented the algorithm and evaluated its performance against carefully designed simulations that change the distributions of data sources over time in a controllable manner. Furthermore, we have demonstrated that our algorithm can capture the changing distributions of real-life datasets by running simulations on datasets from a variety of real-time applications, e.g., localization and customer expenditure. We propose algorithmic enhancements that rely on spectral analysis to improve dimensionality reduction. Results show that our method can successfully capture the changing distribution of data in a real-time scenario, thus enabling real-time PCA.
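The windowing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the class name `WindowedPCA`, the parameters `n_components`, `window_size`, and `error_threshold`, and the choice to recompute the eigenspace with a full SVD of the window are all assumptions made for illustration. The sketch keeps a window of recent rows, projects each incoming row onto the current subspace, and refits the subspace only when the squared reconstruction error of that row exceeds the threshold.

```python
from collections import deque

import numpy as np


class WindowedPCA:
    """Illustrative sliding-window PCA (hypothetical; not the paper's method)."""

    def __init__(self, n_components, window_size, error_threshold):
        self.k = n_components                    # target dimensionality (assumed)
        self.window = deque(maxlen=window_size)  # most recent rows only
        self.threshold = error_threshold         # refit trigger (assumed)
        self.mean = None
        self.basis = None                        # shape (k, d), orthonormal rows
        self.refits = 0

    def _refit(self):
        # Recompute the eigenspace from the rows currently in the window.
        X = np.array(list(self.window), dtype=float)
        self.mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.basis = vt[: self.k]
        self.refits += 1

    def reconstruction_error(self, row):
        # Squared distance between the centered row and its projection.
        centered = row - self.mean
        reconstructed = (centered @ self.basis.T) @ self.basis
        return float(np.sum((centered - reconstructed) ** 2))

    def process(self, row):
        """Add a row to the window and return its k-dimensional projection.

        Returns None until the window holds more than k rows.
        """
        row = np.asarray(row, dtype=float)
        self.window.append(row)
        if self.basis is None:
            if len(self.window) <= self.k:
                return None
            self._refit()
        elif self.reconstruction_error(row) > self.threshold:
            # The current eigenspace no longer fits the incoming data.
            self._refit()
        return (row - self.mean) @ self.basis.T
```

Feeding rows one at a time through `process` yields the low-dimensional output stream; the `refits` counter shows how often the subspace had to be updated as the data distribution changed.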
Pages: 1678-1681
Page count: 4