A Tensor-Based Approach for Big Data Representation and Dimensionality Reduction

Times cited: 136
Authors
Kuang, Liwei [1]
Hao, Fei [1]
Yang, Laurence T. [1, 2]
Lin, Man [2]
Luo, Changqing [1]
Min, Geyong [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS B2G 2W5, Canada
[3] Univ Exeter, Coll Engn Math & Phys Sci, Exeter EX4 4QF, Devon, England
Funding
National Natural Science Foundation of China;
Keywords
Tensor; HOSVD; dimensionality reduction; data representation;
DOI
10.1109/TETC.2014.2330516
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Variety and veracity are two distinct characteristics of large-scale, heterogeneous data, and representing and processing such big data efficiently with a unified scheme has been a great challenge. In this paper, a unified tensor model is proposed to represent unstructured, semistructured, and structured data. With a tensor extension operator, the various types of data are represented as subtensors, which are then merged into a unified tensor. To extract the core tensor, which is small but retains the valuable information, an incremental high-order singular value decomposition (IHOSVD) method is presented. By recursively applying an incremental matrix decomposition algorithm, IHOSVD updates the orthogonal bases and computes the new core tensor. The paper analyzes the time complexity, memory usage, and approximation accuracy of the proposed method. A case study shows that data reconstructed from a core set containing 18% of the elements generally achieves 93% accuracy. Theoretical analyses and experimental results demonstrate that the proposed unified tensor model and IHOSVD method are efficient for big data representation and dimensionality reduction.
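The abstract relies on extracting a small core tensor and reconstructing approximate data from it. As an illustration only, here is a minimal NumPy sketch of the standard batch truncated HOSVD, not the authors' incremental IHOSVD: it assumes the data has already been merged into one dense 3-way array, and the helper names (`unfold`, `mode_n_product`, `truncated_hosvd`, `reconstruct`) and the chosen ranks are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_n_product(tensor, matrix, mode):
    """Compute tensor x_mode matrix, where matrix has shape (J, tensor.shape[mode])."""
    # tensordot contracts the matrix's column axis with the tensor's `mode` axis;
    # the new axis of length J comes out first, so move it back into place.
    result = np.tensordot(matrix, tensor, axes=([1], [mode]))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Batch truncated HOSVD: per-mode orthonormal bases plus a small core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading r left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        # Project each mode onto its basis: core = X x_1 U1^T x_2 U2^T x_3 U3^T.
        core = mode_n_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Approximate the original data as core x_1 U1 x_2 U2 x_3 U3."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_n_product(approx, u, mode)
    return approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: a low-multilinear-rank tensor plus a little noise.
    true_core = rng.standard_normal((4, 4, 4))
    true_factors = [rng.standard_normal((n, 4)) for n in (20, 30, 40)]
    data = reconstruct(true_core, true_factors)
    data += 0.01 * rng.standard_normal(data.shape)

    core, factors = truncated_hosvd(data, (4, 4, 4))
    approx = reconstruct(core, factors)
    kept = core.size + sum(u.size for u in factors)
    rel_err = np.linalg.norm(data - approx) / np.linalg.norm(data)
    print(f"stored {kept / data.size:.1%} of the elements, relative error {rel_err:.4f}")
```

Storing the core plus the factor matrices instead of the full array is what produces the reduction. This sketch recomputes every SVD from scratch; according to the abstract, IHOSVD instead updates the orthogonal bases incrementally as new data arrives, and its accuracy measure is not necessarily the relative error printed here.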
Pages: 280-291
Number of pages: 12