Distributed Incremental Tensor Tucker Decomposition

Cited by: 0
Authors
Yang K.-Y. [1]
Gao Y.-J. [1,2]
Chen L. [1]
Ge C.-C. [1]
Shen Y.-F. [1]
Affiliations
[1] College of Computer Science, Zhejiang University, Hangzhou
[2] Alibaba-Zhejiang University Joint Institute of Frontier Technologies, Hangzhou
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2021 / Vol. 44 / No. 08
Funding
National Natural Science Foundation of China;
Keywords
Distributed; Incremental; Spark; Tensor; Tucker decomposition;
DOI
10.11897/SP.J.1016.2021.01696
Abstract
With the rapid development of social networks, e-commerce systems, and mobile terminal devices, massive high-dimensional data is being generated at an overwhelming rate. A natural representation of such high-dimensional data is the tensor. Tensor Tucker decomposition is a fundamental machine learning method for multi-dimensional data analysis: it aims to discover latent representations of a given tensor and is widely used in applications such as recommendation systems, image compression, and computer vision, to name but a few. Nevertheless, most traditional tensor decomposition methods can only handle static data and are not suitable for dynamic data, because they must re-compute the whole decomposition from scratch whenever the data grows. Several incremental tensor Tucker decomposition methods exist, but they focus on tensors that grow along a single mode, whereas tensors in real life can grow along multiple modes. To our knowledge, only one existing method supports multi-mode incremental tensor Tucker decomposition. Moreover, all existing incremental Tucker decomposition methods are designed for a single machine and therefore cannot handle large-scale, continuously growing data efficiently. Considering the continuous expansion of data, an efficient distributed incremental tensor Tucker decomposition method is required.

This paper proposes a Distributed Incremental Tensor Tucker Decomposition method (DITTD for short), which is the first attempt to tackle this problem. Two main challenges arise. The first is to achieve load balancing among the workers in the distributed environment. DITTD first divides the incremental tensor into sub-blocks according to their positional relationship with the previous tensor. It then tries to generate an optimal partitioning of the incremental tensor such that the number of non-zero elements in each tensor partition is equal. However, the optimal tensor partitioning problem is proved to be NP-hard, so DITTD employs two heuristic partitioning methods. One is the Greedy tensor Partitioning algorithm (GP for short), which greedily assigns partitioning boundaries so that the number of non-zero elements in each partition is close to the optimal per-partition target. The other is the Max-min Matching tensor Partitioning algorithm (M2P), which iteratively assigns the tensor slice with the maximum number of non-zero elements to the tensor partition with the minimum number of non-zero elements.

After tensor partitioning, DITTD faces the second challenge: computing the incremental tensor Tucker decomposition efficiently. DITTD provides a novel incremental Tucker computation method that avoids the explosion of intermediate data. This method derives an equivalent conversion of the update rule for the factor matrices and designs a row-wise computation strategy for the distributed Tucker decomposition. Based on these techniques, DITTD reduces both the intermediate data and the network communication among workers, and thus improves the efficiency of distributed Tucker decomposition for incremental tensors.
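For reference, the standard Tucker model that such a decomposition computes expresses an N-mode tensor as a small core tensor multiplied by a factor matrix along each mode; the notation below is the conventional one and is not taken from the paper itself:

\mathcal{X} \;\approx\; \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},
\qquad
\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N},\;
\mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N},\;
A^{(n)} \in \mathbb{R}^{I_n \times R_n}

The core tensor \mathcal{G} and the factor matrices A^{(n)} are the latent representations that the decomposition discovers.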
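A minimal sketch of how incremental non-zeros might be split by their positional relationship with the previous tensor: when a tensor grows along several modes, each new entry can be labeled by which of its indices fall inside the old dimensions and which fall in the newly grown ranges. The Python function below is purely illustrative (COO-style coordinates, hypothetical names); DITTD's actual sub-block construction may differ.

def classify_increment(coords, old_dims):
    # Group incremental non-zero coordinates into sub-blocks according to
    # which modes lie in the newly grown index range (True) versus the old
    # range (False). The all-False pattern is the previous tensor itself,
    # so genuinely incremental entries have at least one True mode.
    blocks = {}
    for idx in coords:
        pattern = tuple(i >= d for i, d in zip(idx, old_dims))
        blocks.setdefault(pattern, []).append(idx)
    return blocks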
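The Greedy tensor Partitioning algorithm (GP) places partition boundaries along one mode so that each contiguous group of slices carries roughly the per-partition share of non-zeros. A hedged sketch of this idea, assuming the non-zero count of every slice is known in advance (function and variable names are illustrative, not the paper's):

def greedy_partition(slice_nnz, num_parts):
    # slice_nnz[i] is the number of non-zeros in slice i along the chosen mode.
    # Walk the slices in order and close a partition once its accumulated
    # non-zero count reaches the per-partition target sum(slice_nnz)/num_parts.
    target = sum(slice_nnz) / num_parts
    boundaries, current = [], 0
    for i, nnz in enumerate(slice_nnz):
        current += nnz
        if current >= target and len(boundaries) < num_parts - 1:
            boundaries.append(i + 1)   # this partition ends after slice i
            current = 0
    return boundaries                  # end positions of the first num_parts-1 partitions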
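The Max-min Matching tensor Partitioning algorithm (M2P) repeatedly gives the heaviest remaining slice to the currently lightest partition, in the spirit of the classic longest-processing-time heuristic. A minimal sketch, again with illustrative names only:

import heapq

def max_min_partition(slice_nnz, num_parts):
    # Min-heap of (current load, partition id); the lightest partition is always on top.
    heap = [(0, p) for p in range(num_parts)]
    heapq.heapify(heap)
    assignment = {}
    # Visit slices from the largest non-zero count to the smallest.
    for idx in sorted(range(len(slice_nnz)), key=lambda i: -slice_nnz[i]):
        load, part = heapq.heappop(heap)
        assignment[idx] = part
        heapq.heappush(heap, (load + slice_nnz[idx], part))
    return assignment   # maps slice index -> partition id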
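The row-wise computation strategy targets the intermediate-data explosion: a naive distributed Tucker update materializes the Kronecker product of the factor matrices, which quickly dominates memory and shuffle traffic. The sketch below shows the general non-zero-wise idea for a sparse 3-way tensor, computing the mode-1 matricized product without ever forming the Kronecker product; it illustrates the flavor of the technique under standard HOOI-style notation, not DITTD's exact update rule.

import numpy as np

def rowwise_ttm(coords, vals, A2, A3, I1):
    # Compute Y = X_(1) (A3 kron A2) row by row for a sparse tensor X given in
    # COO form: row i of Y depends only on the non-zeros whose mode-1 index is i,
    # so each worker can build its own rows locally.
    R2, R3 = A2.shape[1], A3.shape[1]
    Y = np.zeros((I1, R2 * R3))
    for (i, j, k), x in zip(coords, vals):
        # Contribution of one non-zero: a scaled outer product of two factor rows.
        Y[i] += x * np.kron(A3[k], A2[j])
    return Y   # I1 x (R2*R3); its leading left singular vectors update factor A1

Because each row of Y is owned by exactly one mode-1 index, the rows can be accumulated locally on the worker that holds the corresponding tensor partition, which is what keeps intermediate data and network communication small.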
Finally, comprehensive experiments are conducted on both real and synthetic data sets. The experimental results show that the proposed DITTD method is at least 10× faster than the baseline method and scales well. © 2021, Science Press. All rights reserved.
Pages: 1696-1713
Number of pages: 17