Temporal Autoregressive Matrix Factorization for High-Dimensional Time Series Prediction of OSS

Cited by: 4
Authors
Chen, Liang [1 ]
Yang, Yun [2 ]
Wang, Wei [1 ]
Affiliations
[1] East China Normal Univ, Sch Data Sci & Engn, Shanghai 200093, Peoples R China
[2] Changshu Inst Technol, Business Sch, Suzhou 215500, Peoples R China
Keywords
Autoregressive matrix factorization (MF); high-dimensional; open-source software (OSS); time series forecasting; NEURAL-NETWORK; MODEL;
DOI
10.1109/TNNLS.2023.3271327
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Open-source software (OSS) plays an increasingly significant role in modern software development, so accurately predicting the future development of OSS projects has become an essential topic. The behavioral data of different OSS projects are closely related to their development prospects. However, most of these behavioral data are high-dimensional time series data streams with noise and missing values. Accurate prediction on such cluttered data therefore requires a highly scalable model, a property that traditional time series prediction models lack. To this end, we propose a temporal autoregressive matrix factorization (TAMF) framework that supports data-driven temporal learning and prediction. Specifically, we first construct a trend and period autoregressive model to extract trend and period features from OSS behavioral data, and then combine this regression model with a graph-based matrix factorization (MF) to complete the missing values by exploiting the correlations among the time series. Finally, we use the trained regression model to make predictions on the target data. This scheme ensures that TAMF can be applied to different types of high-dimensional time series data and thus has high versatility. We selected ten real developer behavior datasets from GitHub for case analysis. The experimental results show that TAMF has good scalability and prediction accuracy.
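The abstract names the main ingredients (a low-rank factorization fitted only on observed entries, plus an autoregressive model on the temporal factors) but gives no formulas. The following is a minimal illustrative sketch of that general idea in NumPy. It is not the authors' TAMF implementation. The function names `fit_ar_mf` and `forecast`, the lag set `(1, 7)`, the squared-error loss, plain gradient descent, and all hyperparameter values are assumptions made for illustration, and the graph-based regularizer mentioned in the abstract is omitted.

```python
import numpy as np


def fit_ar_mf(Y, mask, rank=2, lags=(1, 7), lam=0.1, lr=0.01, epochs=500, seed=0):
    """Fit Y (n_series x T) ~ W @ X on observed entries (mask == 1),
    with an autoregressive penalty on the temporal factors X.

    Assumed loss (illustrative, not taken from the paper):
      0.5*||mask*(W@X - Y)||^2 + 0.5*lam*(||W||^2 + ||X||^2 + ||E||^2),
    where E[:, t] = X[:, t] - sum_j theta[:, j] * X[:, t - lags[j]].
    """
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    W = 0.1 * rng.standard_normal((n, rank))
    X = 0.1 * rng.standard_normal((rank, T))
    # One AR coefficient per latent dimension and per lag.
    theta = np.full((rank, len(lags)), 1.0 / len(lags))
    m = max(lags)

    for _ in range(epochs):
        # Reconstruction residual, zeroed where data are missing.
        R = mask * (W @ X - Y)

        # AR residual for t = m, ..., T-1.
        E = X[:, m:].copy()
        for j, l in enumerate(lags):
            E -= theta[:, [j]] * X[:, m - l:T - l]

        # Gradients of the assumed loss.
        gW = R @ X.T + lam * W
        gX = W.T @ R + lam * X
        gX[:, m:] += lam * E
        gtheta = np.zeros_like(theta)
        for j, l in enumerate(lags):
            gX[:, m - l:T - l] -= lam * theta[:, [j]] * E
            gtheta[:, j] = -lam * np.sum(E * X[:, m - l:T - l], axis=1)

        # Plain gradient-descent step (the step size may need tuning).
        W -= lr * gW
        X -= lr * gX
        theta -= lr * gtheta

    return W, X, theta


def forecast(W, X, theta, lags, steps):
    """Roll the latent AR model forward and map back to the series space."""
    X_ext = X.copy()
    for _ in range(steps):
        nxt = sum(theta[:, j] * X_ext[:, -l] for j, l in enumerate(lags))
        X_ext = np.column_stack([X_ext, nxt])
    return W @ X_ext[:, -steps:]
```

In this sketch the product `W @ X` fills in the missing entries, and `forecast(W, X, theta, lags, steps)` returns an `n_series x steps` array of predictions. The learning rate and penalty weight were chosen for small, roughly unit-scale data and would likely need adjusting for real repository activity counts.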
Pages: 13741-13752
Page count: 12