A Framework for Fair Comparison of Network Representation Learning Algorithm Performance Based on Hyperparameter Tuning

Cited by: 0
Authors
Guo M.-Y. [1 ,2 ]
Sun Z.-Y. [1 ,2 ]
Zhu Y.-Q. [3 ,4 ]
Bao Y.-G. [1 ,2 ]
Affiliations
[1] Center for Advanced Computer Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
[3] Beijing National Research Center of Information Science and Technology (Tsinghua University), Beijing
[4] National Engineering Laboratory of Big Data System Software, Beijing
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2022, Vol. 45, No. 05
Funding
National Natural Science Foundation of China
Keywords
Automated machine learning; Graph convolutional network; Hyperparameter tuning; Network embedding; Network representation learning
DOI
10.11897/SP.J.1016.2022.00897
Abstract
Network data are ubiquitous in real-world applications for representing complex relationships among objects, e.g., social networks, citation networks, and web networks. However, due to the large scale and high-dimensional sparse representation of network datasets, it is hard to apply off-the-shelf machine learning methods directly to their analysis. Network representation learning (NRL) generates succinct node representations for large-scale networks and serves as a bridge between machine learning methods and network data; it has attracted great research interest from both academia and industry. Despite the wide adoption of NRL algorithms, the setting of their hyperparameters remains a decisive factor in the success of their applications, as hyperparameters can influence the algorithms' performance to a great extent. How to generate a task-aware set of hyperparameters for different NRL algorithms so as to obtain their best performance, enable a fair comparison of that performance, and select the most suitable NRL algorithm for analyzing the network data are fundamental questions that must be answered before NRL algorithms are applied. In addition, hyperparameter tuning is time-consuming, and the massive scale of network datasets further complicates the problem by incurring a high memory footprint. Hence, how to tune NRL algorithms' hyperparameters within given resource constraints, such as a time budget or a memory limit, is also a problem. To address these two problems, we propose an easy-to-use framework named JITNREv that compares NRL algorithms fairly within resource constraints based on hyperparameter tuning. The framework consists of four loosely coupled components and adopts a sample-test-optimize process in a closed loop. The four main components are the hyperparameter sampler, the NRL algorithm manipulator, the performance evaluator, and the hyperparameter sampling-space optimizer; all components interact with one another through data flow. We use a divide-and-diverge sampling method based on Latin Hypercube Sampling to sample sets of hyperparameters, and we trim the sampling space around the previous best configuration under the assumption that "around the point with the best performance in the sample set, we are more likely to find other points with similar or better performance". The massive scale of network data also poses a great challenge to hyperparameter tuning, since the computational cost of NRL algorithms grows in proportion to the network scale; we therefore use a graph coarsening model to reduce the data size while preserving graph structural information, so that JITNREv can easily meet user-specified resource constraints. Besides, the framework integrates representative algorithms, widely used evaluation datasets, common evaluation metrics, and data analysis applications for ease of use. Extensive experiments demonstrate that JITNREv stably improves the performance of general NRL algorithms through hyperparameter tuning alone, thus enabling fair comparison of NRL algorithms at their best performance. For example, on the node classification task with the GCN algorithm, JITNREv increases accuracy by up to 31% compared with the default hyperparameter settings. © 2022, Science Press. All rights reserved.
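As a minimal sketch of the sample-test-optimize loop described in the abstract (this is not the JITNREv implementation; the bounds, constants, and the evaluate() callback are hypothetical placeholders), the closed loop can be expressed as Latin Hypercube Sampling followed by trimming the sampling box around the best configuration found so far:

```python
# Sketch of a sample-test-optimize loop: Latin Hypercube Sampling plus
# sampling-space trimming around the incumbent best configuration.
# evaluate(), the bounds, and all constants are hypothetical placeholders.
import numpy as np
from scipy.stats import qmc

def tune(evaluate, lo, hi, n_samples=20, n_rounds=5, shrink=0.5):
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    best_x, best_y = None, -np.inf
    for _ in range(n_rounds):
        sampler = qmc.LatinHypercube(d=len(lo))
        xs = qmc.scale(sampler.random(n=n_samples), lo, hi)  # sample
        for x in xs:                                         # test
            y = evaluate(x)
            if y > best_y:
                best_x, best_y = x, y
        half = (hi - lo) * shrink / 2.0                      # optimize:
        lo = np.maximum(lo, best_x - half)                   # trim the box
        hi = np.minimum(hi, best_x + half)                   # around the best
    return best_x, best_y
```

Each round shrinks the search box around the incumbent, which mirrors the paper's stated assumption that points with similar or better performance cluster near the current best.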
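The graph coarsening step can likewise be illustrated with a hedged sketch. The code below shows one level of greedy edge matching that merges matched endpoints into weighted super-nodes; it is illustrative only, and the coarsening model actually used by JITNREv may differ.

```python
# One level of graph coarsening by greedy edge matching (illustrative only):
# matched endpoints are merged into super-nodes, and parallel edges are
# accumulated into weights, shrinking the graph while keeping its structure.
import networkx as nx

def coarsen(G: nx.Graph):
    matched, mapping = set(), {}
    for u, v in G.edges():
        if u not in matched and v not in matched:  # greedy matching
            matched.update((u, v))
            mapping[u] = mapping[v] = u            # merge v into u
    for n in G.nodes():
        mapping.setdefault(n, n)                   # unmatched nodes survive
    H = nx.Graph()
    H.add_nodes_from(set(mapping.values()))
    for u, v, data in G.edges(data=True):
        cu, cv = mapping[u], mapping[v]
        if cu != cv:
            w = H[cu][cv]["weight"] if H.has_edge(cu, cv) else 0
            H.add_edge(cu, cv, weight=w + data.get("weight", 1))
    return H, mapping
```

Applying coarsen() repeatedly yields progressively smaller graphs on which hyperparameter configurations can be evaluated cheaply, which is how a coarsening model helps keep tuning within time and memory budgets.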
Pages: 897-917
Page count: 20