A visual analytics system for optimizing the performance of large-scale networks in supercomputing systems

Cited by: 0
Authors
Fujiwara T. [1]
Li J.K. [1]
Mubarak M. [2]
Ross C. [3]
Carothers C.D. [3]
Ross R.B. [2]
Ma K.-L. [1]
Affiliations
[1] University of California, Davis
Funding
National Science Foundation (United States)
Keywords
Dragonfly networks; Parallel communication network; Performance analysis; Supercomputing; Time-series data; Visual analytics;
DOI
10.1016/j.visinf.2018.04.010
Abstract
The overall efficiency of an extreme-scale supercomputer relies largely on the performance of its network interconnects. Several state-of-the-art supercomputers use networks based on the increasingly popular Dragonfly topology. It is crucial to study the behavior and performance of different parallel applications running on Dragonfly networks in order to make optimal system configuration and design choices, such as job scheduling and routing strategies. However, studying this temporal network behavior requires a tool that can analyze and correlate numerous sets of multivariate time-series data collected from the Dragonfly's multi-level hierarchies. This paper presents such a tool, a visual analytics system that uses the Dragonfly network to investigate the temporal behavior and optimize the communication performance of a supercomputer. We coupled interactive visualization with time-series analysis methods to help reveal hidden patterns in the network behavior with respect to different parallel applications and system configurations. Our system also provides multiple coordinated views for connecting behaviors observed at different levels of the network hierarchy, which supports visual analysis tasks. We demonstrate the effectiveness of the system with a set of case studies. Our system and findings can help improve not only the communication performance of supercomputing applications but also the network performance of next-generation supercomputers. © 2018 Zhejiang University and Zhejiang University Press
Pages: 98-110
Number of pages: 12