HLS-Based Large Scale Self-Organizing Feature Maps

Times cited: 0
Authors
Porrmann, Florian [1]
Hagemeyer, Jens [1]
Porrmann, Mario [2]
Affiliations
[1] Bielefeld Univ, Cognitron & Sensor Syst Grp, CITEC, D-33615 Bielefeld, Germany
[2] Osnabruck Univ, Inst Comp Sci, Comp Engn Grp, D-49090 Osnabruck, Germany
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Neurons; Computer architecture; Clustering algorithms; Training; Graphics processing units; Statistical analysis; Space exploration; Heterogeneous networks; Field programmable gate array; hardware acceleration; machine learning; reconfigurable architectures; reconfigurable computing; heterogeneous computing; heterogeneous architectures; self-organizing feature maps; optimization; design space exploration;
DOI
10.1109/ACCESS.2024.3471471
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Self-Organizing Map (SOM) algorithm is a clustering algorithm used in a wide variety of application domains. Over the last few decades, it has been accelerated using various hardware architectures, including FPGAs, CPUs, and GPUs. This publication presents a High-Level Synthesis (HLS)-based implementation that uses multiple processing elements to realize a high-performance system architecture. An extensive design space exploration was conducted to evaluate the performance range of the architecture, covering vector dimensions from 8 to 512 and map sizes from 16x16 to 512x512. The evaluation was performed on two different AMD/Xilinx UltraScale+ FPGA systems: the VCU128 PCIe-based accelerator card and the ZCU106 stand-alone evaluation kit. The results show that, for a given vector dimension, performance scales nearly linearly as the map size is increased. In addition, the energy efficiency of both FPGAs was analyzed, revealing that while the ZCU106 offers less raw compute power, it requires up to 4x less power and, depending on the configuration, can be 2x more energy efficient than the VCU128. One of the main reasons for this is that it does not require a dedicated host system but uses its internal ARM cores. Finally, a comparison against state-of-the-art SOM implementations was conducted. The proposed design achieves speed-ups of up to 458.7x, 1,630.4x, and 4.9x compared to other CPU, GPU, and FPGA realizations, respectively.
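For readers unfamiliar with the algorithm being accelerated, the following is a minimal sketch of one online SOM training step in plain Python. It is a generic textbook formulation, not the HLS architecture described in the abstract: the Gaussian neighborhood function, the squared Euclidean distance, and all names (`make_map`, `best_matching_unit`, `train_step`, `learning_rate`, `radius`) are assumptions chosen for illustration.

```python
import math
import random


def make_map(rows, cols, dim, seed=0):
    """Create a rows x cols map of randomly initialized weight vectors."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Squared Euclidean distance is sufficient to find the minimum.
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_step(weights, x, learning_rate, radius):
    """Apply one online SOM update for input vector x, in place."""
    br, bc = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Assumed Gaussian neighborhood on the 2-D grid around the BMU.
            grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_dist_sq / (2.0 * radius ** 2))
            # Pull the weight vector toward the input, scaled by h.
            for i in range(len(w)):
                w[i] += learning_rate * h * (x[i] - w[i])
    return br, bc


if __name__ == "__main__":
    # Smallest configuration named in the abstract: 16x16 map, dimension 8.
    som = make_map(16, 16, 8)
    sample = [0.5] * 8
    print("BMU:", train_step(som, sample, learning_rate=0.1, radius=2.0))
```

Both loops visit every neuron for every input vector, so the work per sample grows with map size times vector dimension. That per-neuron independence is what makes the algorithm a natural candidate for spreading across multiple processing elements.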
Pages: 142459-142474
Page count: 16