iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture

Cited by: 44
Authors
Gu, Peng [1]
Xie, Xinfeng [1]
Ding, Yufei [2]
Chen, Guoyang [3]
Zhang, Weifeng [3]
Niu, Dimin [4]
Xie, Yuan [1,4]
Affiliations
[1] UCSB, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
[2] UCSB, Dept Comp Sci, Santa Barbara, CA USA
[3] Alibaba Cloud Infrastruct, Sunnyvale, CA USA
[4] Alibaba DAMO Acad, Sunnyvale, CA USA
Source
2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020) | 2020
Funding
National Science Foundation (USA)
Keywords
Process-in-memory; Image Processing; Accelerator; LANGUAGE; COMPILER;
DOI
10.1109/ISCA45697.2020.00071
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Image processing is becoming an increasingly important domain for many applications on workstations and the datacenter that require accelerators for high performance and energy efficiency. GPU, which is the state-of-the-art accelerator for image processing, suffers from the memory bandwidth bottleneck. To tackle this bottleneck, near-bank architecture provides a promising solution due to its enormous bank-internal bandwidth and low-energy memory access. However, previous work lacks hardware programmability, while image processing workloads contain numerous heterogeneous pipeline stages with diverse computation and memory access patterns. Enabling programmable near-bank architecture with low hardware overhead remains challenging. This work proposes iPIM, the first programmable in-memory image processing accelerator using near-bank architecture. We first design a decoupled control-execution architecture to provide lightweight programmability support. Second, we propose the SIMB (Single-Instruction-Multiple-Bank) ISA to enable flexible control flow and data access. Third, we present an end-to-end compilation flow based on Halide that supports a wide range of image processing applications and maps them to our SIMB ISA. We further develop iPIM-aware compiler optimizations, including register allocation, instruction reordering, and memory order enforcement to improve performance. We evaluate a set of representative image processing applications on iPIM and demonstrate that on average iPIM obtains 11.02x acceleration and 79.49% energy saving over an NVIDIA Tesla V100 GPU. Further analysis shows that our compiler optimizations contribute 3.19x speedup over the unoptimized baseline.
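As a reading aid for the energy figure quoted in the abstract, the following minimal sketch converts a percentage energy saving into the equivalent energy-reduction factor (baseline energy divided by accelerator energy). Only the 79.49% value comes from the abstract; the function name and the conversion formula are illustrative assumptions about how the saving is defined, not something the record states.

```python
def energy_ratio_from_saving(saving_percent: float) -> float:
    """Return baseline_energy / accelerator_energy for a given percentage saving.

    Assumes the saving is defined as 1 - (accelerator_energy / baseline_energy).
    """
    if not 0.0 <= saving_percent < 100.0:
        raise ValueError("saving_percent must be in [0, 100)")
    return 1.0 / (1.0 - saving_percent / 100.0)


# Value quoted in the abstract (energy saving over the GPU baseline).
ENERGY_SAVING_PERCENT = 79.49

ratio = energy_ratio_from_saving(ENERGY_SAVING_PERCENT)
print(f"{ENERGY_SAVING_PERCENT}% saving is roughly {ratio:.2f}x less energy")
```

Under that assumed definition, a 79.49% saving means iPIM uses a little over one fifth of the baseline energy on average.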
Pages: 804-817
Page count: 14