Efficient GPU Implementation of Lucas-Kanade through OpenACC

Cited by: 4
Authors
Haggui, Olfa [1,2]
Tadonki, Claude [1]
Sayadi, Fatma [3]
Ouni, Bouraoui [2]
Affiliations
[1] PSL Res Univ, Mines ParisTech, Ctr Rech Informat CRI, 60 Blvd St Michel, F-75006 Paris, France
[2] Sousse Natl Sch Engn, Networked Objects Control & Commun Syst NOCCS, BP 264 Sousse, Sousse 4023, Erriadh, Tunisia
[3] Fac Sci, Elect & Microelect Lab, Sousse, Tunisia
Source
PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2019
Keywords
Optical Flow; Lucas-Kanade; Multicore; Manycore; GPU; OpenACC
DOI
10.5220/0007272107680775
CLC number (Chinese Library Classification)
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Optical flow estimation is an essential component of motion detection and object tracking. It is an image processing algorithm typically composed of a series of convolution masks (approximating the derivatives) followed by a 2×2 linear system per pixel that yields the optical flow vectors. Since each stage of the algorithm is a stencil computation, the overhead of memory accesses is expected to be significant and to create a genuine scalability bottleneck, especially given the complexity of the GPU memory configuration. In this paper, we investigate a GPU deployment of an optimized CPU implementation via OpenACC, a directive-based parallel programming model and framework that eases the porting of codes to a wide variety of heterogeneous HPC hardware platforms and architectures. We explore each of the major technical features and strive to obtain the best performance. Experimental results on a Quadro P5000 are provided together with the corresponding technical discussions, taking the performance of the multicore version on an Intel Broadwell-EP as the baseline.
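The two stages described above (derivative stencils, then one 2×2 linear system per pixel) can be sketched in C with OpenACC directives. This is a minimal, single-level (non-pyramidal) sketch under stated assumptions, not the authors' implementation: spatial derivatives are central differences on the first frame, the temporal derivative is the frame difference, the window is square with radius `r`, and images are row-major `float` arrays. The names `lk_derivatives`, `lk_solve` and `IDX` are hypothetical. When compiled without OpenACC support the `#pragma acc` lines are ignored and the loops run serially on the CPU; with an OpenACC compiler (e.g. `nvc -acc` or `gcc -fopenacc`) they are offloaded.

```c
#include <stddef.h>

/* Hypothetical helper: linear index of pixel (x, y) in a row-major image of width w. */
#define IDX(x, y, w) ((size_t)(y) * (size_t)(w) + (size_t)(x))

/* Stage 1 (stencils): spatial derivatives by central differences on frame i1,
   temporal derivative as the frame difference i2 - i1.
   Border pixels get zero spatial gradients. */
void lk_derivatives(const float *i1, const float *i2, int w, int h,
                    float *ix, float *iy, float *it)
{
    #pragma acc parallel loop collapse(2) \
        copyin(i1[0:w*h], i2[0:w*h]) copyout(ix[0:w*h], iy[0:w*h], it[0:w*h])
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            size_t p = IDX(x, y, w);
            int inner = x > 0 && x < w - 1 && y > 0 && y < h - 1;
            /* The ternaries only read neighbours for interior pixels. */
            ix[p] = inner ? 0.5f * (i1[IDX(x + 1, y, w)] - i1[IDX(x - 1, y, w)]) : 0.0f;
            iy[p] = inner ? 0.5f * (i1[IDX(x, y + 1, w)] - i1[IDX(x, y - 1, w)]) : 0.0f;
            it[p] = i2[p] - i1[p];
        }
    }
}

/* Stage 2: for every pixel, accumulate the structure tensor over a
   (2r+1) x (2r+1) window and solve the 2x2 system
       [sxx sxy] [u]   [-sxt]
       [sxy syy] [v] = [-syt]
   with the explicit inverse (Cramer's rule). Near-singular systems
   (the aperture problem) yield zero flow. */
void lk_solve(const float *ix, const float *iy, const float *it, int w, int h,
              int r, float *u, float *v)
{
    #pragma acc parallel loop collapse(2) \
        copyin(ix[0:w*h], iy[0:w*h], it[0:w*h]) copyout(u[0:w*h], v[0:w*h])
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float sxx = 0.0f, sxy = 0.0f, syy = 0.0f, sxt = 0.0f, syt = 0.0f;
            for (int dy = -r; dy <= r; dy++) {
                for (int dx = -r; dx <= r; dx++) {
                    int xx = x + dx, yy = y + dy;
                    if (xx < 0 || xx >= w || yy < 0 || yy >= h)
                        continue; /* window clipped at the image border */
                    size_t q = IDX(xx, yy, w);
                    sxx += ix[q] * ix[q];
                    sxy += ix[q] * iy[q];
                    syy += iy[q] * iy[q];
                    sxt += ix[q] * it[q];
                    syt += iy[q] * it[q];
                }
            }
            size_t p = IDX(x, y, w);
            float det = sxx * syy - sxy * sxy;
            if (det > 1e-6f || det < -1e-6f) {
                u[p] = (-syy * sxt + sxy * syt) / det;
                v[p] = ( sxy * sxt - sxx * syt) / det;
            } else {
                u[p] = 0.0f;
                v[p] = 0.0f;
            }
        }
    }
}
```

As written, each kernel carries its own `copyin`/`copyout` clauses, so the gradient arrays travel back to the host between the two stages. A tuned port would typically wrap both calls in an enclosing `#pragma acc data` region so that `ix`, `iy` and `it` stay in device memory, which is exactly the kind of memory-traffic overhead the abstract identifies as the bottleneck.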
Pages: 768-775
Number of pages: 8
Related papers
50 in total
  • [41] CAVLCU: an efficient GPU-based implementation of CAVLC
    Fuentes-Alventosa, Antonio
    Gómez-Luna, Juan
    González-Linares, José Maria
    Guil, Nicolás
    Medina-Carnicer, R.
    JOURNAL OF SUPERCOMPUTING, 2022, 78 : 7556 - 7590
  • [42] Efficient GPU Implementation of Affine Index Permutations on Arrays
    Bouverot-Dupuis, Mathis
    Sheeran, Mary
    PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023, 2023, : 15 - 28
  • [43] EFFICIENT DESIGN AND IMPLEMENTATION OF VISUAL COMPUTING ALGORITHMS ON THE GPU
    Park, In Kyu
    Singhal, Nitin
    Lee, Man Hee
    Cho, Sungdae
    2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 2321 - +
  • [44] Efficient Parallel Implementation of Morphological Operation on GPU and FPGA
    Li, Teng
    Dou, Yong
    Jiang, Jingfei
    Gao, Jing
    2014 INTERNATIONAL CONFERENCE ON SECURITY, PATTERN ANALYSIS, AND CYBERNETICS (SPAC), 2014, : 430 - 435
  • [45] Efficient GPU implementation of randomized SVD and its applications
    Struski, Lukasz
    Morkisz, Pawel
    Spurek, Przemyslaw
    Bernabeu, Samuel Rodriguez
    Trzcinski, Tomasz
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 248
  • [46] Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU
    Tredak, Przemyslaw
    Rudnicki, Witold R.
    Majewski, Jacek A.
    JOURNAL OF COMPUTATIONAL PHYSICS, 2016, 321 : 556 - 570
  • [47] Efficient GPU implementation of the multivariate empirical mode decomposition algorithm
    Wang, Zeyu
    Juhasz, Zoltan
    JOURNAL OF COMPUTATIONAL SCIENCE, 2023, 74
  • [49] Efficient number theoretic transform implementation on GPU for homomorphic encryption
    Ozerk, Ozgun
    Elgezen, Can
    Mert, Ahmet Can
    Ozturk, Erdinc
    Savas, Erkay
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (02) : 2840 - 2872
  • [50] A Efficient Parallel Deblocking Filter Based on GPU: Implementation and Optimization
    Su, Huayou
    Zhang, Chunyuan
    Chai, Jun
    Yang, Qianming
    2011 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2011, : 280 - 285