Data layout and SIMD abstraction layers: decoupling interfaces from implementations

Cited by: 5
Authors
Jubertie, Sylvain [1 ]
Masliah, Ian [2 ]
Falcou, Joel [3 ]
Affiliations
[1] Univ Orleans, INSA, Ctr Val de Loire, LIFO, EA 4022, Orleans, France
[2] Sorbonne Univ, CNRS, LIP6, Paris, France
[3] Univ Paris Sud, LRI, Orsay, France
Source
PROCEEDINGS 2018 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS) | 2018
Keywords
data layouts; AoS SoA AoSoA; SIMD; vectorization;
DOI
10.1109/HPCS.2018.00089
Chinese Library Classification (CLC)
TP301 [Theory and methods]
Subject classification code
081202
Abstract
From a high-level point of view, developers define the objects they manipulate in terms of structures or classes. For example, a pixel may be represented as a structure of three color components (red, green, blue) and an image as an array of pixels. In such cases, the data layout is said to be organized as an array of structures (AoS). However, developing efficient applications on modern processors and accelerators often requires organizing data differently. An image may also be stored as a structure of three arrays, one per component. This layout is called a structure of arrays (SoA) and is generally required to take advantage of the SIMD units embedded in all modern processors. In this paper, we propose a lightweight C++ template-based framework that provides the high-level representation most programmers use (AoS) on top of different data layouts suited to SIMD vectorization. Templated containers are provided for each proposed layout, with a uniform AoS-like interface for accessing elements. These containers are transformed at compile time into different combinations of tuples and vectors from the C++ Standard Template Library (STL), opening up more optimization opportunities for the code, in particular automatic vectorization. We study the performance of our data layouts and compare them to their explicit counterparts, based on structures and vectors, for different algorithms and architectures (x86 and ARM). Results show that compilers do not always auto-vectorize our data layouts as well as their explicit versions, even when the underlying containers and access patterns are similar. We therefore investigate the use of SIMD intrinsics and of the Boost.SIMD/bSIMD libraries to vectorize the codes. We show that combining our approach with Boost.SIMD/bSIMD yields performance similar to manual vectorization with intrinsics, and in almost all cases better performance than automatic vectorization, without increasing code complexity.
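As a rough illustration of the AoS/SoA distinction and of the uniform AoS-like element access the abstract describes, the following is a minimal sketch, not taken from the authors' framework; the pixel, image_aos, image_soa, reference, and darken names are hypothetical. The SoA container stores one contiguous array per component, the layout that favors SIMD vectorization, while a lightweight proxy keeps the familiar img[i].r access pattern.

#include <cstdint>
#include <cstddef>
#include <vector>

// Array of Structures: the three components of each pixel are interleaved.
struct pixel { std::uint8_t r, g, b; };
using image_aos = std::vector<pixel>;

// Structure of Arrays: one contiguous array per component, the layout
// that SIMD units and auto-vectorizers generally prefer.
struct image_soa {
    std::vector<std::uint8_t> r, g, b;

    // Lightweight proxy so elements can still be accessed AoS-style: img[i].r.
    struct reference { std::uint8_t &r, &g, &b; };
    reference operator[](std::size_t i) { return { r[i], g[i], b[i] }; }
    std::size_t size() const { return r.size(); }
};

// The same kernel compiles against either layout thanks to the uniform interface.
template <typename Image>
void darken(Image& img) {
    for (std::size_t i = 0; i < img.size(); ++i) {
        img[i].r /= 2;
        img[i].g /= 2;
        img[i].b /= 2;
    }
}

int main() {
    image_aos a(4, pixel{200, 100, 50});
    image_soa s{ std::vector<std::uint8_t>(4, 200),
                 std::vector<std::uint8_t>(4, 100),
                 std::vector<std::uint8_t>(4, 50) };
    darken(a);   // interleaved accesses
    darken(s);   // per-component contiguous accesses
}

The point mirrored from the paper is that the same darken kernel works on both containers, so switching layouts does not force a rewrite of the algorithm; whether the compiler then auto-vectorizes the SoA loops, or explicit intrinsics/Boost.SIMD are needed, is exactly the question the authors investigate.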
Pages: 531-538
Page count: 8