Investigating different general-purpose and embedded multicores to achieve optimal trade-offs between performance and energy

Cited by: 19
Authors
Lorenzon, Arthur Francisco [1]
Cera, Marcia Cristina [2]
Schneider Beck, Antonio Carlos [1]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Ave Bento Goncalves,9500 Campus Vale, BR-91501970 Porto Alegre, RS, Brazil
[2] Fed Univ Pampa UNIPAMPA, Campus Alegrete,Ave Tiaraju 810, BR-97546550 Alegrete, Brazil
Keywords
Embedded and general-purpose processors; Thread-level parallelism exploitation; Multicore architectures; Performance; Energy; Energy-delay product; ARM
DOI
10.1016/j.jpdc.2016.04.003
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Thread-level parallelism (TLP) is widely exploited in embedded and general-purpose multicore processors (GPPs) to increase performance. However, parallelizing an application requires extra instructions and extra accesses to shared memory for communication and synchronization. Accessing shared memory is costly in both delay and energy because it sits at the bottom of the memory hierarchy, and the overhead varies with the communication model and with the level of data exchange and synchronization in the application. On top of that, multicore processors are implemented with different architectures, organizations, and memory subsystems. In this complex scenario, we evaluate 14 parallel benchmarks implemented with four parallel programming interfaces (PPIs), with distinct communication rates and TLP, running on five representative multicore processors targeted at general-purpose and embedded systems. We show that while general-purpose processors deliver the best performance and embedded processors are the most energy efficient, no single option offers the best result for both. We also demonstrate that in applications with low levels of communication, what matters is the communication model, not a specific PPI. On the other hand, applications with high communication demands have a huge search space that can be explored. For those, Pthreads is the most efficient PPI for Intel processors, while OpenMP is the best for ARM ones. MPI is the worst choice in almost every scenario and becomes very inefficient as TLP increases. We also evaluate the energy-delay^x product (ED^xP), which weights performance against energy by varying the value of x. In a representative case where energy is the most important, three different processors can be the best alternative for different values of x. Finally, we explore how static power influences total energy consumption, showing that its increase brings benefits to ARM multiprocessors, with the opposite effect for Intel ones.
© 2016 Elsevier Inc. All rights reserved.
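
The ED^xP metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional definition, energy multiplied by delay raised to the power x, which the abstract itself does not spell out. The processor names and the energy/delay figures below are hypothetical values chosen only for illustration; they are not measurements from the paper.

```python
def edxp(energy_j, delay_s, x):
    """Energy-delay^x product: energy * delay**x (assumed conventional form)."""
    return energy_j * delay_s ** x


def best_by_edxp(measurements, x):
    """Return the name of the processor with the lowest ED^xP for a given x."""
    return min(measurements, key=lambda name: edxp(*measurements[name], x))


# Hypothetical (energy in joules, delay in seconds) pairs, NOT data from the paper.
measurements = {
    "cpu_a": (10.0, 8.0),   # lowest energy, slowest
    "cpu_b": (20.0, 3.0),   # middle ground
    "cpu_c": (50.0, 1.5),   # highest energy, fastest
}

for x in (0, 1, 2):
    print(x, best_by_edxp(measurements, x))
```

With x = 0 the metric reduces to energy alone, and larger values of x give more weight to execution time, so the preferred processor can change as x grows. This is the kind of effect the abstract describes when it says different processors can be the best alternative for different values of x.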
Pages: 107-123
Number of pages: 17