ReTMiC: Reliability-Aware Thermal Management in Multicore Mixed-Criticality Embedded Systems

Cited by: 0
Authors
Safari, Sepideh [1]
Ansari, Mohsen [1, 2]
Hessabi, Shaahin [2]
Henkel, Joerg [3]
Affiliations
[1] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran 193955746, Iran
[2] Sharif Univ Technol, Dept Comp Sci & Engn, Tehran 1136511155, Iran
[3] Karlsruhe Inst Technol KIT, Dept Comp Sci, D-76131 Karlsruhe, Germany
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Reliability; Multicore processing; Quality of service; Fault tolerant systems; Fault tolerance; Real-time systems; Timing; Embedded systems; Termination of employment; Temperature; Mixed-criticality systems; multicore platforms; task replication; embedded systems; thermal balancing; REDUNDANCY; DEMAND
DOI
10.1109/ACCESS.2025.3542472
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
As the number of cores in multicore platforms increases, temperature constraints may prevent powering all cores simultaneously at maximum voltage and frequency level. Thermal hot spots and unbalanced temperatures between the processing cores may degrade the reliability. This paper introduces a reliability-aware thermal management scheduling (ReTMiC) method for mixed-criticality embedded systems. In this regard, ReTMiC meets Thermal Design Power as the chip-level power constraint at design time. In order to balance the temperature of the processing cores, our proposed method determines balancing points on each frame of the scheduling, and at run time, our proposed lightweight online re-mapping technique is activated at each determined balancing point for balancing the temperature of the processing cores. The online mechanism exploits the proposed temperature-aware factor to reduce the system's temperature based on the current temperature of processing cores and the behavior of their corresponding running tasks. Our experimental results show that the ReTMiC method achieves up to 12.8 °C reduction in the chip temperature and 3.5 °C reduction in spatial thermal variation in comparison to the state-of-the-art techniques while keeping the system reliability at a required level.
Pages: 33157-33175
Number of pages: 19
Related papers
50 in total
[21]   Energy-aware reliability guarantee scheduling with semi-clairvoyant in mixed-criticality systems [J].
Zhang, Yi-Wen ;
Zheng, Hui .
JOURNAL OF SYSTEMS ARCHITECTURE, 2024, 156
[22]   Peak-Power-Aware Primary-Backup Technique for Efficient Fault-Tolerance in Multicore Embedded Systems [J].
Ansari, Mohsen ;
Salehi, Mohammad ;
Safari, Sepideh ;
Ejlali, Alireza ;
Shafique, Muhammad .
IEEE ACCESS, 2020, 8 (08) :142843-142857
[23]   Lifetime-aware real-time task scheduling on fault-tolerant mixed-criticality embedded systems [J].
Cao, Kun ;
Xu, Guo ;
Zhou, Junlong ;
Chen, Mingsong ;
Wei, Tongquan ;
Li, Keqin .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 100 :165-175
[24]   A framework for reliability-aware embedded system design on multiprocessor platforms [J].
Huang, Jia ;
Barner, Simon ;
Raabe, Andreas ;
Buck, Christian ;
Knoll, Alois .
MICROPROCESSORS AND MICROSYSTEMS, 2014, 38 (06) :539-551
[25]   Resource Sharing in Multicore Mixed-Criticality Systems: Utilization Bound and Blocking Overhead [J].
Han, Jian-Jun ;
Tao, Xin ;
Zhu, Dakai ;
Yang, Laurence T. .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (12) :3626-3641
[26]   Robust Mixed-Criticality Systems [J].
Burns, Alan ;
Davis, Robert, I ;
Baruah, Sanjoy ;
Bate, Iain .
IEEE TRANSACTIONS ON COMPUTERS, 2018, 67 (10) :1478-1491
[27]   Mixed-criticality scheduling on heterogeneous multicore systems powered by energy harvesting [J].
Xiang, Yi ;
Pasricha, Sudeep .
INTEGRATION-THE VLSI JOURNAL, 2018, 61 :114-124
[28]   ReMap: Reliability Management of Peak-Power-Aware Real-Time Embedded Systems Through Task Replication [J].
Yeganeh-Khaksar, Amir ;
Ansari, Mohsen ;
Ejlali, Alireza .
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2022, 10 (01) :312-323
[29]   Reliability guaranteed energy minimization on mixed-criticality systems [J].
Li, Zheng ;
Guo, Chunhui ;
Hua, Xiayu ;
Ren, Shangping .
JOURNAL OF SYSTEMS AND SOFTWARE, 2016, 112 :1-10
[30]   Ring-DVFS: Reliability-Aware Reinforcement Learning-Based DVFS for Real-Time Embedded Systems [J].
Yeganeh-Khaksar, Amir ;
Ansari, Mohsen ;
Safari, Sepideh ;
Yari-Karin, Sina ;
Ejlali, Alireza .
IEEE EMBEDDED SYSTEMS LETTERS, 2021, 13 (03) :146-149