50 entries in total
[21]
Exploration of GPU sharing policies under GEMM workloads
[J].
PROCEEDINGS OF THE 23RD INTERNATIONAL WORKSHOP ON SOFTWARE AND COMPILERS FOR EMBEDDED SYSTEMS (SCOPES 2020),
2020,
:66-69
[22]
Analyzing CUDA Workloads Using a Detailed GPU Simulator
[J].
ISPASS 2009: IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE,
2009,
:163-174
[23]
GPU-Initiated Resource Allocation for Irregular Workloads
[J].
PROCEEDINGS OF 2024 3RD INTERNATIONAL WORKSHOP ON EXTREME HETEROGENEITY SOLUTIONS, EXHET 2024,
2024,
:1-8
[24]
Understanding of GPU Architectural Vulnerability for Deep Learning Workloads
[J].
2019 IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI AND NANOTECHNOLOGY SYSTEMS (DFT),
2019
[25]
Optimizing Deep Learning Workloads on ARM GPU with TVM
[J].
1ST ACM REQUEST WORKSHOP/TOURNAMENT ON REPRODUCIBLE SOFTWARE/HARDWARE CO-DESIGN OF PARETO-EFFICIENT DEEP LEARNING,
2018
[26]
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads
[J].
SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS,
2021
[27]
Effective Performance Portability
[J].
PROCEEDINGS OF 2018 IEEE/ACM INTERNATIONAL WORKSHOP ON PERFORMANCE, PORTABILITY AND PRODUCTIVITY IN HPC (P3HPC 2018),
2018,
:24-36
[28]
Provision and use of GPU resources for distributed workloads via the Grid
[J].
24TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP 2019),
2020, 245
[29]
Modeling GPU Dynamic Parallelism for self similar density workloads
[J].
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE,
2023, 145
:239-253
[30]
Characterizing Multi-Instance GPU for Machine Learning Workloads
[J].
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022),
2022,
:724-731