PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation

Cited by: 184
Authors
Ansel, Jason [1 ]
Yang, Edward [1 ]
He, Horace [1 ]
Gimelshein, Natalia [2 ]
Jain, Animesh [1 ]
Voznesensky, Michael [1 ]
Bao, Bin [1 ]
Bell, Peter [3 ]
Berard, David [1 ]
Burovski, Evgeni [3 ]
Chauhan, Geeta [1 ]
Chourdia, Anjali [1 ]
Constable, Will [1 ]
Desmaison, Alban [1 ]
DeVito, Zachary [1 ]
Ellison, Elias [1 ]
Feng, Will [1 ]
Gong, Jiong [4 ]
Gschwind, Michael [1 ]
Hirsh, Brian [1 ]
Huang, Sherlock [1 ]
Kalambarkar, Kshiteej [3 ]
Kirsch, Laurent [1 ]
Lazos, Michael [1 ]
Lezcano, Mario [3 ]
Liang, Yanbo [1 ]
Liang, Jason [1 ]
Lu, Yinghai [1 ]
Luk, C. K. [1 ]
Maher, Bert [1 ]
Pan, Yunjie [5 ]
Puhrsch, Christian [1 ]
Reso, Matthias [1 ]
Saroufim, Mark [1 ]
Siraichi, Marcos Yukio [3 ]
Suk, Helen [1 ]
Suo, Michael [1 ]
Tillet, Phil [2 ]
Wang, Eikan [4 ]
Wang, Xiaodong [1 ]
Wen, William [1 ]
Zhang, Shunting [1 ]
Zhao, Xu [1 ]
Zhou, Keren [2 ,6 ]
Zou, Richard [1 ]
Mathews, Ajit [1 ]
Chanan, Gregory [1 ]
Wu, Peng [1 ]
Chintala, Soumith [1 ]
Affiliations
[1] Meta, Cambridge, MA USA
[2] OpenAI, San Francisco, CA USA
[3] Quansight, Austin, TX USA
[4] Intel, Santa Clara, CA USA
[5] Univ Michigan, Ann Arbor, MI USA
[6] George Mason Univ, Fairfax, VA USA
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, ASPLOS 2024, VOL 2 | 2024
DOI
10.1145/3620665.3640366
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper introduces two extensions to the popular PyTorch machine learning framework, TorchDynamo and TorchInductor, which implement the torch.compile feature released in PyTorch 2. TorchDynamo is a Python-level just-in-time (JIT) compiler that enables graph compilation in PyTorch programs without sacrificing the flexibility of Python. It achieves this by dynamically modifying Python bytecode before execution and extracting sequences of PyTorch operations into an FX graph, which is then JIT compiled using one of many extensible backends. TorchInductor is the default compiler backend for TorchDynamo, which translates PyTorch programs into OpenAI's Triton for GPUs and C++ for CPUs. Results show that TorchDynamo is able to capture graphs more robustly than prior approaches while adding minimal overhead, and TorchInductor is able to provide a 2.27x inference and 1.41x training geometric mean speedup on an NVIDIA A100 GPU across 180+ real-world models, which outperforms six other compilers. These extensions provide a new way to apply optimizations through compilers in eager mode frameworks like PyTorch.
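The abstract describes TorchDynamo capturing PyTorch operations into an FX graph and handing it to an extensible backend for compilation. A minimal sketch of that extension point, assuming a PyTorch 2.x install: `inspect_backend` is a hypothetical name chosen here for illustration, and it simply reports the captured graph and falls back to running it uncompiled.

```python
# Sketch of torch.compile's custom-backend hook (PyTorch 2.x assumed).
# `inspect_backend` is a hypothetical backend that prints the FX graph
# TorchDynamo captured, then returns the unoptimized forward as a fallback.
import torch

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    # TorchDynamo passes the extracted FX graph here; a real backend
    # (e.g. TorchInductor) would return an optimized callable instead.
    print(f"captured {len(list(gm.graph.nodes))} FX nodes")
    return gm.forward

def f(x):
    return torch.relu(x) + 1.0

compiled_f = torch.compile(f, backend=inspect_backend)
out = compiled_f(torch.ones(3))
print(out)  # tensor([2., 2., 2.])
```

On the first call, TorchDynamo rewrites `f`'s bytecode, extracts the operation sequence into a `GraphModule`, and invokes the backend; subsequent calls reuse the compiled artifact. Omitting the `backend` argument selects TorchInductor, the default backend benchmarked in the paper.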
Pages: 929-947
Page count: 19