共 50 条
- [1] Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models PROCEEDINGS OF THE 2023 ACM SIGCOMM 2023 CONFERENCE, SIGCOMM 2023, 2023, : 486 - 498
- [3] Effective Compression of Language Models by Combining Pruning and Knowledge Distillation 2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024, 2024, : 429 - 438
- [5] Mixture of Prompt Experts for Natural Language Inference 2024 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CCECE 2024, 2024, : 43 - 48
- [8] DistillSeq: A Framework for Safety Alignment Testing in Large Language Models using Knowledge Distillation PROCEEDINGS OF THE 33RD ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2024, 2024, : 578 - 589
- [9] DSG-KD: Knowledge Distillation From Domain-Specific to General Language Models IEEE ACCESS, 2024, 12 : 130973 - 130982
- [10] Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts COMPUTER VISION - ECCV 2024, PT LIII, 2025, 15111 : 461 - 477