50 items in total
- [31] Benchmarking Large Language Models on Controllable Generation under Diversified Instructions. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024: 17808-17816
- [32] Benchmarking Causal Study to Interpret Large Language Models for Source Code. 2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION, ICSME, 2023: 329-334
- [34] StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 11143-11156
- [35] Benchmarking Large Language Models on Communicative Medical Coaching: A Dataset and a Novel System. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 1624-1637
- [36] EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs). COMPANION OF THE 15TH ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING, ICPE COMPANION 2024, 2024: 158-162
- [37] Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study. IEEE ACCESS, 2025, 13: 29698-29717
- [39] Large language models and rheumatology: a comparative evaluation. LANCET RHEUMATOLOGY, 2023, 5 (10): E574-E578
- [40] Automatic Evaluation of Attribution by Large Language Models. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023: 4615-4635