UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation

Cited: 0
Authors
Liang, Xun [1 ]
Song, Shichao [1 ]
Niu, Simin [1 ]
Li, Zhiyu [2 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Yezhaohui [2 ]
He, Dawei [3 ]
Cheng, Peng [3 ]
Wang, Zhonghao [3 ]
Deng, Haiying [3 ]
Affiliations
[1] Renmin University of China, School of Information, Beijing, People's Republic of China
[2] Institute for Advanced Algorithms Research, Shanghai, People's Republic of China
[3] State Key Laboratory of Media Convergence Production Technology and Systems, Beijing, People's Republic of China
Source
PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS | 2024
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) produce hallucinated text, compromising their practical utility in professional contexts. To assess the reliability of LLMs, numerous initiatives have developed benchmark evaluations for hallucination phenomena. However, owing to cost and time constraints, these benchmarks often rely on constrained generation techniques to produce the evaluation dataset, for instance, directed hallucination induction or deliberate modification of authentic text to introduce hallucinations. Such techniques are not congruent with the unrestricted text generation demanded by real-world applications. Furthermore, a well-established Chinese-language dataset dedicated to the evaluation of hallucinations is presently lacking. Consequently, we have developed UHGEval, an Unconstrained Hallucination Generation Evaluation benchmark containing hallucinations generated by LLMs with minimal restrictions. Concurrently, we have established a comprehensive benchmark evaluation framework to aid subsequent researchers in undertaking scalable and reproducible experiments. We have also evaluated prominent Chinese LLMs and the GPT series models to derive insights regarding hallucination.
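The abstract describes the evaluation framework only at a high level. As a rough illustration of one common protocol in hallucination benchmarks, the Python sketch below scores a model on a discriminative task: given a news beginning and a candidate continuation, the model under test answers whether the continuation is hallucinated, and accuracy is computed against gold labels. The JSONL layout, the field names begin/continuation/hallucinated, the file name uhgeval_sample.jsonl, and the query_llm helper are all hypothetical placeholders for illustration, not UHGEval's actual data schema or API.

    import json

    def query_llm(prompt: str) -> str:
        # Placeholder: replace with a call to the model under evaluation.
        # Returning "no" keeps the script runnable as a trivial baseline.
        return "no"

    def discriminative_accuracy(dataset_path: str) -> float:
        """Fraction of examples where the model's yes/no judgment
        matches the gold hallucination label."""
        correct, total = 0, 0
        with open(dataset_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)  # one labeled example per JSONL line
                prompt = (
                    "News beginning: " + record["begin"] + "\n"
                    "Continuation: " + record["continuation"] + "\n"
                    "Does the continuation contain hallucinated content? "
                    "Answer yes or no."
                )
                answer = query_llm(prompt).strip().lower()
                predicted_hallucinated = answer.startswith("yes")
                correct += int(predicted_hallucinated == record["hallucinated"])
                total += 1
        return correct / total if total else 0.0

    if __name__ == "__main__":
        acc = discriminative_accuracy("uhgeval_sample.jsonl")
        print(f"Discriminative accuracy: {acc:.3f}")

A framework structured this way keeps the dataset, the model client, and the scoring rule separable, which is what makes experiments of the kind the abstract mentions scalable and reproducible across models.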
Pages: 5266-5293
Page count: 28