共 45 条
- [21] Huang Yangsibo, Gupta S, Xia Mengzhou, Et al., Catastrophic jailbreak of open-source LLMs via exploiting generation, (2023)
- [22] Wen Jiaxin, Ke Pei, Sun Hao, Et al., Unveiling the implicit toxicity in large language models, (2023)
- [23] Madaan A, Tandon N, Gupta P, Et al., Self-refine: Iterative refinement with self-feedback, (2023)
- [24] Welleck S, Lu Ximing, West P, Et al., Generating sequences by learning to self-correct, (2022)
- [25] Gandikota R, Materzynska J, Fiotto-Kaufman J, Et al., Erasing concepts from diffusion models[C], Proc of Int Conf on Computer Vision, pp. 2426-2436, (2023)
- [26] Yao Yunzhi, Wang Peng, Tian Bozhong, Et al., Editing large language models: Problems, methods, and opportunities, Empirical Methods in Natural Language Processing, pp. 10222-10240, (2023)
- [27] Geva M, Caciularu A, Wang K R, Et al., Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space[C], Empirical Methods in Natural Language Processing, pp. 30-45, (2022)
- [28] Hu Xinshuo, Li Dongfang, Hu Baotian, Et al., Separate the wheat from the chaff: Model deficiency unlearning via parameter-efficient module operation, (2023)
- [29] Zhang Yizhe, Galley M, Gao Jianfeng, Et al., Generating informative and diverse conversational responses via adversarial information maximization[C], Advances in Neural Information Processing Systems, pp. 31-56, (2018)
- [30] Heryanto Y, Triayudi A., Evaluating text quality of GPT engine davinci-003 and GPT engine davinci generation using BLEU score[J], SAGA: Journal of Technology and Information System, 1, 4, pp. 121-129, (2023)