Enhancing text understanding of decoder-based model by leveraging parameter-efficient fine-tuning method

Cited by: 0
Authors
Feroze, Wasif [1]
Cheng, Shaohuan [1]
Jimale, Elias Lemuye [1]
Jakhro, Abdul Naveed [2]
Qu, Hong [1]
Affiliations
[1] School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu
[2] Department of Information Technology, Shaheed Benazir Bhutto University, Naushahro Feroze Campus, Naushahro Feroze
Keywords
Large language models; MRC question answering; Natural language understanding; Parameter-efficient fine-tuning
DOI
10.1007/s00521-025-10975-3
Abstract
Machine reading comprehension (MRC) is a fundamental natural language understanding task in natural language processing that aims to comprehend a given passage and answer questions based on it. Understanding implicit information, deducing the logical structure of information, and connecting context across different pieces of information make MRC difficult. Most current state-of-the-art approaches to MRC rely on encoder-based models, and no earlier research has applied a decoder-only model to MRC question-answering datasets, even though language models of this category have achieved unprecedented performance on a variety of generative tasks. In this paper, we propose a parameter-efficient fine-tuning framework that effectively improves the MRC capabilities of decoder-only large language models. The framework designs the fine-tuning process for MRC and introduces the low-rank adaptation (LoRA) method to fine-tune the large model with many parameters effectively, with lower hardware resource requirements than previous methods. In addition, we integrate a quantized inference strategy for the fine-tuned model to further improve practicality. We conducted experiments on four types of MRC datasets. Extensive experiments show that our model achieves a significant performance boost over baselines and outperforms other strong MRC models. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
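As an illustration of the approach summarized in the abstract, the sketch below shows how a decoder-only model could be fine-tuned for MRC with LoRA using the Hugging Face transformers and peft libraries. This is a minimal sketch, not the authors' implementation: the base model name, prompt format, and hyperparameters are assumptions made purely for illustration.

# Illustrative sketch only (not the paper's released code): LoRA fine-tuning of a
# decoder-only LLM for MRC question answering with Hugging Face transformers + peft.
# The base model, prompt template, and hyperparameters below are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base_model = "meta-llama/Llama-2-7b-hf"          # assumed decoder-only base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA freezes the pretrained weights and learns small low-rank update matrices
# injected into the attention projections, so only a tiny fraction of parameters
# is trained, which lowers the hardware requirements of fine-tuning.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the low-rank updates (assumed)
    lora_alpha=16,                        # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed injection points
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically well under 1% of all weights

# Each MRC example can be rendered as a causal-LM prompt, e.g.
# "Context: <passage>\nQuestion: <question>\nAnswer:", and the adapter is trained
# to generate the answer; the training loop itself is omitted here.

# For quantized inference, the trained adapter can be attached to a 4-bit base model:
#   from transformers import BitsAndBytesConfig
#   model = AutoModelForCausalLM.from_pretrained(
#       base_model, quantization_config=BitsAndBytesConfig(load_in_4bit=True))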
Pages: 6899-6913
Number of pages: 14