EVALUATION OF QUANTIZED LARGE LANGUAGE MODELS IN THE TEXT SUMMARIZATION PROBLEM

Cited by: 0
Authors
Nedashkovskaya, N. I. [1]
Yeremichuk, R. I. [1]
Affiliations
[1] Natl Tech Univ Ukraine, Igor Sikorsky Kyiv Polytech Inst, Inst Appl Syst Anal, Dept Math Methods Syst Anal, Kyiv, Ukraine
Keywords
limited resources; natural language processing; text summarization; large language models; quantization; multi-criteria analysis
DOI
10.15588/1607-3274-2025-2-12
CLC number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Context. The paper considers the problem of increasing the efficiency of deep artificial neural networks in terms of memory and energy consumption, together with the multi-criteria evaluation of the quality of large language model (LLM) outputs, taking user judgments into account, in the text summarization task. The object of the study is the process of automated text summarization based on LLMs.
Objective. The goal of the work is to find a compromise between the complexity of an LLM, its performance, and its operational efficiency in the text summarization problem.
Method. An LLM evaluation algorithm based on multiple criteria is proposed, which allows choosing the most appropriate LLM for text summarization and finding an acceptable compromise between the complexity of the model, its performance, and the quality of summarization. Significant improvements in the accuracy of neural-network-based results in natural language processing tasks are often achieved with models that are excessively deep and over-parameterized, which severely limits their use in real-time inference tasks, where high accuracy is required under conditions of limited resources. The proposed algorithm selects an acceptable LLM based on multiple criteria, such as the accuracy metrics BLEU, ROUGE-1, ROUGE-2, ROUGE-L and BERTScore, summary generation speed, or other criteria defined by the user in a specific practical task of intelligent text analysis. The algorithm includes the analysis and improvement of the consistency of user judgments and the evaluation of LLMs with respect to each criterion.
Results. Software was developed for automatically extracting texts from online articles and summarizing them. Nineteen quantized and non-quantized LLMs of various sizes were evaluated, including LLaMa-3-8B-4bit, Gemma-2B-4bit, Gemma-1.1-7B-4bit, Qwen-1.5-4B-4bit, Stable LM-2-1.6B-4bit, Phi-2-4bit, Mistral-7B-4bit, GPT-3.5 Turbo and other LLMs, in terms of BLEU, ROUGE-1, ROUGE-2, ROUGE-L and BERTScore on two different datasets: XSum and CNN/Daily Mail 3.0.0.
Conclusions. The conducted experiments confirmed the functionality of the proposed software and allow it to be recommended for practical use in text summarization problems. Prospects for further research include a deeper analysis of metrics and criteria for evaluating the quality of generated texts, as well as experimental study of the proposed algorithm on a larger number of practical natural language processing tasks.
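The abstract names the evaluation criteria but not the implementation of the ranking step. The Python sketch below is only an illustration, under stated assumptions, of how such a multi-criteria comparison of summarization models could be wired together with the Hugging Face evaluate package: the generate callable, the score_model and rank_models helpers, the criterion weights, and the min-max normalization with a weighted sum are all illustrative choices and do not reproduce the authors' algorithm or its consistency-improvement step for user judgments.

# Illustrative sketch only (not the paper's code): rank summarization models
# by the criteria listed in the abstract (BLEU, ROUGE-1/2/L, BERTScore,
# generation speed) using a simple weighted sum of min-max-normalized scores.
import time
import numpy as np
import evaluate  # Hugging Face `evaluate` package

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
bertscore = evaluate.load("bertscore")


def score_model(generate, articles, references):
    """Summarize `articles` with `generate` (a callable: text -> summary)
    and return the criterion values used for ranking. The callable is an
    assumed interface, e.g. a wrapper around a 4-bit quantized LLM."""
    t0 = time.perf_counter()
    preds = [generate(a) for a in articles]
    speed = len(preds) / (time.perf_counter() - t0)  # summaries per second

    r = rouge.compute(predictions=preds, references=references)
    b = bleu.compute(predictions=preds, references=references)["bleu"]
    f1 = float(np.mean(bertscore.compute(predictions=preds,
                                         references=references,
                                         lang="en")["f1"]))
    return {"rouge1": r["rouge1"], "rouge2": r["rouge2"],
            "rougeL": r["rougeL"], "bleu": b, "bertscore": f1,
            "speed": speed}


def rank_models(results, weights):
    """Weighted-sum ranking after min-max normalization of each criterion
    (all criteria treated as benefit criteria, i.e. higher is better).
    `results`: {model_name: criterion dict}, `weights`: {criterion: weight}."""
    names = list(results)
    crits = list(weights)
    m = np.array([[results[n][c] for c in crits] for n in names], float)
    lo, hi = m.min(axis=0), m.max(axis=0)
    norm = (m - lo) / np.where(hi > lo, hi - lo, 1.0)
    w = np.array([weights[c] for c in crits], float)
    w = w / w.sum()
    return sorted(zip(names, norm @ w), key=lambda x: -x[1])

A fuller implementation along the lines of the paper would also derive the criterion weights from user pairwise judgments and check and improve their consistency before aggregation, which this sketch omits.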
Pages: 133-147
Page count: 15