Comparative analysis of paraphrasing performance of ChatGPT, GPT-3, and T5 language models using a new ChatGPT generated dataset: ParaGPT

Cited by: 1
Authors
Pehlivanoglu, Meltem Kurt [1 ]
Gobosho, Robera Tadesse [1 ]
Syakura, Muhammad Abdan [1 ]
Shanmuganathan, Vimal [2 ]
de-la-Fuente-Valentin, Luis [3 ]
Affiliations
[1] Kocaeli Univ, Dept Comp Engn, TR-41001 Kocaeli, Turkiye
[2] Sri Eshwar Coll Engn, Dept Artificial Intelligence & Data Sci, Coimbatore, Tamil Nadu, India
[3] Univ Int La Rioja, Sch Engn & Technol, Logrono, Spain
Keywords
ChatGPT; generative artificial intelligence; large language models; machine learning;
DOI
10.1111/exsy.13699
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Paraphrase generation is a fundamental natural language processing (NLP) task: given an input sentence, the goal is to generate a well-formed, coherent output sentence that exhibits syntactic and/or lexical diversity while preserving semantic similarity between the two sentences. However, high-quality paraphrase datasets remain scarce, particularly for machine-generated sentences. In this paper, we present ParaGPT, a new paraphrase dataset of 81,000 machine-generated sentence pairs, built from 27,000 reference sentences (generated by ChatGPT) and 81,000 paraphrases (three per reference) produced by three different large language models (LLMs): ChatGPT, GPT-3, and T5. We used ChatGPT to generate the 27,000 reference sentences, which cover a diverse array of topics and sentence structures, thus providing varied inputs for the models. We evaluated the quality of the generated paraphrases using several automatic evaluation metrics and, through a comparative analysis of the three LLMs, provide insights into the strengths and drawbacks of each model as a paraphraser. According to our findings, ChatGPT performed best under the chosen metrics: its higher-than-average semantic-similarity scores indicate that its paraphrases stay close in meaning to the reference sentences, while its relatively low scores on surface-overlap (syntactic) metrics indicate a greater diversity of syntactic structures in the generated paraphrases. ParaGPT is a valuable resource for researchers working on NLP tasks such as paraphrasing, text simplification, and text generation. We make the ParaGPT dataset publicly accessible to researchers and, as far as we are aware, it is the first paraphrase dataset produced based on ChatGPT.
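The abstract contrasts semantic similarity (how well meaning is preserved) with surface or syntactic overlap, where lower overlap implies greater diversity. As a minimal illustrative sketch, not the paper's actual metric suite (which uses standard automatic measures), a unigram Jaccard overlap between reference and paraphrase makes this trade-off concrete:

```python
import re

# Illustrative only: a simple surface-overlap score between a reference
# sentence and its paraphrase. Lower overlap suggests greater lexical and
# syntactic diversity; a separate semantic-similarity metric (e.g., one
# based on sentence embeddings) would check that meaning is preserved.

def unigram_overlap(reference: str, paraphrase: str) -> float:
    """Jaccard overlap of lowercased word sets (1.0 = identical vocabulary)."""
    ref = set(re.findall(r"\w+", reference.lower()))
    par = set(re.findall(r"\w+", paraphrase.lower()))
    if not ref and not par:
        return 1.0
    return len(ref & par) / len(ref | par)

reference = "The model generates a coherent paraphrase of the input sentence."
paraphrase = "The system rewrites the input sentence into a coherent paraphrase."

score = unigram_overlap(reference, paraphrase)
print(f"unigram overlap: {score:.2f}")  # -> unigram overlap: 0.50
```

A good paraphrase sits in the region of low surface overlap combined with high semantic similarity; an identical copy scores 1.0 here and an unrelated sentence scores near 0.0.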
Pages: 22
References
35 items in total
[1]  
Alshater M, 2022, SSRN Electronic Journal, DOI 10.2139/ssrn.4312358
[2]  
Bahrini Aram, 2023, ChatGPT: Applications, opportunities, and threats
[3]  
Bandel Elron, 2022, Quality Controlled Paraphrase Generation
[4]  
Banerjee S., 2005, Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, P65, DOI 10.3115/1626355.1626389
[5]  
BERTScore, 2023, BERTSCORE DEFAULT LA
[6]  
Brown TB, 2020, Advances in Neural Information Processing Systems, V33
[7]  
Cer Daniel, 2017, SEMEVAL ACL, P1, DOI 10.18653/v1/S17-2001
[8]
Clark Kendra J., Mitchell Meghan M., Fahmy Chantal, Pyrooz David C., Decker Scott H., 2020, What if They Are All High-Risk for Attrition? Correlates of Retention in a Longitudinal Study of Reentry from Prison, INTERNATIONAL JOURNAL OF OFFENDER THERAPY AND COMPARATIVE CRIMINOLOGY, P1807-1842
[9]  
Damodaran Prithiviraj., 2021, Parrot: Paraphrase Generation for NLU.
[10]  
DataCanary, hilfialkaff, et al., 2017, Quora Question Pairs (Kaggle dataset)