Unmasking large language models by means of OpenAI GPT-4 and Google AI: A deep instruction-based analysis

Cited by: 1
Authors
Zahid, Idrees A. [1 ]
Joudar, Shahad Sabbar [1 ]
Albahri, A. S. [2 ]
Albahri, O. S. [3 ,4 ]
Alamoodi, A. H. [5 ,6 ]
Santamaria, Jose [7 ]
Alzubaidi, Laith [8 ,9 ]
Affiliations
[1] Univ Technol Baghdad, Baghdad, Iraq
[2] Imam Jaafar Al Sadiq Univ, Tech Coll, Baghdad, Iraq
[3] Australian Tech & Management Coll, Melbourne, Australia
[4] Mazaya Univ Coll, Comp Tech Engn Dept, Nasiriyah, Iraq
[5] Appl Sci Private Univ, Appl Sci Res Ctr, Amman, Jordan
[6] Middle East Univ, MEU Res Unit, Amman, Jordan
[7] Univ Jaen, Dept Comp Sci, Jaen 23071, Spain
[8] Queensland Univ Technol, Sch Mech Med & Proc Engn, Brisbane, Qld 4000, Australia
[9] Queensland Univ Technol, Ctr Data Sci, Brisbane, Qld 4000, Australia
Source
INTELLIGENT SYSTEMS WITH APPLICATIONS | 2024, Vol. 23
Funding
Australian Research Council;
Keywords
OpenAI GPT-4; Google AI; Instruction-based analysis; Sarcasm detection; Deception avoidance; Transformers; ARTIFICIAL-INTELLIGENCE; CHALLENGES;
DOI
10.1016/j.iswa.2024.200431
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Language Models (LLMs) have become a hot topic in AI due to their ability to mimic human conversation. This study compares OpenAI's Generative Pre-trained Transformer 4 (GPT-4) model and Google's artificial intelligence (AI) chatbot, which is based on the Bidirectional Encoder Representations from Transformers (BERT) framework, in terms of their defined capabilities and built-in architectures. Both LLMs are prominent in AI applications. First, eight capabilities were identified to evaluate these models: translation accuracy, text generation, factuality, creativity, intellect, deception avoidance, sentiment classification, and sarcasm detection. Next, each capability was assessed using instructions. Additionally, a categorized LLM evaluation system was developed using ten research questions per category, reflecting this paper's main contributions from a prompt-engineering perspective. Notably, GPT-4 and Google AI successfully answered 85 % and 68.7 % of the study prompts, respectively. GPT-4 understands prompts better than Google AI, even when the prompts contain verbal flaws, and is more tolerant of grammatical errors. Moreover, the GPT-4-based approach was more precise, accurate, and succinct than Google AI, which was sometimes verbose and less realistic. While GPT-4 outperforms Google AI in translation accuracy, text generation, factuality, intellect, creativity, and deception avoidance, Google AI outperforms GPT-4 in sarcasm detection. Both models handled sentiment classification properly. More importantly, a panel of human judges assessed the model comparisons. Statistical analysis of the judges' ratings yielded more robust results by examining the specific uses, limitations, and expectations of both the GPT-4- and Google AI-based approaches. Finally, the two approaches' transformer architectures, parameter sizes, and attention mechanisms were examined.
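To make the instruction-based scoring described in the abstract concrete, the sketch below shows one plausible way to tally per-capability prompt success rates for the two models. This is an illustrative assumption, not the authors' code: the capability names follow the abstract, while the function name, data layout, and example outcomes are hypothetical placeholders rather than the paper's data.
```python
# Minimal sketch (not the authors' implementation): tally the percentage of
# prompts each model answered successfully, per capability.
from collections import defaultdict

# Capability names taken from the abstract.
CAPABILITIES = [
    "translation accuracy", "text generation", "factuality", "creativity",
    "intellect", "deception avoidance", "sentiment classification",
    "sarcasm detection",
]

def success_rates(results):
    """results: iterable of (capability, model, passed) tuples, one per
    evaluated prompt. Returns {model: {capability: success percentage}}."""
    passed = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for capability, model, ok in results:
        total[model][capability] += 1
        passed[model][capability] += int(ok)
    return {
        model: {
            cap: 100.0 * passed[model][cap] / total[model][cap]
            for cap in total[model]
        }
        for model in total
    }

# Illustrative usage with made-up outcomes for two prompts:
example = [
    ("sarcasm detection", "GPT-4", False),
    ("sarcasm detection", "Google AI", True),
]
print(success_rates(example))
```
Aggregating such per-capability percentages across all prompts would yield overall figures comparable to the 85 % and 68.7 % reported above.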
Pages: 18