Don't Stop Believin': A Unified Evaluation Approach for LLM Honeypots

Cited by: 0
Authors
Weber, Simon B. [1]
Feger, Marc [1]
Pilgermann, Michael [2]
Affiliations
[1] Heinrich Heine Univ Dusseldorf, Dept Comp Sci, D-40225 Dusseldorf, Germany
[2] Brandenburg Univ Appl Sci, Dept Comp Sci & Media, D-14770 Brandenburg, Germany
Source
IEEE Access | 2024 / Vol. 12
Keywords
Annotations; Protocols; Measurement; Complexity theory; Large language models; History; Data models; Accuracy; Usability; Information security; Risk management; Performance evaluation; IT security; honeypot; large language model; GPT; cosine distance; evaluation;
DOI
10.1109/ACCESS.2024.3472460
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The research area of honeypots is gaining new momentum, driven by advancements in large language models (LLMs). The chat-based applications of generative pretrained transformer (GPT) models seem ideal for use as honeypot backends, especially in request-response protocols like Secure Shell (SSH). By leveraging LLMs, many challenges associated with traditional honeypots - such as high development costs, ease of exposure, and breakout risks - appear to be solved. While early studies have primarily focused on the potential of these models, our research investigates the current limitations of GPT-3.5 by analyzing three datasets of varying complexity. We conducted an expert annotation of over 1,400 request-response pairs, encompassing 230 different base commands. Our findings reveal that while GPT-3.5 struggles to maintain context, incorporating session context into response generation improves the quality of SSH responses. Additionally, we explored whether distinguishing between convincing and non-convincing responses is a metrics issue. We propose a paraphrase-mining approach to address this challenge, which achieved a macro F1 score of 77.85% using cosine distance in our evaluation. This method has the potential to reduce annotation effort, help unify the performance evaluation of LLM-based honeypots, and facilitate comparisons between new and previous approaches in future research.
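The abstract names two measurable ingredients: cosine distance between responses and a macro F1 score over the convincing / non-convincing labels. The following is a minimal sketch of how those two quantities can be computed, not the authors' implementation. It assumes each response has already been turned into a numeric embedding vector by some model; the function names and the threshold value of 0.3 are hypothetical and chosen only for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def is_convincing(generated_vec, reference_vec, threshold=0.3):
    """Label a generated response as convincing when its embedding lies
    within `threshold` cosine distance of a reference response.
    The threshold is an assumed value, not one taken from the paper."""
    return cosine_distance(generated_vec, reference_vec) <= threshold


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores for boolean labels."""
    scores = []
    for label in (True, False):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that never occurs in truth or prediction scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In use, `is_convincing` would be applied to each request-response pair, and `macro_f1` would compare those predictions against the expert annotations. Macro averaging weights both classes equally, which matters when convincing and non-convincing responses are unevenly represented.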
Pages: 144579-144587
Page count: 9