Don't Stop Believin': A Unified Evaluation Approach for LLM Honeypots

Times cited: 0

Authors:

Weber, Simon B. [1]

Feger, Marc [1]

Pilgermann, Michael [2]

Affiliations:

[1] Heinrich Heine Univ Dusseldorf, Dept Comp Sci, D-40225 Dusseldorf, Germany

[2] Brandenburg Univ Appl Sci, Dept Comp Sci & Media, D-14770 Brandenburg, Germany

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Annotations; Protocols; Measurement; Complexity theory; Large language models; History; Data models; Accuracy; Usability; Information security; Risk management; Performance evaluation; IT security; honeypot; large language model; GPT; cosine distance; evaluation

DOI:

10.1109/ACCESS.2024.3472460

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The research area of honeypots is gaining new momentum, driven by advancements in large language models (LLMs). The chat-based applications of generative pretrained transformer (GPT) models seem ideal for use as honeypot backends, especially in request-response protocols such as Secure Shell (SSH). LLMs appear to solve many challenges associated with traditional honeypots, such as high development costs, ease of exposure, and breakout risks. While early studies have primarily focused on the potential of these models, our research investigates the current limitations of GPT-3.5 by analyzing three datasets of varying complexity. Experts annotated over 1,400 request-response pairs, covering 230 distinct base commands. Our findings reveal that while GPT-3.5 struggles to maintain context, incorporating session context into response generation improves the quality of SSH responses. Additionally, we explored whether distinguishing between convincing and non-convincing responses is a metrics issue. To address this challenge, we propose a paraphrase-mining approach that, using cosine distance, achieved a macro F1 score of 77.85% in our evaluation. This method has the potential to reduce annotation effort, unify the performance evaluation of LLM-based honeypots, and facilitate comparisons between new and previous approaches in future research.
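The abstract names cosine distance and a macro F1 score but gives no further detail about the decision rule. As a minimal sketch, assuming each SSH response has already been turned into an embedding vector by some sentence-embedding model (not specified in this record) and that a generated response is predicted "convincing" when its cosine distance to a reference response is at or below a threshold (the value 0.2 is hypothetical), classification and scoring could look like the following. The helper names `cosine_distance`, `is_convincing`, and `macro_f1`, the threshold, and the toy two-dimensional vectors are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical decision threshold; the record does not state the value used.
THRESHOLD = 0.2


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def is_convincing(generated: Sequence[float], reference: Sequence[float],
                  threshold: float = THRESHOLD) -> bool:
    """Predict 'convincing' when the generated response's embedding lies
    within `threshold` cosine distance of the reference embedding (assumed rule)."""
    return cosine_distance(generated, reference) <= threshold


def macro_f1(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Unweighted mean of the per-class F1 scores for the labels True and False."""
    scores = []
    for label in (True, False):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy (generated, reference) embedding pairs with hypothetical expert labels.
examples: List[Tuple[List[float], List[float]]] = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.5, 0.5], [0.6, 0.4]),
    ([1.0, 1.0], [-1.0, 0.2]),
]
annotations = [True, False, True, True]
predictions = [is_convincing(g, r) for g, r in examples]
print(predictions)                                   # [True, False, True, False]
print(round(macro_f1(annotations, predictions), 4))  # 0.7333
```

Macro F1 averages the per-class F1 scores without weighting by class frequency, so whichever of the convincing or non-convincing classes is rarer counts as much as the more common one.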

Pages: 144579-144587

Page count: 9

References: 24
