Drug-BERT: Pre-trained Language Model Specialized for Korean Drug Crime

Cited by: 0
Authors
Lee, Jeong Min [1,2]
Lee, Suyeon [3]
Byon, Sungwon [1]
Jung, Eui-Suk [1]
Baek, Myung-Sun [1,2]
Affiliations
[1] Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
[2] University of Science and Technology (UST), Daejeon, South Korea
[3] Yonsei University, Department of Artificial Intelligence, Seoul, South Korea
Source
19th IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB 2024), 2024
Keywords
drug slang; natural language processing; pre-trained language model; classification
DOI
10.1109/BMSB62888.2024.10608314
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
We propose Drug-BERT, a pre-trained language model specialized for detecting drug-related content in Korean. Given the severity of the current drug problem in South Korea, effective responses are imperative. Focusing on the distinctive features of drug slang, this study seeks to improve the identification and classification of drug-related posts on social media platforms. Recent drug slang terms are gathered and used to collect drug-related posts, and the collected data is used to pre-train the language model, Drug-BERT. The results show that fine-tuned Drug-BERT outperforms the comparative models, achieving 99.43% accuracy in classifying drug-relevant posts. Drug-BERT presents a promising solution for combating drug-related activities, contributing to proactive measures against drug crimes in the Korean context.
Pages
186-188
Page count
3
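The abstract describes pre-training a drug-specific Korean BERT and then fine-tuning it to classify posts as drug-related or not. Below is a minimal sketch of such a fine-tuning step using the Hugging Face transformers and PyTorch APIs; the base checkpoint (klue/bert-base), the toy texts and labels, and the hyperparameters are illustrative assumptions, not the authors' actual data or configuration.

```python
# Hedged sketch: binary classification fine-tuning of a Korean BERT,
# in the spirit of the Drug-BERT pipeline summarized in the abstract.
# "klue/bert-base", the example posts, and all hyperparameters are
# placeholders, not the authors' setup.
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class PostDataset(Dataset):
    """Wraps (text, label) pairs; label 1 = drug-related, 0 = not."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

tokenizer = AutoTokenizer.from_pretrained("klue/bert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "klue/bert-base", num_labels=2)

# Hypothetical toy data; the paper gathers real posts via drug-slang queries.
texts = ["예시 게시글 1", "예시 게시글 2"]
labels = [1, 0]
loader = DataLoader(PostDataset(texts, labels, tokenizer),
                    batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        out = model(**batch)   # returns a loss when "labels" is provided
        out.loss.backward()
        optimizer.step()
```

In the paper's setting, the same classification head would sit on top of the domain-adapted Drug-BERT encoder rather than a general-purpose checkpoint, which is what the reported 99.43% accuracy refers to.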