Drug-BERT: Pre-trained Language Model Specialized for Korean Drug Crime

Cited by: 0
Authors
Lee, Jeong Min [1 ,2 ]
Lee, Suyeon [3 ]
Byon, Sungwon [1 ]
Jung, Eui-Suk [1 ]
Baek, Myung-Sun [1 ,2 ]
Affiliations
[1] Elect & Telecommun Res Inst, Daejeon, South Korea
[2] Univ Sci & Technol, Daejeon, South Korea
[3] Yonsei Univ, Dept Artificial Intelligence, Seoul, South Korea
Source
19TH IEEE INTERNATIONAL SYMPOSIUM ON BROADBAND MULTIMEDIA SYSTEMS AND BROADCASTING, BMSB 2024 | 2024
Keywords
drug slang; natural language processing; pre-trained language model; classification
DOI
10.1109/BMSB62888.2024.10608314
Chinese Library Classification: TP [Automation technology; computer technology]
Discipline classification code: 0812
Abstract
We propose Drug-BERT, a specialized pre-trained language model designed for detecting drug-related content in the Korean language. Given the severity of the current drug problem in South Korea, effective responses are imperative. Focusing on the distinctive features of drug slang, this study seeks to improve the identification and classification of drug-related posts on social media platforms. Recent drug slang terms are gathered and used to collect drug-related posts, and the collected data are used to further train the language model; the resulting pre-trained model is Drug-BERT. The results show that fine-tuned Drug-BERT outperforms the comparative models, achieving 99.43% accuracy in classifying drug-relevant posts. Drug-BERT presents a promising solution for combating drug-related activities, contributing to proactive measures against drug crimes in the Korean context.
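
The fine-tuning stage described in the abstract can be sketched as follows; this is a hedged illustration, not the authors' released code. A Korean BERT-style encoder is adapted into a binary classifier over social-media posts with the Hugging Face transformers API. The base checkpoint name "klue/bert-base", the toy dataset wrapper, and all hyperparameters are assumptions for illustration; the paper's own Drug-BERT checkpoint and training data are not reproduced here.

# Minimal sketch (assumptions noted above): fine-tune a Korean BERT encoder
# to classify posts as drug-related (label 1) or not (label 0).
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "klue/bert-base"  # assumption: any Korean BERT-style checkpoint could stand in

class PostDataset(Dataset):
    """Wraps (text, label) pairs and tokenizes them once up front."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

def fine_tune(texts, labels, epochs=3, lr=2e-5, batch_size=16):
    """Fine-tune the encoder with a 2-class classification head."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    loader = DataLoader(PostDataset(texts, labels, tokenizer),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = model(**batch).loss  # cross-entropy over the two classes
            loss.backward()
            optimizer.step()
    return tokenizer, model

In the paper's setting, the same classification head would sit on top of the domain-adapted Drug-BERT encoder rather than a general-purpose Korean checkpoint; the pre-training step on slang-filtered posts is what the example above does not cover.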
Pages: 186-188
Number of pages: 3