Drug-BERT: Pre-trained Language Model Specialized for Korean Drug Crime

Cited: 0
Authors
Lee, Jeong Min [1 ,2 ]
Lee, Suyeon [3 ]
Byon, Sungwon [1 ]
Jung, Eui-Suk [1 ]
Baek, Myung-Sun [1 ,2 ]
Affiliations
[1] Elect & Telecommun Res Inst, Daejeon, South Korea
[2] Univ Sci & Technol, Daejeon, South Korea
[3] Yonsei Univ, Dept Artificial Intelligence, Seoul, South Korea
Source
19th IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, BMSB 2024 | 2024
Keywords
drug slang; natural language processing; pre-trained language model; classification
DOI
10.1109/BMSB62888.2024.10608314
CLC Number
TP [Automation technology, computer technology]
Subject Classification Number
0812
Abstract
We propose Drug-BERT, a pre-trained language model specialized for detecting drug-related content in Korean. Given the severity of the current drug problem in South Korea, effective responses are imperative. Focusing on the distinctive features of drug slang, this study seeks to improve the identification and classification of drug-related posts on social media platforms. Recently coined drug slang terms are gathered and used to collect drug-related posts, and the collected data is used to pre-train the language model, yielding Drug-BERT. The results show that fine-tuned Drug-BERT outperforms the comparative models, achieving 99.43% accuracy in classifying drug-relevant posts. Drug-BERT presents a promising solution for combating drug-related activities, contributing to proactive measures against drug crimes in the Korean context.
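As a minimal sketch of the fine-tuning step the abstract describes, the following Python snippet (assuming the Hugging Face transformers and PyTorch libraries) fine-tunes a BERT-style Korean encoder with a binary classification head that labels posts as drug-related or not. The "klue/bert-base" checkpoint and the in-line example posts are placeholders: the paper's own Drug-BERT weights and dataset are not assumed to be publicly available.

# Sketch: binary classification fine-tuning of a Korean BERT-style encoder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "klue/bert-base"  # placeholder for the Drug-BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical labelled posts: 1 = drug-related, 0 = not drug-related.
texts = ["example drug-related post", "example unrelated post"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

# One fine-tuning step; cross-entropy loss is computed internally when labels are passed.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()

# Inference: argmax over the two logits gives the predicted class per post.
model.eval()
with torch.no_grad():
    predictions = model(**batch).logits.argmax(dim=-1)
print(predictions.tolist())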
Pages: 186-188
Number of pages: 3