CROSS-LINGUAL CYBERSECURITY ANALYTICS IN THE INTERNATIONAL DARK WEB WITH ADVERSARIAL DEEP REPRESENTATION LEARNING

Cited by: 27
Authors
Ebrahimi, Mohammadreza [1]
Chai, Yidong [2]
Samtani, Sagar [3]
Chen, Hsinchun [4]
Affiliations
[1] Univ S Florida, Sch Informat Syst & Management, Tampa, FL 33620 USA
[2] Hefei Univ Technol, Sch Management, Anhui 230009, Peoples R China
[3] Indiana Univ, Dept Operat & Decis Technol, Bloomington, IN 47405 USA
[4] Univ Arizona, Dept Management Informat Syst, Tucson, AZ 85721 USA
Funding
National Science Foundation (USA);
Keywords
Cybersecurity analytics; dark web; automated hacker asset detection; cross-lingual knowledge transfer; adversarial learning; computational design science;
DOI
10.25300/MISQ/2022/16618
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
International dark web platforms operating within multiple geopolitical regions and languages host a myriad of hacker assets such as malware, hacking tools, hacking tutorials, and malicious source code. Cybersecurity analytics organizations employ machine learning models trained on human-labeled data to automatically detect these assets and bolster their situational awareness. However, the lack of human-labeled training data is prohibitive when analyzing foreign-language dark web content. In this research note, we adopt the computational design science paradigm to develop a novel IT artifact for cross-lingual hacker asset detection (CLHAD). CLHAD automatically leverages the knowledge learned from English content to detect hacker assets in non-English dark web platforms. CLHAD encompasses a novel adversarial deep representation learning (ADREL) method, which generates multilingual text representations using generative adversarial networks (GANs). Drawing upon the state of the art in cross-lingual knowledge transfer, ADREL is a novel approach to automatically extract transferable text representations and facilitate the analysis of multilingual content. We evaluate CLHAD on Russian, French, and Italian dark web platforms, demonstrate its practical utility in hacker asset profiling, and conduct a proof-of-concept case study. Our analysis suggests that cybersecurity managers may benefit more from focusing on Russian to identify sophisticated hacking assets. In contrast, financial hacker assets are scattered among several dominant dark web languages. Managerial insights for security managers are discussed at operational and strategic levels.
Pages: 1209-1226
Page count: 18