Regulating Cryptocurrencies: A Supervised Machine Learning Approach to De-Anonymizing the Bitcoin Blockchain

Times cited: 147
Authors
Yin, Hao Hua Sun [1, 2, 3]
Langenheldt, Klaus [4]
Harlev, Mikkel [4]
Mukkamala, Raghava Rao [4, 5]
Vatrapu, Ravi [6, 7, 8]
Affiliations
[1] Copenhagen Business School, Department of Digitalization, Centre for Business Data Analytics, Frederiksberg, Denmark
[2] Cryptium Labs GmbH, Zug, Switzerland
[3] Cosmos, Copenhagen, Denmark
[4] Copenhagen Business School (CBS), Centre for Business Data Analytics, Department of Digitalization, Frederiksberg, Denmark
[5] Kristiania University College, Department of Technology, Oslo, Norway
[6] Copenhagen Business School, Department of Digitalization, Computational Social Science, Frederiksberg, Denmark
[7] Kristiania University College, Applied Computing, Oslo, Norway
[8] Copenhagen Business School (CBS), Centre for Business Data Analytics, Frederiksberg, Denmark
Keywords
cryptocurrencies; Bitcoin; blockchain; cybersecurity; supervised machine learning; online anonymity; cybercrime; social media; framework; systems; issues; identification
DOI
10.1080/07421222.2018.1550550
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an owning entity's real-world identity is hidden behind a pseudonym, a so-called address. Bitcoin is therefore widely assumed to provide a high degree of anonymity, which is a driver for its frequent use in illicit activities. This paper presents a novel approach for de-anonymizing the Bitcoin Blockchain by using Supervised Machine Learning to predict the type of yet-unidentified entities. We used a sample of 957 entities (with approximately 385 million transactions), whose identity and type had been revealed, as training data and built classifiers differentiating among 12 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm with default parameters, we achieve a mean cross-validation accuracy of 80.42% and an F1-score of approximately 79.64%. We show two examples: one where we predict on a set of 22 clusters that are suspected to be related to cybercriminal activities, and another where we classify 153,293 clusters to provide an estimate of the activity in the Bitcoin ecosystem. We discuss the potential applications of our method for organizational regulation and compliance and its societal implications, outline study limitations, and propose future research directions. A prototype implementation of our method for organizational use is included in the appendix.
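The abstract reports a mean cross-validation accuracy and an F1-score for a 12-class entity classifier. As a hedged illustration of how such figures are typically computed, the sketch below implements k-fold cross-validation, accuracy, and macro-averaged F1 in plain Python. It assumes 5 folds and macro averaging (the abstract states neither the fold count nor the F1 averaging scheme), and it substitutes a simple nearest-centroid classifier as a stand-in for Gradient Boosting. All function names, labels, and toy data are hypothetical and are not the authors' implementation.

```python
import random
from collections import Counter
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return mean(scores)


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit, X, y, k=5, seed=0):
    """Return (mean accuracy, mean macro-F1) over k held-out folds.

    `fit(X_train, y_train)` must return a `predict(x)` callable.
    """
    accs, f1s = [], []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in fold]
        y_pred = [predict(X[i]) for i in fold]
        accs.append(accuracy(y_true, y_pred))
        f1s.append(macro_f1(y_true, y_pred))
    return mean(accs), mean(f1s)


def fit_nearest_centroid(X_train, y_train):
    """Hypothetical stand-in classifier (the paper uses Gradient Boosting)."""
    counts = Counter(y_train)
    sums = {}
    for x, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(x):
        # Assign the label whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


# Toy, well-separated data: 3 hypothetical entity types, 10 feature vectors each.
X = [(cx + 0.1 * i, cy - 0.1 * i) for (cx, cy) in [(0, 0), (10, 0), (0, 10)] for i in range(10)]
y = [label for label in ("A", "B", "C") for _ in range(10)]
```

On this separable toy data, `cross_validate(fit_nearest_centroid, X, y)` returns `(1.0, 1.0)`. Real entity features derived from Blockchain transactions would of course not separate this cleanly, and the classifier passed as `fit` would be replaced by a Gradient Boosting model.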
Pages: 37-73
Page count: 37