I Am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs

Cited by: 78
Authors
Sivakorn, Suphannee [1]
Polakis, Iasonas [1]
Keromytis, Angelos D. [1]
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
Source
1st IEEE European Symposium on Security and Privacy | 2016
Funding
National Science Foundation (USA);
DOI
10.1109/EuroSP.2016.37
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless, economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based captcha scheme. Fittingly, Google recently unveiled the latest version of reCaptcha. The goal of their new system is twofold; to minimize the effort for legitimate users, while requiring tasks that are more challenging to computers than text recognition. ReCaptcha is driven by an "advanced risk analysis system" that evaluates requests and selects the difficulty of the captcha that will be returned. Users may be required to click in a checkbox, or solve a challenge by identifying images with similar content. In this paper, we conduct a comprehensive study of reCaptcha, and explore how the risk analysis process is influenced by each aspect of the request. Through extensive experimentation, we identify flaws that allow adversaries to effortlessly influence the risk analysis, bypass restrictions, and deploy large-scale attacks. Subsequently, we design a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images. Our system is extremely effective, automatically solving 70.78% of the image reCaptcha challenges, while requiring only 19 seconds per challenge. We also apply our attack to the Facebook image captcha and achieve an accuracy of 83.5%. Based on our experimental findings, we propose a series of safeguards and modifications for impacting the scalability and accuracy of our attacks. Overall, while our study focuses on reCaptcha, our findings have wide implications; as the semantic information conveyed via images is increasingly within the realm of automated reasoning, the future of captchas relies on the exploration of novel directions.
Pages: 388-403
Page count: 16