A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs

Cited by: 192
Authors
George, Dileep [1]
Lehrach, Wolfgang [1]
Kansky, Ken [1]
Lazaro-Gredilla, Miguel [1]
Laan, Christopher [1]
Marthi, Bhaskara [1]
Lou, Xinghua [1]
Meng, Zhaoshi [1]
Liu, Yi [1]
Wang, Huayan [1]
Lavin, Alex [1]
Phoenix, D. Scott [1]
Affiliation
[1] Vicarious AI, 2 Union Sq, Union City, CA 94587, USA
Keywords
PRIMARY VISUAL-CORTEX; OBJECT RECOGNITION; SEGMENTATION; ATTENTION
DOI
10.1126/science.aag2612
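Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09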
Abstract
Learning from a few examples and generalizing to markedly different situations are capabilities of human visual intelligence that are yet to be matched by leading machine learning models. By drawing inspiration from systems neuroscience, we introduce a probabilistic generative model for vision in which message-passing-based inference handles recognition, segmentation, and reasoning in a unified way. The model demonstrates excellent generalization and occlusion-reasoning capabilities and outperforms deep neural networks on a challenging scene text recognition benchmark while being 300-fold more data efficient. In addition, the model fundamentally breaks the defense of modern text-based CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) by generatively segmenting characters without CAPTCHA-specific heuristics. Our model emphasizes aspects such as data efficiency and compositionality that may be important in the path toward general artificial intelligence.
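The abstract names message-passing-based inference but does not describe how the model's messages are computed. As a hedged, generic illustration only (not the paper's actual inference procedure), the sketch below runs max-product message passing, in log space, on a small chain-structured model and checks the result against brute-force enumeration. The chain structure, the shared pairwise table, and the function names `max_product_chain` and `brute_force` are hypothetical choices made for this example.

```python
import itertools


def max_product_chain(unary, pairwise):
    """Generic max-product (log-space max-sum) message passing on a chain.

    Hypothetical illustration, not the paper's model.
    unary: list of n lists, each holding K log-potentials for node i.
    pairwise: K x K log-potential table shared by every edge (i, i+1).
    Returns (best_log_score, best_assignment).
    """
    n, k = len(unary), len(unary[0])
    # messages[i][s]: best log-score of nodes 0..i given node i is in state s
    messages = [list(unary[0])]
    backptr = []  # backptr[i-1][s]: best state of node i-1 given node i = s
    for i in range(1, n):
        prev = messages[-1]
        cur, bp = [], []
        for s in range(k):
            # Forward message: maximize over the previous node's state.
            cands = [prev[t] + pairwise[t][s] for t in range(k)]
            best_t = max(range(k), key=lambda t: cands[t])
            cur.append(cands[best_t] + unary[i][s])
            bp.append(best_t)
        messages.append(cur)
        backptr.append(bp)
    # Pick the best final state, then trace the back-pointers.
    state = max(range(k), key=lambda s: messages[-1][s])
    best_score = messages[-1][state]
    assignment = [state]
    for bp in reversed(backptr):
        state = bp[state]
        assignment.append(state)
    assignment.reverse()
    return best_score, assignment


def brute_force(unary, pairwise):
    """Exhaustive search over all K**n assignments, used as a reference."""
    n, k = len(unary), len(unary[0])
    best = None
    for assign in itertools.product(range(k), repeat=n):
        score = sum(unary[i][assign[i]] for i in range(n))
        score += sum(pairwise[assign[i]][assign[i + 1]] for i in range(n - 1))
        if best is None or score > best[0]:
            best = (score, list(assign))
    return best


if __name__ == "__main__":
    # Three nodes, two states; the pairwise table rewards neighboring agreement.
    unary = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
    pairwise = [[1.0, 0.0], [0.0, 1.0]]
    print(max_product_chain(unary, pairwise))
    print(brute_force(unary, pairwise))
```

On a chain, the forward messages plus back-pointers recover the exact maximum in O(n·K²) time, whereas enumeration costs O(K^n). The brute-force function is included only to make that equivalence easy to check on small inputs.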
Pages: 9