EchoPrint: Two-factor Authentication using Acoustics and Vision on Smartphones

Cited by: 93

Authors:

Zhou, Bing [1]

Lohokare, Jay [2]

Gao, Ruipeng [3]

Ye, Fan [1]

Affiliations:

[1] SUNY Stony Brook, Dept Elect & Comp Engn, Stony Brook, NY 11794 USA

[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA

[3] Beijing Jiaotong Univ, Sch Software Engn, Beijing, Peoples R China

Source:

MOBICOM'18: PROCEEDINGS OF THE 24TH ANNUAL INTERNATIONAL CONFERENCE ON MOBILE COMPUTING AND NETWORKING | 2018

Keywords:

mobile sensing; acoustics; authentication; FACES;

DOI:

10.1145/3241539.3241575

Chinese Library Classification (CLC) number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

User authentication on smartphones must satisfy both security and convenience, an inherently difficult balancing art. Apple's FaceID is arguably the latest of such efforts, at the cost of additional hardware (e.g., dot projector, flood illuminator and infrared camera). We propose a novel user authentication system EchoPrint, which leverages acoustics and vision for secure and convenient user authentication, without requiring any special hardware. EchoPrint actively emits almost inaudible acoustic signals from the earpiece speaker to "illuminate" the user's face and authenticates the user by the unique features extracted from the echoes bouncing off the 3D facial contour. To cope with changes in phone-holding pose, and hence in the echoes, a Convolutional Neural Network (CNN) is trained to extract reliable acoustic features, which are further combined with visual facial landmark locations to feed a binary Support Vector Machine (SVM) classifier for final authentication. Because the echo features depend on 3D facial geometries, EchoPrint is not easily spoofed by images or videos like 2D visual face recognition systems. It needs only commodity hardware, thus avoiding the extra costs of special sensors in solutions like FaceID. Experiments with 62 volunteers and non-human objects such as images, photos, and sculptures show that EchoPrint achieves 93.75% balanced accuracy and 93.50% F-score, while the average precision is 98.05%, and no image/video based attack is observed to succeed in spoofing.
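The abstract reports balanced accuracy, F-score, and precision for a binary accept/reject classifier. As a minimal sketch of how these three quantities relate, the snippet below computes them from confusion-matrix counts using the standard textbook definitions. The abstract does not state which exact formulas or averaging scheme the authors used, so this is an assumption; the function name `binary_metrics` and the counts are hypothetical and are not data from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: legitimate-user attempts that were accepted/rejected.
    tn/fp: impostor attempts that were rejected/accepted.
    Assumes every denominator below is non-zero.
    """
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # fraction of accepted attempts that were legitimate
    balanced_accuracy = (recall + specificity) / 2
    f_score = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": balanced_accuracy,
        "f_score": f_score,
    }


# Hypothetical counts, chosen only for illustration (not results from the paper).
metrics = binary_metrics(tp=90, fp=2, tn=98, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Balanced accuracy averages the per-class rates, so it stays informative when legitimate and impostor attempts are unequal in number. High precision with a lower F-score, as in the reported figures, is what one would expect from a classifier that rarely accepts impostors but sometimes rejects the legitimate user.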

Citation:

Pages: 321-336

Page count: 16

References: 47 in total

