An SoC-based CNN accelerator for face recognition using HWCK data scheduling

Cited by: 0
Authors
Tsai, Tsung-Han [1]
Hsu, Chin-Wei [1]
Affiliation
[1] National Central University, Department of Electrical Engineering, Taoyuan, Taiwan
Keywords
System-on-chip; Deep learning; Face recognition; Deep neural network accelerator; Field-programmable gate array (FPGA)
DOI
10.1007/s00530-025-01838-x
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a deep-learning-based face recognition system. The design uses a separable convolution accelerator with HWCK data scheduling, a method that organizes the weight data according to the number of processing elements (PEs) while accounting for hardware constraints such as bandwidth and memory size. The scheduling accelerates depthwise separable convolution models across depthwise convolution, pointwise convolution, and batch normalization. The system is implemented on a Xilinx ZCU106 development board, using an SoC architecture that combines an ARM processor with FPGA fabric to realize a system-level access control design. The proposed accelerator achieves 222 FPS and 60.8 GOPS on a FaceNet-based network. Power consumption on the ZCU106 board is 8.82 W, which corresponds to an energy efficiency of 6.89 GOPS/W. The design retains 94% accuracy on the VGGFace2 dataset and 99.2% on the LFW dataset. Compared to previous works, it demonstrates superior real-time performance and energy efficiency.
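The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the per-frame operation count assumes that the reported GOPS and FPS figures were measured on the same workload, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract.
THROUGHPUT_GOPS = 60.8   # giga-operations per second
POWER_W = 8.82           # board power in watts
FRAME_RATE_FPS = 222     # frames per second


def energy_efficiency(gops: float, watts: float) -> float:
    """Return throughput per watt in GOPS/W."""
    return gops / watts


def ops_per_frame(gops: float, fps: float) -> float:
    """Return giga-operations per frame.

    Assumption: both figures describe the same workload.
    """
    return gops / fps


def frame_period_ms(fps: float) -> float:
    """Return the average time between frames in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"Energy efficiency: {energy_efficiency(THROUGHPUT_GOPS, POWER_W):.2f} GOPS/W")
    print(f"Work per frame:    {ops_per_frame(THROUGHPUT_GOPS, FRAME_RATE_FPS):.3f} GOP")
    print(f"Frame period:      {frame_period_ms(FRAME_RATE_FPS):.2f} ms")
```

The first result reproduces the 6.89 GOPS/W stated in the abstract. The frame period is an average throughput interval, not necessarily the end-to-end latency of a single frame.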
Pages: 16