Face recognition algorithms have been widely adopted in many scenarios thanks to the large accuracy gains brought by convolutional neural networks. Accuracy of nearly 99.7% has been reported for the face authentication task on certain datasets. However, face recognition products still face technical and cost problems, because the potentially large number of identities conflicts with the limited computing power of embedded devices. To perform better on a larger dataset, one usually trains a bigger network, which is hard to deploy on smart devices. Another problem is one-shot learning: the face recognition database usually holds no more than one image per identity, which makes the results unreliable. The main contributions of this paper are: (A) we implement a face recognition system on an embedded device with a dedicated hardware accelerator; (B) we propose a simple method that recognizes faces and augments the dataset at the same time; (C) we divide the large database into smaller pieces according to districts and propose a cloud-to-local-server recognition system. © 2020 Computer Society of the Republic of China. All rights reserved.