Micro-expressions reveal a person's true emotions and motives, and automatic facial micro-expression recognition (MER) has therefore attracted considerable attention. The main challenge of MER is the lack of large-scale datasets to support deep learning training. To this end, this paper proposes an end-to-end transfer model for facial MER based on difference images. Compared with micro-expression datasets, macro-expression datasets contain more samples and are easier to train deep neural networks on. Thus, we pre-train a ResNet-18 network on relatively large macro-expression datasets to obtain a good initial backbone. Then, difference images based on an adaptive key frame are computed to obtain an MER-related feature representation as the network input. Finally, the preprocessed difference images are fed into the pre-trained ResNet-18 network for fine-tuning. The proposed method achieves recognition rates of 74.39% and 76.22% on the CASME2 and SMIC databases, respectively. The experimental results show that the difference image between the onset frame and the key frame improves transfer training performance on ResNet-18, and that the proposed MER method outperforms methods based on traditional hand-crafted features and on deep neural networks.
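As a rough illustration of the difference-image input described above, the following minimal sketch computes the difference between the onset frame and a key frame of a grayscale clip. The abstract does not state how the adaptive key frame is chosen, so the selection rule used here (the frame with the largest mean absolute difference from the onset frame) is an assumption made for illustration only, as are the function names `select_key_frame` and `difference_image` and the min-max rescaling step.

```python
import numpy as np


def select_key_frame(clip: np.ndarray) -> int:
    """Return the index of the frame that differs most from the onset frame.

    `clip` has shape (T, H, W) and clip[0] is taken as the onset frame.
    ASSUMPTION: the selection criterion (largest mean absolute difference)
    is a stand-in; the paper's adaptive rule is not given in the abstract.
    """
    frames = clip.astype(np.float32)
    # Mean absolute pixel difference of every frame to the onset frame.
    scores = np.abs(frames - frames[0]).mean(axis=(1, 2))
    return int(np.argmax(scores))


def difference_image(clip: np.ndarray) -> np.ndarray:
    """Return the onset-to-key-frame difference image rescaled to [0, 1]."""
    frames = clip.astype(np.float32)
    key = select_key_frame(clip)
    diff = frames[key] - frames[0]
    span = diff.max() - diff.min()
    if span == 0:
        # No motion in the clip: return an all-zero image.
        return np.zeros_like(diff)
    # ASSUMPTION: min-max rescaling is an illustrative preprocessing choice.
    return (diff - diff.min()) / span


if __name__ == "__main__":
    # Synthetic 5-frame clip: a small patch brightens and peaks at frame 3.
    clip = np.zeros((5, 4, 4), dtype=np.uint8)
    clip[2, 1:3, 1:3] = 10
    clip[3, 1:3, 1:3] = 50
    print(select_key_frame(clip))
    print(difference_image(clip))
```

In a full pipeline, an image like this would typically be resized and expanded to three channels before being passed to a pre-trained ResNet-18 for fine-tuning; those steps are omitted here because the abstract does not specify them.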