Federated learning (FL) is an efficient and privacy-preserving distributed learning paradigm that enables massive numbers of edge devices to train machine learning models collaboratively. Although various communication schemes have been proposed to expedite the FL process in resource-limited wireless networks, the unreliable nature of wireless channels remains underexplored. In this work, we propose a novel FL framework, namely FL with gradient recycling (FL-GR), which recycles the historical gradients of unscheduled devices and of devices whose transmissions fail, thereby improving the learning performance of FL. To reduce the hardware requirements for implementing FL-GR in practical networks, we develop a memory-friendly variant that is equivalent to FL-GR but requires little memory at the edge server. We then theoretically analyze how the wireless network parameters affect the convergence bound of FL-GR, revealing that minimizing the average square of the local gradients' staleness (AS-GS) helps improve the learning performance. Based on this, we formulate a joint device scheduling, resource allocation, and power control optimization problem to minimize the AS-GS for global loss minimization. To solve the problem, we first derive the optimal power control policy for the devices and transform the AS-GS minimization problem into a bipartite graph matching problem. Through detailed analysis, we further transform the bipartite matching problem into an equivalent linear program that can be solved efficiently. Extensive simulation results on three real-world datasets (i.e., MNIST, CIFAR-10, and CIFAR-100) verify the efficacy of the proposed methods. Compared to FL algorithms without gradient recycling, FL-GR achieves higher accuracy and faster convergence. In addition, the proposed device scheduling and resource allocation algorithm also outperforms the benchmarks in accuracy and convergence speed.
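To make the gradient-recycling idea concrete, the sketch below illustrates one plausible server-side aggregation rule consistent with the description above: the server caches each device's most recently received local gradient and reuses it whenever that device is unscheduled or its uplink transmission fails. This is only an illustration under stated assumptions, not the paper's implementation; the class name, the zero-initialized cache, and the uniform averaging rule are assumptions made for the example.

```python
import numpy as np


class GradientRecyclingServer:
    """Illustrative server-side aggregation with gradient recycling.

    Assumption (not from the paper): the server keeps one cached gradient
    per device and averages fresh and cached gradients with equal weights.
    """

    def __init__(self, num_devices: int, model_dim: int):
        # Cache of the last successfully received gradient per device,
        # initialized to zero before any round has completed.
        self.cached_grads = np.zeros((num_devices, model_dim))

    def aggregate(self, fresh_grads: dict) -> np.ndarray:
        """Aggregate one communication round.

        fresh_grads maps device index -> gradient vector, for devices that
        were scheduled AND whose transmission succeeded this round.
        """
        # Refresh the cache with the gradients that actually arrived.
        for dev, grad in fresh_grads.items():
            self.cached_grads[dev] = grad
        # Unscheduled or transmission-failure devices contribute their
        # cached (stale) gradients instead of being dropped.
        return self.cached_grads.mean(axis=0)


# Toy usage: 4 devices, 3-dimensional model; only devices 0 and 2 report.
server = GradientRecyclingServer(num_devices=4, model_dim=3)
global_grad = server.aggregate({0: np.ones(3), 2: 2 * np.ones(3)})
print(global_grad)  # devices 1 and 3 contribute their cached gradients
```

In this reading, the staleness of a device's contribution grows each round it is not refreshed, which is why a staleness-aware scheduling objective such as minimizing the AS-GS arises naturally.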