This paper addresses the problem of UAV formation control and proposes a method based on policy iteration for finding the optimal formation control policy. When the system model is unknown, the algorithm recasts the problem as solving the algebraic Riccati equation online. At each online iteration, the value function and the control policy are updated simultaneously, until the optimal control policy is obtained and the nonlinear system converges. Experimental results show that, compared with a traditional control policy, the proposed controller improves the stability of the UAV formation, significantly enhances the convergence speed and robustness of the system, and yields better overall control performance. Simulation results further verify the effectiveness of the proposed method.
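To make the policy-iteration idea concrete, the following is a minimal sketch of the classical model-based iteration (Kleinman's algorithm) for a linear-quadratic problem. It is not the method of this paper, which works online without knowledge of the system model; here the matrices A and B are assumed known, purely to show the alternation between policy evaluation (a Lyapunov equation) and policy improvement, and how the iterates approach the solution of the algebraic Riccati equation. The double-integrator plant, the weights Q and R, the initial gain K0, and the function names are all illustrative assumptions.

```python
import numpy as np

def lyapunov_solve(A_cl, M):
    """Solve A_cl^T P + P A_cl + M = 0 for P via Kronecker vectorization."""
    n = A_cl.shape[0]
    I = np.eye(n)
    # With column-major vec: vec(A^T P) = (I kron A^T) vec(P),
    #                        vec(P A)   = (A^T kron I) vec(P).
    lhs = np.kron(I, A_cl.T) + np.kron(A_cl.T, I)
    p = np.linalg.solve(lhs, -M.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, Q, R, K0, max_iters=50, tol=1e-10):
    """Model-based policy iteration; K0 must stabilize A - B K0 (assumption)."""
    K = K0
    for _ in range(max_iters):
        A_cl = A - B @ K
        # Policy evaluation: value matrix P of the current gain K.
        P = lyapunov_solve(A_cl, Q + K.T @ R @ K)
        # Policy improvement: K <- R^{-1} B^T P.
        K_new = np.linalg.solve(R, B.T @ P)
        converged = np.linalg.norm(K_new - K) < tol
        K = K_new
        if converged:
            break
    return P, K

# Illustrative plant (assumption): a double integrator, not a UAV model.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # stabilizing initial gain

P, K = policy_iteration(A, B, Q, R, K0)

# Residual of the algebraic Riccati equation at the returned P.
are_residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print("K =", K)
print("ARE residual norm =", np.linalg.norm(are_residual))
```

For this particular plant and these weights, the Riccati equation can be solved by hand, giving the gain K = [1, sqrt(3)], which the iteration should approach. A model-free variant would replace the Lyapunov solve with an estimate of P built from measured state and input data.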