Channel state information (CSI) is typically required for downlink precoding, power allocation, and related tasks in multiple-input multiple-output (MIMO) systems. When the base station (BS) is equipped with a massive number of antenna elements, the training overhead required by conventional CSI estimation methods becomes overwhelming, leading to an unacceptable loss of spectral efficiency. In this paper, we investigate pilot design and CSI acquisition for downlink massive MIMO transmission. By exploiting the sparsity of the beam-domain channel, we first derive the optimal pilot structure for non-orthogonal pilots within a compressive sensing (CS) framework. We then propose a deterministic design method for sensing matrices that satisfy the restricted isometry property (RIP). Because beam-domain channels are usually only approximately sparse, we propose a modified subspace pursuit (SP) algorithm that recovers the signals while trading off noise against approximation error. Numerical results demonstrate that the proposed sensing matrices outperform conventional random CS matrices, and that the new channel estimation scheme achieves a significant performance improvement with reduced pilot consumption compared with the conventional least squares (LS) method.
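
For readers unfamiliar with subspace pursuit, the following is a minimal sketch of the standard textbook SP iteration for an exactly K-sparse signal, not the modified algorithm proposed in this paper. The function name `subspace_pursuit`, the symbols `Phi` (sensing matrix), `y` (received pilot observations), `K` (assumed sparsity level), and the `max_iter` cap are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def subspace_pursuit(Phi, y, K, max_iter=50):
    """Standard subspace pursuit (illustrative sketch only).

    Estimates a K-sparse vector x from y ~= Phi @ x.
    Phi: (M, N) sensing matrix, y: (M,) observations, K: assumed sparsity.
    """
    N = Phi.shape[1]

    def ls_on_support(support):
        # Least-squares fit of y using only the columns in `support`.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        return coef

    # Initial support: the K columns most correlated with y.
    support = np.argsort(np.abs(Phi.conj().T @ y))[-K:]
    coef = ls_on_support(support)
    residual = y - Phi[:, support] @ coef

    for _ in range(max_iter):
        # Expand: add the K columns most correlated with the residual.
        extra = np.argsort(np.abs(Phi.conj().T @ residual))[-K:]
        candidates = np.union1d(support, extra)
        cand_coef = ls_on_support(candidates)

        # Prune: keep the K candidates with the largest coefficients.
        new_support = candidates[np.argsort(np.abs(cand_coef))[-K:]]
        new_coef = ls_on_support(new_support)
        new_residual = y - Phi[:, new_support] @ new_coef

        # Stop as soon as the residual norm no longer decreases.
        if np.linalg.norm(new_residual) >= np.linalg.norm(residual):
            break
        support, coef, residual = new_support, new_coef, new_residual

    x_hat = np.zeros(N, dtype=np.result_type(Phi, y))
    x_hat[support] = coef
    return x_hat
```

This sketch fixes the support size at K in every iteration, which suits exactly sparse signals; handling approximately sparse beam-domain channels in noise is the motivation the abstract gives for modifying SP.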