False data injection attacks (FDIAs) can corrupt power grid state estimation by tampering with measurement data, thereby undermining the stable operation of the smart grid. Existing work usually trains a detection model by fusing data-driven features from diverse power data streams. Data-driven features, however, cannot effectively capture the differences between noisy data and attack samples. As a result, slight noise disturbances in the power grid may trigger a large number of false FDIA detections. To address this problem, this paper designs a deep collaborative self-attention network for robust FDIA detection, in which the spatio-temporal features of cascaded FDIAs are fully integrated. First, a graph convolution module based on high-order Chebyshev polynomials is designed to effectively aggregate the spatial information between grid nodes, and a spatial self-attention mechanism dynamically assigns attention weights to each node, guiding the network to focus on the node information most conducive to FDIA detection. Furthermore, a bi-directional Long Short-Term Memory (LSTM) network is introduced to perform time series modeling and long-term dependence analysis of power grid data, with a temporal self-attention mechanism that captures the temporal correlation of the data and assigns different weights to different time steps. The resulting deep collaborative network can effectively mine subtle perturbations from spatio-temporal feature information, efficiently distinguish power grid noise from FDIAs, and adapt to diverse attack intensities. Extensive experiments demonstrate that our method achieves effective detection performance on actual data and outperforms state-of-the-art FDIA detection schemes in terms of detection accuracy and robustness.
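The spatial branch described above can be illustrated with a minimal NumPy sketch. It assumes the standard Chebyshev spectral graph convolution (a rescaled normalized Laplacian with the recurrence T_k = 2 L~ T_{k-1} - T_{k-2}) followed by plain scaled dot-product self-attention over nodes. The abstract does not give the exact formulation, so the function names, the attention form, and the toy 4-bus ring topology are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def scaled_laplacian(adj):
    """Return L~ = 2L/lambda_max - I, with L the symmetric normalized Laplacian."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros(n)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(n)

def cheb_basis(l_tilde, order):
    """Return [T_0, ..., T_{order-1}] using T_k = 2 L~ T_{k-1} - T_{k-2}."""
    basis = [np.eye(l_tilde.shape[0])]
    if order > 1:
        basis.append(l_tilde)
    for _ in range(2, order):
        basis.append(2.0 * l_tilde @ basis[-1] - basis[-2])
    return basis

def cheb_graph_conv(x, adj, weights):
    """x: (n, f_in) node features; weights: (K, f_in, f_out). Returns (n, f_out)."""
    basis = cheb_basis(scaled_laplacian(adj), weights.shape[0])
    return sum(t_k @ x @ w_k for t_k, w_k in zip(basis, weights))

def spatial_attention(h):
    """Assumed form: scaled dot-product self-attention over nodes."""
    scores = h @ h.T / np.sqrt(h.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)         # each row sums to 1
    return attn @ h, attn

# Toy example: a hypothetical 4-bus ring with 3 measurements per bus.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
x = rng.normal(size=(4, 3))
weights = rng.normal(size=(3, 3, 8))   # K = 3 Chebyshev terms, 3 -> 8 features
h = cheb_graph_conv(x, adj, weights)
out, attn = spatial_attention(h)
```

With K Chebyshev terms, each node aggregates information from neighbors up to K-1 hops away, which is why a higher order widens the spatial receptive field. In a trained model `weights` would be learned and the attention would typically use separate query, key, and value projections, both omitted here for brevity.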