Generalized zero-shot learning (GZSL) aims to efficiently transfer knowledge from seen to unseen classes by relating semantic attributes and visual features. However, previous works suffer from two main limitations. 1) Because no unseen-class samples are available during training, embedding-based methods face a serious domain shift problem, which biases predictions toward the seen classes; 2) generation-based methods usually lack effective constraints during sample generation and ignore the spatial structural consistency between semantic attributes and visual features, so the generated features lack discriminative information. To overcome these limitations, this paper proposes a generative GZSL method, the Contrastive Embedding and Structural Alignment Model (CESAM). Specifically, we first present a contrastive embedding module that applies a contrastive loss in the embedding space to provide visual-level and semantic-level supervision for GZSL, encouraging the model to construct a more accurate and discriminative embedding space. Second, we present a structural alignment module that recovers the correlation information lost in contrastive learning, maintains spatial structural consistency between semantic attributes and visual features, and further optimizes the feature generator through a reconstruction loss. Furthermore, we design an integrated GZSL module that couples the embedding module with the constrained generative module, establishing collaborative learning between the two to obtain a more discriminative and generalizable model. Finally, extensive experiments on four benchmark datasets demonstrate that CESAM achieves state-of-the-art performance.
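To make the contrastive embedding idea concrete, the following is a minimal sketch of visual-level and semantic-level contrastive supervision in a shared embedding space. The network layout, feature dimensions, temperature, and the exact pairing scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveEmbedding(nn.Module):
    """Illustrative joint embedding: projects visual features and class
    attribute vectors into a shared space (dimensions are assumptions)."""
    def __init__(self, vis_dim=2048, attr_dim=85, embed_dim=256):
        super().__init__()
        self.visual_proj = nn.Sequential(
            nn.Linear(vis_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim))
        self.semantic_proj = nn.Sequential(
            nn.Linear(attr_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim))

    def forward(self, vis_feats, class_attrs):
        z_v = F.normalize(self.visual_proj(vis_feats), dim=-1)
        z_s = F.normalize(self.semantic_proj(class_attrs), dim=-1)
        return z_v, z_s


def visual_contrastive_loss(z_v, labels, temperature=0.1):
    """Visual-level supervision: embeddings of samples sharing a class label
    are pulled together and others pushed apart (supervised contrastive form)."""
    sim = z_v @ z_v.t() / temperature                       # pairwise similarities
    mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    mask.fill_diagonal_(0)                                  # exclude self-pairs from positives
    logits_mask = torch.ones_like(mask).fill_diagonal_(0)   # exclude self-pairs from denominator
    exp_sim = torch.exp(sim) * logits_mask
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)
    pos_count = mask.sum(dim=1).clamp(min=1)
    return -((mask * log_prob).sum(dim=1) / pos_count).mean()


def semantic_contrastive_loss(z_v, z_s_all, labels, temperature=0.1):
    """Semantic-level supervision: each visual embedding is matched to the
    embedding of its own class attributes against all other class prototypes."""
    logits = z_v @ z_s_all.t() / temperature                # (batch, num_classes)
    return F.cross_entropy(logits, labels)


# Hypothetical usage: 32 samples, 2048-d visual features, 85-d attributes, 50 classes.
model = ContrastiveEmbedding()
vis = torch.randn(32, 2048)
attrs = torch.randn(50, 85)
labels = torch.randint(0, 50, (32,))
z_v, z_s = model(vis, attrs)
loss = visual_contrastive_loss(z_v, labels) + semantic_contrastive_loss(z_v, z_s, labels)
```

In the paper's framework this embedding loss would be optimized jointly with the generative module and the reconstruction-based structural alignment constraint described above; the sketch isolates only the contrastive supervision component.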