Knowledge distillation transfers knowledge from a teacher model to a student model so that the student gradually replicates the teacher's behavior, achieving model compression and acceleration and thereby enabling lightweight software applications across many academic and applied domains, including software engineering. Nonetheless, the capacity gap between teacher and student models, the mixture of knowledge types within hidden representations, and the varying proportion of task-relevant knowledge across the hidden representations of each layer leave room for improvement in knowledge distillation. This paper proposes Cross-layer Integration of Word-level Association (CI-WA) knowledge distillation. First, CI-WA introduces an extractor built on dynamic sparse attention, which extracts task-related word-level associations from the hidden vectors of each layer and mitigates the influence of task-irrelevant information. Second, cross-layer connections are introduced into the knowledge distillation process, improving the student model by jointly leveraging high-level and low-level features. Finally, the proposed method is validated on two tasks: natural language understanding and language modeling. Experimental results show that the proposed method surpasses state-of-the-art techniques, and the language-modeling evaluation offers a new perspective for knowledge distillation.
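To make the two core ideas concrete, the following is a minimal sketch (not the paper's implementation): an extractor that keeps only the strongest word-level associations via top-k sparsified attention, and a cross-layer distillation loss that matches a student layer against both a low-level and a high-level teacher layer. All module names, dimensions, the top-k sparsification rule, and the loss weighting are illustrative assumptions rather than CI-WA's actual design.

```python
# Hedged sketch of (1) a dynamic sparse-attention extractor for word-level
# associations and (2) a cross-layer distillation loss. Names and hyper-
# parameters are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseWordAssociationExtractor(nn.Module):
    """Computes word-word association maps, keeping only the top-k scores per row."""

    def __init__(self, hidden_dim: int, top_k: int = 8):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.top_k = top_k

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim)
        q, k = self.query(hidden), self.key(hidden)
        scores = q @ k.transpose(-1, -2) / hidden.size(-1) ** 0.5  # (batch, seq, seq)
        # Dynamic sparsification: keep only each word's top-k associations,
        # suppressing task-irrelevant links before the softmax.
        k_eff = min(self.top_k, scores.size(-1))
        kth_value = scores.topk(k_eff, dim=-1).values[..., -1:]  # k-th largest per row
        sparse_scores = scores.masked_fill(scores < kth_value, float("-inf"))
        return F.softmax(sparse_scores, dim=-1)


def cross_layer_distill_loss(student_assoc, teacher_assoc_low, teacher_assoc_high,
                             alpha: float = 0.5) -> torch.Tensor:
    """Match one student association map against low- and high-level teacher maps."""
    low = F.mse_loss(student_assoc, teacher_assoc_low)
    high = F.mse_loss(student_assoc, teacher_assoc_high)
    return alpha * low + (1.0 - alpha) * high


if __name__ == "__main__":
    batch, seq_len, dim = 2, 16, 64
    extractor = SparseWordAssociationExtractor(dim, top_k=4)
    student_h = torch.randn(batch, seq_len, dim)       # a student layer's hidden states
    teacher_h_low = torch.randn(batch, seq_len, dim)   # a low-level teacher layer
    teacher_h_high = torch.randn(batch, seq_len, dim)  # a high-level teacher layer
    loss = cross_layer_distill_loss(
        extractor(student_h), extractor(teacher_h_low), extractor(teacher_h_high)
    )
    print(loss.item())
```

For simplicity the sketch reuses one extractor and assumes matching hidden sizes; in practice the teacher and student would each need their own extractor (or a projection) when their dimensions differ.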